Linus Torvalds [Thu, 17 Dec 2015 22:05:22 +0000 (14:05 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix uninitialized variable warnings in nfnetlink_queue, a lot of
people reported this... From Arnd Bergmann.
2) Don't init mutex twice in i40e driver, from Jesse Brandeburg.
3) Fix spurious EBUSY in rhashtable, from Herbert Xu.
4) Missing DMA unmaps in mvpp2 driver, from Marcin Wojtas.
5) Fix race with work structure access in pppoe driver causing
corruptions, from Guillaume Nault.
6) Fix OOPS due to sh_eth_rx() not checking whether netdev_alloc_skb()
actually succeeded or not, from Sergei Shtylyov.
7) Don't lose flags when settifn IFA_F_OPTIMISTIC in ipv6 code, from
Bjørn Mork.
8) VXLAN_HD_RCO defined incorrectly, fix from Jiri Benc.
9) Fix clock source used for cookies in SCTP, from Marcelo Ricardo
Leitner.
10) aurora driver needs HAS_DMA dependency, from Geert Uytterhoeven.
11) ndo_fill_metadata_dst op of vxlan has to handle ipv6 tunneling
properly as well, from Jiri Benc.
12) Handle request sockets properly in xfrm layer, from Eric Dumazet.
13) Double stats update in ipv6 geneve transmit path, fix from Pravin B
Shelar.
14) sk->sk_policy[] needs RCU protection, and as a result
xfrm_policy_destroy() needs to free policies using an RCU grace
period, from Eric Dumazet.
15) SCTP needs to clone ipv6 tx options in order to avoid use after
free, from Eric Dumazet.
16) Missing kbuild export if ila.h, from Stephen Hemminger.
17) Missing mdiobus_alloc() return value checking in mdio-mux.c, from
Tobias Klauser.
18) Validate protocol value range in ->create() methods, from Hannes
Frederic Sowa.
19) Fix early socket demux races that result in illegal dst reuse, from
Eric Dumazet.
20) Validate socket address length in pptp code, from WANG Cong.
21) skb_reorder_vlan_header() uses incorrect offset and can corrupt
packets, from Vlad Yasevich.
22) Fix memory leaks in nl80211 registry code, from Ola Olsson.
23) Timeout loop count handing fixes in mISDN, xgbe, qlge, sfc, and
qlcnic. From Dan Carpenter.
24) msg.msg_iocb needs to be cleared in recvfrom() otherwise, for
example, AF_ALG will interpret it as an async call. From Tadeusz
Struk.
25) inetpeer_set_addr_v4 forgets to initialize the 'vif' field, from
Eric Dumazet.
26) rhashtable enforces the minimum table size not early enough,
breaking how we calculate the per-cpu lock allocations. From
Herbert Xu.
27) Fix FCC port lockup in 82xx driver, from Martin Roth.
28) FOU sockets need to be freed using RCU, from Hannes Frederic Sowa.
29) Fix out-of-bounds access in __skb_complete_tx_timestamp() and
sock_setsockopt() wrt. timestamp handling. From WANG Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (117 commits)
net: check both type and procotol for tcp sockets
drivers: net: xgene: fix Tx flow control
tcp: restore fastopen with no data in SYN packet
af_unix: Revert 'lock_interruptible' in stream receive code
fou: clean up socket with kfree_rcu
82xx: FCC: Fixing a bug causing to FCC port lock-up
gianfar: Don't enable RX Filer if not supported
net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration
rhashtable: Fix walker list corruption
rhashtable: Enforce minimum size on initial hash table
inet: tcp: fix inetpeer_set_addr_v4()
ipv6: automatically enable stable privacy mode if stable_secret set
net: fix uninitialized variable issue
bluetooth: Validate socket address length in sco_sock_bind().
net_sched: make qdisc_tree_decrease_qlen() work for non mq
ser_gigaset: remove unnecessary kfree() calls from release method
ser_gigaset: fix deallocation of platform device structure
ser_gigaset: turn nonsense checks into WARN_ON
ser_gigaset: fix up NULL checks
qlcnic: fix a timeout loop
...
WANG Cong [Thu, 17 Dec 2015 07:39:04 +0000 (23:39 -0800)]
net: check both type and procotol for tcp sockets
Dmitry reported the following out-of-bound access:
Call Trace:
[<
ffffffff816cec2e>] __asan_report_load4_noabort+0x3e/0x40
mm/kasan/report.c:294
[<
ffffffff84affb14>] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
[< inline >] SYSC_setsockopt net/socket.c:1746
[<
ffffffff84aed7ee>] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
[<
ffffffff85c18c76>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
This is because we mistake a raw socket as a tcp socket.
We should check both sk->sk_type and sk->sk_protocol to ensure
it is a tcp socket.
Willem points out __skb_complete_tx_timestamp() needs to fix as well.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Thu, 17 Dec 2015 06:26:05 +0000 (22:26 -0800)]
drivers: net: xgene: fix Tx flow control
Currently the Tx flow control is based on reading the hardware state,
which is not accurate since it may not reflect the descriptors that
are not yet reached the memory.
To accurately control the Tx flow, changing it to be software based.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Dec 2015 21:53:10 +0000 (13:53 -0800)]
tcp: restore fastopen with no data in SYN packet
Yuchung tracked a regression caused by commit
57be5bdad759 ("ip: convert
tcp_sendmsg() to iov_iter primitives") for TCP Fast Open.
Some Fast Open users do not actually add any data in the SYN packet.
Fixes:
57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives")
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rainer Weikusat [Wed, 16 Dec 2015 20:09:25 +0000 (20:09 +0000)]
af_unix: Revert 'lock_interruptible' in stream receive code
With
b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
receive code was changed from using mutex_lock(&u->readlock) to
mutex_lock_interruptible(&u->readlock) to prevent signals from being
delayed for an indefinite time if a thread sleeping on the mutex
happened to be selected for handling the signal. But this was never a
problem with the stream receive code (as opposed to its datagram
counterpart) as that never went to sleep waiting for new messages with the
mutex held and thus, wouldn't cause secondary readers to block on the
mutex waiting for the sleeping primary reader. As the interruptible
locking makes the code more complicated in exchange for no benefit,
change it back to using mutex_lock.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 17 Dec 2015 19:55:29 +0000 (11:55 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Some i915 fixes, one omap fix, one core regression fix.
Not even enough fixes for a twelve days of xmas song, which seemms
good"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: Don't overwrite UNVERFIED mode status to OK
drm/omap: fix fbdev pix format to support all platforms
drm/i915: Do a better job at disabling primary plane in the noatomic case.
drm/i915/skl: Double RC6 WRL always on
drm/i915/skl: Disable coarse power gating up until F0
drm/i915: Remove incorrect warning in context cleanup
Will Deacon [Fri, 11 Dec 2015 17:46:41 +0000 (17:46 +0000)]
locking/osq: Fix ordering of node initialisation in osq_lock
The Cavium guys reported a soft lockup on their arm64 machine, caused by
commit
c55a6ffa6285 ("locking/osq: Relax atomic semantics"):
mutex_optimistic_spin+0x9c/0x1d0
__mutex_lock_slowpath+0x44/0x158
mutex_lock+0x54/0x58
kernfs_iop_permission+0x38/0x70
__inode_permission+0x88/0xd8
inode_permission+0x30/0x6c
link_path_walk+0x68/0x4d4
path_openat+0xb4/0x2bc
do_filp_open+0x74/0xd0
do_sys_open+0x14c/0x228
SyS_openat+0x3c/0x48
el0_svc_naked+0x24/0x28
This is because in osq_lock we initialise the node for the current CPU:
node->locked = 0;
node->next = NULL;
node->cpu = curr;
and then publish the current CPU in the lock tail:
old = atomic_xchg_acquire(&lock->tail, curr);
Once the update to lock->tail is visible to another CPU, the node is
then live and can be both read and updated by concurrent lockers.
Unfortunately, the ACQUIRE semantics of the xchg operation mean that
there is no guarantee the contents of the node will be visible before
lock tail is updated. This can lead to lock corruption when, for
example, a concurrent locker races to set the next field.
Fixes:
c55a6ffa6285 ("locking/osq: Relax atomic semantics"):
Reported-by: David Daney <ddaney@caviumnetworks.com>
Reported-by: Andrew Pinski <andrew.pinski@caviumnetworks.com>
Tested-by: Andrew Pinski <andrew.pinski@caviumnetworks.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1449856001-21177-1-git-send-email-will.deacon@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 17 Dec 2015 19:20:13 +0000 (11:20 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- Two bug fixes for misuse of PAGE_MASK in scatterlist and dma-debug.
These are tagged for -stable. The scatterlist impact is potentially
corrupted dma addresses on HIGHMEM enabled platforms.
- A minor locking fix for the NFIT hot-add implementation that is new
in 4.4-rc. This would only trigger in the case a hot-add raced
driver removal.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dma-debug: Fix dma_debug_entry offset calculation
Revert "scatterlist: use sg_phys()"
nfit: acpi_nfit_notify(): Do not leave device locked
Hannes Frederic Sowa [Tue, 15 Dec 2015 20:01:53 +0000 (21:01 +0100)]
fou: clean up socket with kfree_rcu
fou->udp_offloads is managed by RCU. As it is actually included inside
the fou sockets, we cannot let the memory go out of scope before a grace
period. We either can synchronize_rcu or switch over to kfree_rcu to
manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
and geneve.
Fixes:
23461551c00628c ("fou: Support for foo-over-udp RX path")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Dec 2015 23:33:38 +0000 (18:33 -0500)]
Merge tag 'mac80211-for-davem-2015-12-15' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Another set of fixes:
* memory leak fixes (from Ola)
* operating mode notification spec compliance fix (from Eyal)
* copy rfkill names in case pointer becomes invalid (myself)
* two hardware restart fixes (myself)
* get rid of "limiting TX power" log spam (myself)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Roth [Tue, 15 Dec 2015 02:17:53 +0000 (04:17 +0200)]
82xx: FCC: Fixing a bug causing to FCC port lock-up
The patch fixes FCC port lock-up, which occurs as a result of a bug
during underrun/collision handling. Within the tx_startup() function
in mac-fcc.c, the address of last BD is not calculated correctly.
As a result of wrong calculation of the last BD address, the next
transmitted BD may be set to an area out of the transmit BD ring.
This actually causes to port lock-up and it is not recoverable.
Signed-off-by: Martin Roth <martin.roth@motorolasolutions.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hamish Martin [Tue, 15 Dec 2015 01:14:50 +0000 (14:14 +1300)]
gianfar: Don't enable RX Filer if not supported
After commit
15bf176db1fb ("gianfar: Don't enable the Filer w/o the
Parser"), 'TSEC' model controllers (for example as seen on MPC8541E)
always have 8 bytes stripped from the front of received frames.
Only 'eTSEC' gianfar controllers have the RX Filer capability (amongst
other enhancements). Previously this was treated as always enabled
for both 'TSEC' and 'eTSEC' controllers.
In commit
15bf176db1fb ("gianfar: Don't enable the Filer w/o the Parser")
a subtle change was made to the setting of 'uses_rxfcb' to effectively
always set it (since 'rx_filer_enable' was always true). This had the
side-effect of always stripping 8 bytes from the front of received frames
on 'TSEC' type controllers.
We now only enable the RX Filer capability on controller types that
support it, thereby avoiding the issue for 'TSEC' type controllers.
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mentz [Wed, 16 Dec 2015 01:38:48 +0000 (17:38 -0800)]
dma-debug: Fix dma_debug_entry offset calculation
dma-debug uses struct dma_debug_entry to keep track of dma coherent
memory allocation requests. The virtual address is converted into a pfn
and an offset. Previously, the offset was calculated using an incorrect
bit mask. As a result, we saw incorrect error messages from dma-debug
like the following:
"DMA-API: exceeded 7 overlapping mappings of cacheline 0x03e00000"
Cacheline 0x03e00000 does not exist on our platform.
Cc: <stable@vger.kernel.org>
Fixes:
0abdd7a81b7e ("dma-debug: introduce debug_dma_assert_idle()")
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Wed, 16 Dec 2015 18:57:24 +0000 (10:57 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Further ARM fixes:
- Anson Huang noticed that we were corrupting a register we shouldn't
be during suspend on some CPUs.
- Shengjiu Wang spotted a bug in the 'swp' instruction emulation.
- Will Deacon fixed a bug in the ASID allocator.
- Laura Abbott fixed the kernel permission protection to apply to all
threads running in the system.
- I've fixed two bugs with the domain access control register
handling, one to do with printing an appropriate value at oops
time, and the other to further fix the uaccess_with_memcpy code"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8475/1: SWP emulation: Restore original *data when failed
ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted
ARM: fix uaccess_with_memcpy() with SW_DOMAIN_PAN
ARM: report proper DACR value in oops dumps
ARM: 8464/1: Update all mm structures with section adjustments
ARM: 8465/1: mm: keep reserved ASIDs in sync with mm after multiple rollovers
Hannes Frederic Sowa [Mon, 14 Dec 2015 22:30:43 +0000 (23:30 +0100)]
net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration
Docbook does not like the definition of macros inside a field declaration
and adds a warning. Move the definition out.
Fixes:
79462ad02e86180 ("net: add validation for the socket syscall protocol argument")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 16 Dec 2015 08:45:54 +0000 (16:45 +0800)]
rhashtable: Fix walker list corruption
The commit
ba7c95ea3870fe7b847466d39a049ab6f156aa2c ("rhashtable:
Fix sleeping inside RCU critical section in walk_stop") introduced
a new spinlock for the walker list. However, it did not convert
all existing users of the list over to the new spin lock. Some
continued to use the old mutext for this purpose. This obviously
led to corruption of the list.
The fix is to use the spin lock everywhere where we touch the list.
This also allows us to do rcu_rad_lock before we take the lock in
rhashtable_walk_start. With the old mutex this would've deadlocked
but it's safe with the new spin lock.
Fixes:
ba7c95ea3870 ("rhashtable: Fix sleeping inside RCU...")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 16 Dec 2015 10:13:14 +0000 (18:13 +0800)]
rhashtable: Enforce minimum size on initial hash table
William Hua <william.hua@canonical.com> wrote:
>
> I wasn't aware there was an enforced minimum size. I simply set the
> nelem_hint in the rhastable_params struct to 1, expecting it to grow as
> needed. This caused a segfault afterwards when trying to insert an
> element.
OK we're doing the size computation before we enforce the limit
on min_size.
---8<---
We need to do the initial hash table size computation after we
have obtained the correct min_size/max_size parameters. Otherwise
we may end up with a hash table whose size is outside the allowed
envelope.
Fixes:
a998f712f77e ("rhashtable: Round up/down min/max_size to...")
Reported-by: William Hua <william.hua@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Dec 2015 04:56:44 +0000 (20:56 -0800)]
inet: tcp: fix inetpeer_set_addr_v4()
David Ahern added a vif field in the a4 part of inetpeer_addr struct.
This broke IPv4 TCP fast open client side and more generally tcp metrics
cache, because inetpeer_addr_cmp() is now comparing two u32 instead of
one.
inetpeer_set_addr_v4() needs to properly init vif field, otherwise
the comparison result depends on uninitialized data.
Fixes:
192132b9a034 ("net: Add support for VRFs to inetpeer cache")
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Tue, 15 Dec 2015 21:59:12 +0000 (22:59 +0100)]
ipv6: automatically enable stable privacy mode if stable_secret set
Bjørn reported that while we switch all interfaces to privacy stable mode
when setting the secret, we don't set this mode for new interfaces. This
does not make sense, so change this behaviour.
Fixes:
622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Reported-by: Bjørn Mork <bjorn@mork.no>
Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Tue, 15 Dec 2015 20:54:06 +0000 (12:54 -0800)]
Revert "scatterlist: use sg_phys()"
commit
db0fa0cb0157 "scatterlist: use sg_phys()" did replacements of
the form:
phys_addr_t phys = page_to_phys(sg_page(s));
phys_addr_t phys = sg_phys(s) & PAGE_MASK;
However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long). Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.
Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes:
db0fa0cb0157 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
tadeusz.struk@intel.com [Tue, 15 Dec 2015 18:46:17 +0000 (10:46 -0800)]
net: fix uninitialized variable issue
msg_iocb needs to be initialized on the recv/recvfrom path.
Otherwise afalg will wrongly interpret it as an async call.
Cc: stable@vger.kernel.org
Reported-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Dec 2015 20:39:08 +0000 (15:39 -0500)]
bluetooth: Validate socket address length in sco_sock_bind().
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 15 Dec 2015 18:56:39 +0000 (10:56 -0800)]
Merge tag 'dmaengine-fix-4.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"This has fixes spread thru driver, notably among them:
- edma fixes for recent edma DT changes which went into 4.4
- odd fixes for at_hdmac
- minor fixes on bc dma and mic dma"
* tag 'dmaengine-fix-4.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: at_xdmac: fix at_xdmac_prep_dma_memcpy()
dmaengine: edma: DT: Change reserved slot array from 16bit to 32bit type
dmaengine: edma: DT: Change memcpy channel array from 16bit to 32bit type
dmaengine: mic_x100: add missing spin_unlock
dmaengine: bcm2835-dma: Convert to use DMA pool
dmaengine: at_xdmac: fix bad behavior in interleaved mode
dmaengine: at_xdmac: fix false condition for memset_sg transfers
dmaengine: at_xdmac: fix macro typo
Linus Torvalds [Tue, 15 Dec 2015 18:50:13 +0000 (10:50 -0800)]
Merge tag 'fbdev-fixes-4.4' of git://git./linux/kernel/git/tomba/linux
Pull two fbdev fixes from Tomi Valkeinen:
- OMAP: fix analog tv-out when using omapdrm
- fsl: Fix kernel crash when diu_ops is not implemented
* tag 'fbdev-fixes-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
OMAPDSS: fix timings for VENC to match what omapdrm expects
video: fbdev: fsl: Fix kernel crash when diu_ops is not implemented
Linus Torvalds [Tue, 15 Dec 2015 18:45:29 +0000 (10:45 -0800)]
Merge tag 'please-pull-mlock2' of git://git./linux/kernel/git/aegl/linux
Pull ia64 fix from Tony Luck:
"Wire up mlock2() syscall for ia64"
* tag 'please-pull-mlock2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] Enable mlock2 syscall for ia64
Eric Dumazet [Tue, 15 Dec 2015 17:43:12 +0000 (09:43 -0800)]
net_sched: make qdisc_tree_decrease_qlen() work for non mq
Stas Nichiporovich reported a regression in his HFSC qdisc setup
on a non multi queue device.
It turns out I mistakenly added a TCQ_F_NOPARENT flag on all qdisc
allocated in qdisc_create() for non multi queue devices, which was
rather buggy. I was clearly mislead by the TCQ_F_ONETXQUEUE that is
also set here for no good reason, since it only matters for the root
qdisc.
Fixes:
4eaf3b84f288 ("net_sched: fix qdisc_tree_decrease_qlen() races")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Tested-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Dec 2015 18:24:22 +0000 (13:24 -0500)]
Merge branch 'ser_gigaset-platform-device-dealloc'
Paul Bolle says:
====================
ser_gigaset: fix deallocation of platform device structure
Sascha Levin reported that the syzkaller fuzzer triggered a WARNING in
ser_gigaset (see https://lkml.kernel.org/g/
56587467.
8050102@oracle.com ). It
turned out that ser_gigaset has always deallocated its platform device
structure incorrectly. Tilman submitted the patch that fixes that (3/4) and a
related cleanup (4/4).
Tilman also submitted a minor cleanup of some NULL checks (1/4) that prompted
Alan to turn those checks into WARN_ONs (2/4). If no one hits these WARN_ONs in
the next couple of releases these WARN_ONs should be removed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Tue, 15 Dec 2015 17:11:31 +0000 (18:11 +0100)]
ser_gigaset: remove unnecessary kfree() calls from release method
device->platform_data and platform_device->resource are never used
and remain NULL through their entire life. Drops the kfree() calls
for them from the device release method.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Tue, 15 Dec 2015 17:11:30 +0000 (18:11 +0100)]
ser_gigaset: fix deallocation of platform device structure
When shutting down the device, the struct ser_cardstate must not be
kfree()d immediately after the call to platform_device_unregister()
since the embedded struct platform_device is still in use.
Move the kfree() call to the release method instead.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Fixes:
2869b23e4b95 ("drivers/isdn/gigaset: new M101 driver (v2)")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alan Cox [Tue, 15 Dec 2015 17:11:29 +0000 (18:11 +0100)]
ser_gigaset: turn nonsense checks into WARN_ON
These checks do nothing useful to protect the code from races. On the
other hand if the old code has been masking a real bug we would like to
know about it.
The check for tiocmset is kept because it is valid for a tty driver to
have a NULL tiocmset method. That in itself is probably a mistake given
modern coding practices - but needs fixing in the tty layer.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Tue, 15 Dec 2015 17:11:28 +0000 (18:11 +0100)]
ser_gigaset: fix up NULL checks
Commit
f34d7a5b7010 ("tty: The big operations rework") changed
tty->driver to tty->ops but left NULL checks for tty->driver untouched.
Fix.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
[pebolle: removed Fixes tag]
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 15 Dec 2015 18:21:04 +0000 (10:21 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a boundary condition in the blkcipher SG walking code that
can lead to a crash when used with the new chacha20 algorithm"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: skcipher - Copy iv from desc even for 0-len walks
Linus Torvalds [Tue, 15 Dec 2015 18:15:57 +0000 (10:15 -0800)]
Fix user-visible spelling error
Pavel Machek reports a warning about W+X pages found in the "Persisent"
kmap area. After grepping for it (using the correct spelling), and not
finding it, I noticed how the debug printk was just misspelled. Fix it.
The actual mapping bug that Pavel reported is still open. It's
apparently a separate issue from the known EFI page tables, looks like
it's related to the HIGHMEM mappings.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Tue, 15 Dec 2015 13:56:16 +0000 (16:56 +0300)]
qlcnic: fix a timeout loop
The problem here is that at the end of the loop we test for if
idc->vnic_wait_limit is zero, but since idc->vnic_wait_limit-- is a
post-op, it actually ends up set to (u8)-1. I have fixed this by
moving the decrement inside the loop.
Fixes:
486a5bc77a4a ('qlcnic: Add support for 83xx suspend and resume.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 15 Dec 2015 11:06:08 +0000 (14:06 +0300)]
sfc: fix a timeout loop
We test for if "tries" is zero at the end but "tries--" is a post-op so
it will end with "tries" set to -1. I have changed it to a pre-op
instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 15 Dec 2015 10:52:36 +0000 (13:52 +0300)]
qlge: fix a timeout loop in ql_change_rx_buffers()
The problem here is that after the loop we test for "if (!i) " but
because "i--" is a post-op we exit with i set to -1. I have fixed this
by changing it to a pre-op instead. I had to change the starting value
from 3 to 4 so that we still iterate 3 times.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 15 Dec 2015 10:12:29 +0000 (13:12 +0300)]
amd-xgbe: fix a couple timeout loops
At the end of the loop we test "if (!count)" but because "count--" is
a post-op then the loop will end with count set to -1. I have fixed
this by changing it to --count.
Fixes:
c5aa9e3b8156 ('amd-xgbe: Initial AMD 10GbE platform driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 15 Dec 2015 10:07:52 +0000 (13:07 +0300)]
mISDN: fix a loop count
There are two issue here.
1) cnt starts as maxloop + 1 so all these loops iterate one more time
than intended.
2) At the end of the loop we test for "if (maxloop && !cnt)" but for
the first two loops, we end with cnt equal to -1. Changing this to
a pre-op means we end with cnt set to 0.
Fixes:
cae86d4a4e56 ('mISDN: Add driver for Infineon ISDN chipset family')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrzej Hajda [Mon, 14 Dec 2015 10:05:58 +0000 (11:05 +0100)]
net/mlx4_core: fix handling return value of mlx4_slave_convert_port
The function can return negative values, so its result should
be assigned to signed variable.
The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/
2046107
Fixes:
fc48866f7 ('net/mlx4: Adapt code for N-Port VF')
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eyal Shapira [Tue, 8 Dec 2015 14:04:36 +0000 (16:04 +0200)]
mac80211: handle width changes from opmode notification IE in beacon
An AP can send an operating channel width change in a beacon
opmode notification IE as long as there's a change in the nss as
well (See 802.11ac-2013 section 10.41).
So don't limit updating to nss only from an opmode notification IE.
Signed-off-by: Eyal Shapira <eyalx.shapira@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 8 Dec 2015 14:04:37 +0000 (16:04 +0200)]
mac80211: suppress unchanged "limiting TX power" messages
When the AP is advertising limited TX power, the message can be
printed over and over again. Suppress it when the power level
isn't changing.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=106011
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 8 Dec 2015 14:04:39 +0000 (16:04 +0200)]
mac80211: reprogram in interface order
During reprogramming, mac80211 currently first adds all the channel
contexts, then binds them to the vifs and then goes to reconfigure
all the interfaces. Drivers might, perhaps implicitly, rely on the
operation order for certain things that typically happen within a
single function elsewhere in mac80211. To avoid problems with that,
reorder the code in mac80211's restart/reprogramming to work fully
within the interface loop so that the order of operations is like
in normal operation.
For iwlwifi, this fixes a firmware crash when reprogramming with an
AP/GO interface active.
Reported-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 8 Dec 2015 14:04:38 +0000 (16:04 +0200)]
mac80211: run scan completed work on reconfig failure
When reconfiguration during resume fails while a scan is pending
for completion work, that work will never run, and the scan will
be stuck forever. Factor out the code to recover this and call it
also in ieee80211_handle_reconfig_failure().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ola Olsson [Fri, 11 Dec 2015 20:04:52 +0000 (21:04 +0100)]
nl80211: Fix potential memory leak in nl80211_connect
Free cached keys if the last early return path is taken.
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ola Olsson [Sat, 12 Dec 2015 22:17:17 +0000 (23:17 +0100)]
nl80211: Fix potential memory leak in nl80211_set_wowlan
Compared to cfg80211_rdev_free_wowlan in core.h,
the error goto label lacks the freeing of nd_config.
Fix that.
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ola Olsson [Sun, 13 Dec 2015 18:12:03 +0000 (19:12 +0100)]
nl80211: fix a few memory leaks in reg.c
The first leak occurs when entering the default case
in the switch for the initiator in set_regdom.
The second leaks a platform_device struct if the
platform registration in regulatory_init succeeds but
the sub sequent regulatory hint fails due to no memory.
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Shengjiu Wang [Tue, 8 Dec 2015 12:37:19 +0000 (13:37 +0100)]
ARM: 8475/1: SWP emulation: Restore original *data when failed
__user_swpX_asm maybe failed in first STREX operation, emulate_swpX
will try again, but the *data has been changed in first time. which
causes the result is wrong.
This patch is to fix this issue. When STREX succeed, change the *data.
if it fail, *data is not changed.
Signed-off-by: Shengjiu Wang <shengjiu.wang@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Anson Huang [Mon, 7 Dec 2015 09:09:19 +0000 (10:09 +0100)]
ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted
In cpu_v7_do_suspend routine, r11 is used while it is NOT
saved/restored, different compiler may have different usage
of ARM general registers, so it may cause issues during
calling cpu_v7_do_suspend.
We meet kernel fault occurs when using GCC 4.8.3, r11 contains
valid value before calling into cpu_v7_do_suspend, but when returned
from this routine, r11 is corrupted and lead to kernel fault.
Doing save/restore for those corrupted registers is a must in
assemble code.
Signed-off-by: Anson Huang <Anson.Huang@freescale.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Cc: <stable@vger.kernel.org> # v3.3+
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Sat, 5 Dec 2015 13:42:07 +0000 (13:42 +0000)]
ARM: fix uaccess_with_memcpy() with SW_DOMAIN_PAN
The uaccess_with_memcpy() code is currently incompatible with the SW
PAN code: it takes locks within the region that we've changed the DACR,
potentially sleeping as a result. As we do not save and restore the
DACR across co-operative sleep events, can lead to an incorrect DACR
value later in this code path.
Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Vlad Yasevich [Mon, 14 Dec 2015 22:44:10 +0000 (17:44 -0500)]
skbuff: Fix offset error in skb_reorder_vlan_header
skb_reorder_vlan_header is called after the vlan header has
been pulled. As a result the offset of the begining of
the mac header has been incrased by 4 bytes (VLAN_HLEN).
When moving the mac addresses, include this incrase in
the offset calcualation so that the mac addresses are
copied correctly.
Fixes:
a6e18ff1117 (vlan: Fix untag operations of stacked vlans with REORDER_HEADER off)
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vladislav Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Mon, 14 Dec 2015 21:48:36 +0000 (13:48 -0800)]
pptp: verify sockaddr_len in pptp_bind() and pptp_connect()
Reported-by: Dmitry Vyukov <dvyukov@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kazuya Mizuguchi [Sun, 13 Dec 2015 15:15:58 +0000 (00:15 +0900)]
ravb: Add disable 10base
Ethernet AVB does not support 10 Mbps transfer speed.
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 13 Dec 2015 20:05:07 +0000 (23:05 +0300)]
sh_eth: fix descriptor access endianness
The driver never calls cpu_to_edmac() when writing the descriptor address
and edmac_to_cpu() when reading it, although it should -- fix this.
Note that the frame/buffer length descriptor field accesses also need fixing
but since they are both 16-bit we can't use {cpu|edmac}_to_{edmac|cpu}()...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 13 Dec 2015 18:27:04 +0000 (21:27 +0300)]
sh_eth: fix TX buffer byte-swapping
For the little-endian SH771x kernels the driver has to byte-swap the RX/TX
buffers, however yet unset physcial address from the TX descriptor is used
to call sh_eth_soft_swap(). Use 'skb->data' instead...
Fixes:
31fcb99d9958 ("net: sh_eth: remove __flush_purge_region")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 14 Dec 2015 22:08:53 +0000 (14:08 -0800)]
net: fix IP early demux races
David Wilder reported crashes caused by dst reuse.
<quote David>
I am seeing a crash on a distro V4.2.3 kernel caused by a double
release of a dst_entry. In ipv4_dst_destroy() the call to
list_empty() finds a poisoned next pointer, indicating the dst_entry
has already been removed from the list and freed. The crash occurs
18 to 24 hours into a run of a network stress exerciser.
</quote>
Thanks to his detailed report and analysis, we were able to understand
the core issue.
IP early demux can associate a dst to skb, after a lookup in TCP/UDP
sockets.
When socket cache is not properly set, we want to store into
sk->sk_dst_cache the dst for future IP early demux lookups,
by acquiring a stable refcount on the dst.
Problem is this acquisition is simply using an atomic_inc(),
which works well, unless the dst was queued for destruction from
dst_release() noticing dst refcount went to zero, if DST_NOCACHE
was set on dst.
We need to make sure current refcount is not zero before incrementing
it, or risk double free as David reported.
This patch, being a stable candidate, adds two new helpers, and use
them only from IP early demux problematic paths.
It might be possible to merge in net-next skb_dst_force() and
skb_dst_force_safe(), but I prefer having the smallest patch for stable
kernels : Maybe some skb_dst_force() callers do not expect skb->dst
can suddenly be cleared.
Can probably be backported back to linux-3.6 kernels
Reported-by: David J. Wilder <dwilder@us.ibm.com>
Tested-by: David J. Wilder <dwilder@us.ibm.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ville Syrjälä [Thu, 3 Dec 2015 21:14:09 +0000 (23:14 +0200)]
drm: Don't overwrite UNVERFIED mode status to OK
The way the mode probing works is this:
1. All modes currently on the mode list are marked as UNVERIFIED
2. New modes are on the probed_modes list (they start with
status OK)
3. Modes are moved from the probed_modes list to the actual
mode list. If a mode already on the mode list is deemed
to match one of the probed modes, the duplicate is dropped
and the mode status updated to OK. After this the
probed_modes list will be empty.
4. All modes on the mode list are verified to not violate any
constraints. Any that do are marked as such.
5. Any mode left with a non-OK status is pruned from the list,
with an appropriate debug message.
What all this means is that any mode on the original list that
didn't have a duplicate on the probed_modes list, should be left
with status UNVERFIED (or previously could have been left with
some other status, but never OK).
I broke that in
commit
05acaec334fc ("drm: Reorganize probed mode validation")
by always assigning something to the mode->status during the validation
step. So any mode from the old list that still passed the validation
would be left on the list with status OK in the end.
Fix this by not doing the basic mode validation unless the mode
already has status OK (meaning it came from the probed_modes list,
or at least a duplicate of it was on that list). This way we will
correctly prune away any mode from the old mode list that didn't
appear on the probed_modes list.
Cc: stable@vger.kernel.org
Cc: Adam Jackson <ajax@redhat.com>
Fixes:
05acaec334fc ("drm: Reorganize probed mode validation")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449177255-9515-2-git-send-email-ville.syrjala@linux.intel.com
Testcase: igt/kms_force_connector_basic/prune-stale-modes
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93332
[danvet: Also applying to drm-misc to avoid too much conflict hell -
there's a big pile of patches from Ville on top of this one.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 15 Dec 2015 00:25:21 +0000 (10:25 +1000)]
Merge tag 'drm-intel-fixes-2015-12-11' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Here are some i915 fixes for v4.4, sorry for being late this week.
* tag 'drm-intel-fixes-2015-12-11' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Do a better job at disabling primary plane in the noatomic case.
drm/i915/skl: Double RC6 WRL always on
drm/i915/skl: Disable coarse power gating up until F0
drm/i915: Remove incorrect warning in context cleanup
Dave Airlie [Tue, 15 Dec 2015 00:24:52 +0000 (10:24 +1000)]
Merge tag 'omapdrm-4.4-fixes' of git://git./linux/kernel/git/tomba/linux into drm-fixes
omapdrm fix for 4.4
* tag 'omapdrm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
drm/omap: fix fbdev pix format to support all platforms
Sergei Shtylyov [Sat, 12 Dec 2015 22:44:50 +0000 (01:44 +0300)]
sh_eth: uninline sh_eth_{write|read}()
Commit
3365711df024 ("sh_eth: WARN on access to a register not implemented in
in a particular chip") added WARN_ON() to sh_eth_{read|write}(), thus making
it unacceptable for these functions to be *inline* anymore. Remove *inline*
and move the functions from the header to the driver itself. Below is our
code economy with ARM gcc 4.7.3:
$ size drivers/net/ethernet/renesas/sh_eth.o{~,}
text data bss dec hex filename
32489 1140 0 33629 835d drivers/net/ethernet/renesas/sh_eth.o~
25413 1140 0 26553 67b9 drivers/net/ethernet/renesas/sh_eth.o
Suggested-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Fri, 11 Dec 2015 10:03:49 +0000 (18:03 +0800)]
stmmac: dwmac-sunxi: Call exit cleanup function in probe error path
dwmac-sunxi has 2 callbacks that were called from stmmac_platform as
part of the probe and remove sequences.
Ater the conversion of dwmac-sunxi into a standalone platform driver,
the .init function is called before calling into the stmmac driver
core, but .exit is not called to clean up if stmmac returns an error.
This patch fixes the probe error path. This properly cleans up and
releases resources when the driver core fails to probe.
Cc: Joachim Eastwood <manabian@gmail.com>
Fixes:
9a9e9a1edee8 ("stmmac: dwmac-sunxi: turn setup callback into a
probe function")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 14 Dec 2015 21:03:39 +0000 (22:03 +0100)]
net: add validation for the socket syscall protocol argument
郭永刚 reported that one could simply crash the kernel as root by
using a simple program:
int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;
socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);
AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.
This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.
kernel: Call Trace:
kernel: [<
ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel: [<
ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel: [<
ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel: [<
ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel: [<
ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [<
ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel: [<
ffffffff81779515>] tracesys_phase2+0x84/0x89
I found no particular commit which introduced this problem.
CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Mon, 14 Dec 2015 12:51:51 +0000 (13:51 +0100)]
net: phy: mdio-mux: Check return value of mdiobus_alloc()
mdiobus_alloc() might return NULL, but its return value is not
checked in mdio_mux_init(). This could potentially lead to a NULL
pointer dereference. Fix it by checking the return value
Fixes:
0ca2997d1452 ("netdev/of/phy: Add MDIO bus multiplexer support.")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 14 Dec 2015 13:29:58 +0000 (14:29 +0100)]
openvswitch: fix trivial comment typo
The commit
33db4125ec74 ("openvswitch: Rename LABEL->LABELS") left
over an old OVS_CT_ATTR_LABEL instance, fix it.
Fixes:
33db4125ec74 ("openvswitch: Rename LABEL->LABELS")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Luck [Mon, 14 Dec 2015 18:30:02 +0000 (10:30 -0800)]
[IA64] Enable mlock2 syscall for ia64
New system call added in
commit
a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e
mm: mlock: add new mlock system call
Signed-off-by: Tony Luck <tony.luck@intel.com>
David S. Miller [Mon, 14 Dec 2015 16:09:01 +0000 (11:09 -0500)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
netfilter fixes for net
The following patchset contains Netfilter fixes for you net tree,
specifically for nf_tables and nfnetlink_queue, they are:
1) Avoid a compilation warning in nfnetlink_queue that was introduced
in the previous merge window with the simplification of the conntrack
integration, from Arnd Bergmann.
2) nfnetlink_queue is leaking the pernet subsystem registration from
a failure path, patch from Nikolay Borisov.
3) Pass down netns pointer to batch callback in nfnetlink, this is the
largest patch and it is not a bugfix but it is a dependency to
resolve a splat in the correct way.
4) Fix a splat due to incorrect socket memory accounting with nfnetlink
skbuff clones.
5) Add missing conntrack dependencies to NFT_DUP_IPV4 and NFT_DUP_IPV6.
6) Traverse the nftables commit list in reverse order from the commit
path, otherwise we crash when the user applies an incremental update
via 'nft -f' that deletes an object that was just introduced in this
batch, from Xin Long.
Regarding the compilation warning fix, many people have sent us (and
keep sending us) patches to address this, that's why I'm including this
batch even if this is not critical.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomi Valkeinen [Tue, 8 Dec 2015 16:32:14 +0000 (18:32 +0200)]
drm/omap: fix fbdev pix format to support all platforms
omap_fbdev always creates a framebuffer with ARGB8888 pixel format. On
OMAP3 we have VIDEO1 overlay that does not support ARGB8888, and on
OMAP2 none of the overlays support ARGB888.
This patch changes the omap_fbdev's fb to XRGB8888, which is supported
by all platforms.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
David Ahern [Thu, 10 Dec 2015 18:25:24 +0000 (10:25 -0800)]
net: Flush local routes when device changes vrf association
The VRF driver cycles netdevs when an interface is enslaved or released:
the down event is used to flush neighbor and route tables and the up
event (if the interface was already up) effectively moves local and
connected routes to the proper table.
As of
4f823defdd5b the local route is left hanging around after a link
down, so when a netdev is moved from one VRF to another (or released
from a VRF altogether) local routes are left in the wrong table.
Fix by handling the NETDEV_CHANGEUPPER event. When the upper dev is
an L3mdev then call fib_disable_ip to flush all routes, local ones
to.
Fixes:
4f823defdd5b ("ipv4: fix to not remove local route on link down")
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 10 Dec 2015 15:23:10 +0000 (17:23 +0200)]
net:hns: print MAC with %pM
printf() has a dedicated specifier to print MAC addresses. Use it instead of
pushing each byte via stack.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 10 Dec 2015 15:23:09 +0000 (17:23 +0200)]
net:hns: annotate IO address space properly
Mark address pointer with __iomem in the IO accessors.
Otherwise we will get a sparse complain like following
.../hns/hns_dsaf_reg.h:991:36: warning: incorrect type in argument 1 (different address spaces)
.../hns/hns_dsaf_reg.h:991:36: expected unsigned char [noderef] [usertype] <asn:2>*base
.../hns/hns_dsaf_reg.h:991:36: got void *base
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 14 Dec 2015 01:42:58 +0000 (17:42 -0800)]
Linux 4.4-rc5
Peter Zijlstra [Sun, 13 Dec 2015 21:11:16 +0000 (22:11 +0100)]
sched/wait: Fix the signal handling fix
Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/
His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().
We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.
Fixes:
68985633bccb ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xin Long [Mon, 7 Dec 2015 10:48:07 +0000 (18:48 +0800)]
netfilter: nf_tables: use reverse traversal commit_list in nf_tables_abort
When we use 'nft -f' to submit rules, it will build multiple rules into
one netlink skb to send to kernel, kernel will process them one by one.
meanwhile, it add the trans into commit_list to record every commit.
if one of them's return value is -EAGAIN, status |= NFNL_BATCH_REPLAY
will be marked. after all the process is done. it will roll back all the
commits.
now kernel use list_add_tail to add trans to commit, and use
list_for_each_entry_safe to roll back. which means the order of adding
and rollback is the same. that will cause some cases cannot work well,
even trigger call trace, like:
1. add a set into table foo [return -EAGAIN]:
commit_list = 'add set trans'
2. del foo:
commit_list = 'add set trans' -> 'del set trans' -> 'del tab trans'
then nf_tables_abort will be called to roll back:
firstly process 'add set trans':
case NFT_MSG_NEWSET:
trans->ctx.table->use--;
list_del_rcu(&nft_trans_set(trans)->list);
it will del the set from the table foo, but it has removed when del
table foo [step 2], then the kernel will panic.
the right order of rollback should be:
'del tab trans' -> 'del set trans' -> 'add set trans'.
which is opposite with commit_list order.
so fix it by rolling back commits with reverse order in nf_tables_abort.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Sun, 13 Dec 2015 20:46:04 +0000 (12:46 -0800)]
Merge tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfix from Trond Myklebust:
"SUNRPC: Fix a NFSv4.1 callback channel regression"
* tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix callback channel
Linus Torvalds [Sun, 13 Dec 2015 20:41:10 +0000 (12:41 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixlets from Thomas Gleixner:
"Two trivial fixes which add missing header fileas and forward
declarations so the code will compile even when the magic include
chains are different"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Add missing include for barrier.h
irqchip/gic-v3: Add missing struct device_node declaration
Linus Torvalds [Sun, 13 Dec 2015 20:36:23 +0000 (12:36 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix to unbreak a clocksource driver which has more than 32bit
counter width"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Mmio: remove artificial 32bit limitation
Linus Torvalds [Sun, 13 Dec 2015 20:29:22 +0000 (12:29 -0800)]
Merge tag 'char-misc-4.4-rc5' of git://git./linux/kernel/git/gregkh/char-misc
Pull fpga driver fixes from Greg KH:
"Only two small fpga driver fixes here, both have been in linux-next
for a while, and resolve some reported issues"
* tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
fpga manager: Fix firmware resource leak on error
fpga manager: remove label
Linus Torvalds [Sun, 13 Dec 2015 20:24:39 +0000 (12:24 -0800)]
Merge tag 'staging-4.4-rc5' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are a few staging and IIO driver fixes for 4.4-rc5.
All of them resolve reported problems and have been in linux-next for
a while. Nothing major here, just small fixes where needed"
* tag 'staging-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: lustre: echo_copy.._lsm() dereferences userland pointers directly
iio: adc: spmi-vadc: add missing of_node_put
iio: fix some warning messages
iio: light: apds9960: correct ->last_busy count
iio: lidar: return -EINVAL on invalid signal
staging: iio: dummy: complete IIO events delivery to userspace
Linus Torvalds [Sun, 13 Dec 2015 19:58:18 +0000 (11:58 -0800)]
Merge tag 'usb-4.4-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are a number of small USB fixes for 4.4-rc5. All of them have
been in linux-next. The majority are gadget and phy issues, with a
few new quirks and device ids added as well"
* tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (32 commits)
USB: add quirk for devices with broken LPM
xhci: fix usb2 resume timing and races.
usb: musb: fail with error when no DMA controller set
usb: gadget: uvc: fix permissions of configfs attributes
usb: musb: core: Fix pm runtime for deferred probe
usb: phy: msm: fix a possible NULL dereference
USB: host: ohci-at91: fix a crash in ohci_hcd_at91_overcurrent_irq
usb: Quiet down false peer failure messages
usb: xhci: fix config fail of FS hub behind a HS hub with MTT
xhci: Fix memory leak in xhci_pme_acpi_rtd3_enable()
usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message
USB: whci-hcd: add check for dma mapping error
usb: core : hub: Fix BOS 'NULL pointer' kernel panic
USB: quirks: Apply ALWAYS_POLL to all ELAN devices
usb-storage: Fix scsi-sd failure "Invalid field in cdb" for USB adapter JMicron
USB: quirks: Fix another ELAN touchscreen
usb: dwc3: gadget: don't prestart interrupt endpoints
USB: serial: Another Infineon flash loader USB ID
USB: cdc_acm: Ignore Infineon Flash Loader utility
USB: cp210x: Remove CP2110 ID from compatibility list
...
Linus Torvalds [Sun, 13 Dec 2015 00:43:44 +0000 (16:43 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a bunch of small bug fixes for various ARM platforms, nothing
really sticks out this week, most of either fixes bugs in code that
was just added in 4.4, or that has been broken for many years without
anyone noticing.
at91/sama5d2:
- fix sama5de hardware setup of sd/mmc interface
- proper selection of pinctrl drivers. PIO4 is necessary for sama5d2
berlin:
- fix incorrect clock input for SDIO
exynos:
- Fix potential NULL pointer dereference in Exynos PMU driver.
imx:
- Fix vf610 SAI clock configuration bug which is discovered by the
newly added master mode support in SAI audio driver.
- Fix buggy L2 cache latency values in vf610 device trees, which may
cause system hang when cpu runs at a higher frequency.
ixp4xx:
- fix prototypes for readl/writel functions
ls2080a:
- use little-endian register access for GPIO and SDHCI
omap:
- Fix clock source for ARM TWD and global timers on am437x
- Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when
MACH_OMAP3_PANDORA is selected
- Fix SPI DMA handles for dm816x as only some were mapped
- Fix up mbox cells for dm816x to make mailbox usable
pxa:
- use PWM lookup table for all ezx machines
s3c24xx:
- Remove incorrect __init annotation from s3c24xx cpufreq driver
structures.
versatile:
- fix PCI IRQ mapping on Versatile PB"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ls2080a/dts: Add little endian property for GPIO IP block
dt-bindings: define little-endian property for QorIQ GPIO
ARM64: dts: ls2080a: fix eSDHC endianness
ARM: dts: vf610: use reset values for L2 cache latencies
ARM: pxa: use PWM lookup table for all machines
ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1
ARM: dts: berlin: correct BG2Q's sdhci2 2nd clock
ARM: dts: am4372: fix clock source for arm twd and global timers
ARM: at91: fix pinctrl driver selection
ARM: at91/dt: add always-on to 1.8V regulator
ARM: dts: vf610: fix clock definition for SAI2
ARM: imx: clk-vf610: fix SAI clock tree
ARM: ixp4xx: fix read{b,w,l} return types
irqchip/versatile-fpga: Fix PCI IRQ mapping on Versatile PB
ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGE
ARM: dts: add dm816x missing spi DT dma handles
ARM: dts: add dm816x missing #mbox-cells
cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init
ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf
Linus Torvalds [Sat, 12 Dec 2015 21:39:59 +0000 (13:39 -0800)]
Merge tag 'powerpc-4.4-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- opal-irqchip: Fix double endian conversion from Alistair Popple
- cxl: Set endianess of kernel contexts from Frederic Barrat
- sbc8641: drop bogus PHY IRQ entries from DTS file from Paul Gortmaker
- Revert "powerpc/eeh: Don't unfreeze PHB PE after reset" from Andrew
Donnellan
* tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"
powerpc/sbc8641: drop bogus PHY IRQ entries from DTS file
cxl: Set endianess of kernel contexts
powerpc/opal-irqchip: Fix double endian conversion
Linus Torvalds [Sat, 12 Dec 2015 18:44:49 +0000 (10:44 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"17 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MIPS: fix DMA contiguous allocation
sh64: fix __NR_fgetxattr
ocfs2: fix SGID not inherited issue
mm/oom_kill.c: avoid attempting to kill init sharing same memory
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
tmpfs: fix shmem_evict_inode() warnings on i_blocks
mm/hugetlb.c: fix resv map memory leak for placeholder entries
mm: hugetlb: call huge_pte_alloc() only if ptep is null
kernel: remove stop_machine() Kconfig dependency
mm: kmemleak: mark kmemleak_init prototype as __init
mm: fix kerneldoc on mem_cgroup_replace_page
osd fs: __r4w_get_page rely on PageUptodate for uptodate
MAINTAINERS: make Vladimir co-maintainer of the memory controller
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
memcg: fix memory.high target
mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
Linus Torvalds [Sat, 12 Dec 2015 18:34:20 +0000 (10:34 -0800)]
Merge branch 'parisc-4.4-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Fix the boot crash on Mako machines with Huge Pages, prevent a panic
with SATA controllers (and others) by correctly calculating the IOMMU
space, hook up the mlock2 syscall and drop unneeded code in the parisc
pci code"
* 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Disable huge pages on Mako machines
parisc: Wire up mlock2 syscall
parisc: Remove unused pcibios_init_bus()
parisc iommu: fix panic due to trying to allocate too large region
Linus Torvalds [Sat, 12 Dec 2015 18:24:00 +0000 (10:24 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A set of fixes for the current series. This contains:
- A bunch of fixes for lightnvm, should be the last round for this
series. From Matias and Wenwei.
- A writeback detach inode fix from Ilya, also marked for stable.
- A block (though it says SCSI) fix for an OOPS in SCSI runtime power
management.
- Module init error path fixes for null_blk from Minfei"
* 'for-linus' of git://git.kernel.dk/linux-block:
null_blk: Fix error path in module initialization
lightnvm: do not compile in debugging by default
lightnvm: prevent gennvm module unload on use
lightnvm: fix media mgr registration
lightnvm: replace req queue with nvmdev for lld
lightnvm: comments on constants
lightnvm: check mm before use
lightnvm: refactor spin_unlock in gennvm_get_blk
lightnvm: put blks when luns configure failed
lightnvm: use flags in rrpc_get_blk
block: detach bdev inode from its wb in __blkdev_put()
SCSI: Fix NULL pointer dereference in runtime PM
Linus Torvalds [Sat, 12 Dec 2015 18:16:26 +0000 (10:16 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Update the linker script to use L1_CACHE_BYTES instead of hard-coded
64. We recently changed L1_CACHE_BYTES to 128
- Improve race condition reporting on set_pte_at() and change the BUG
to WARN_ONCE. With hardware update of the accessed/dirty state, we
need to ensure that set_pte_at() does not inadvertently override
hardware updated state. The patch also makes the checks ignore
!pte_valid() new entries
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Improve error reporting on set_pte_at() checks
arm64: update linker script to increased L1_CACHE_BYTES value
Qais Yousef [Fri, 11 Dec 2015 21:41:09 +0000 (13:41 -0800)]
MIPS: fix DMA contiguous allocation
Recent changes to how GFP_ATOMIC is defined seems to have broken the
condition to use mips_alloc_from_contiguous() in
mips_dma_alloc_coherent().
I couldn't bottom out the exact change but I think it's this commit
d0164adc89f6 ("mm, page_alloc: distinguish between being unable to
sleep, unwilling to sleep and avoiding waking kswapd").
GFP_ATOMIC has multiple bits set and the check for !(gfp & GFP_ATOMIC)
isn't enough.
The reason behind this condition is to check whether we can potentially
do a sleeping memory allocation. Use gfpflags_allow_blocking() instead
which should be more robust.
Signed-off-by: Qais Yousef <qais.yousef@imgtec.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry V. Levin [Fri, 11 Dec 2015 21:41:06 +0000 (13:41 -0800)]
sh64: fix __NR_fgetxattr
According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
has to be defined to 259, but it doesn't. Instead, it's defined to 269,
which is of course used by another syscall, __NR_sched_setaffinity in this
case.
This bug was found by strace test suite.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 11 Dec 2015 21:41:03 +0000 (13:41 -0800)]
ocfs2: fix SGID not inherited issue
Commit
8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an
issue, SGID of sub dir was not inherited from its parents dir. It is
because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
is overwritten by "mode" which don't have SGID set later.
Fixes:
8f1eb48758aa ("ocfs2: fix umask ignored issue")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Jie [Fri, 11 Dec 2015 21:41:00 +0000 (13:41 -0800)]
mm/oom_kill.c: avoid attempting to kill init sharing same memory
It's possible that an oom killed victim shares an ->mm with the init
process and thus oom_kill_process() would end up trying to kill init as
well.
This has been shown in practice:
Out of memory: Kill process 9134 (init) score 3 or sacrifice child
Killed process 9134 (init) total-vm:1868kB, anon-rss:84kB, file-rss:572kB
Kill process 1 (init) sharing same memory
...
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
And this will result in a kernel panic.
If a process is forked by init and selected for oom kill while still
sharing init_mm, then it's likely this system is in a recoverable state.
However, it's better not to try to kill init and allow the machine to
panic due to unkillable processes.
[rientjes@google.com: rewrote changelog]
[akpm@linux-foundation.org: fix inverted test, per Ben]
Signed-off-by: Chen Jie <chenjie6@huawei.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Seth Jennings [Fri, 11 Dec 2015 21:40:57 +0000 (13:40 -0800)]
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
Commit
bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory
x86-64 systems") and
982792c782ef ("x86, mm: probe memory block size for
generic x86 64bit") introduced large block sizes for x86. This made it
possible to have multiple sections per memory block where previously,
there was a only every one section per block.
Since blocks consist of contiguous ranges of section, there can be holes
in the blocks where sections are not present. If one attempts to
offline such a block, a crash occurs since the code is not designed to
deal with this.
This patch is a quick fix to gaurd against the crash by not allowing
blocks with non-present sections to be offlined.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781
Signed-off-by: Seth Jennings <sjennings@variantweb.net>
Reported-by: Andrew Banman <abanman@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 11 Dec 2015 21:40:55 +0000 (13:40 -0800)]
tmpfs: fix shmem_evict_inode() warnings on i_blocks
Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).
(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently. The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)
v3.7 commit
0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.
shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache(). delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced take it one
too low: leading to the WARNING when the object is finally evicted.
Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:"). Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.
While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after). (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.
I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether. Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Fri, 11 Dec 2015 21:40:52 +0000 (13:40 -0800)]
mm/hugetlb.c: fix resv map memory leak for placeholder entries
Dmitry Vyukov reported the following memory leak
unreferenced object 0xffff88002eaafd88 (size 32):
comm "a.out", pid 5063, jiffies
4295774645 (age 15.810s)
hex dump (first 32 bytes):
28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff (.Nc....(.Nc....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
kmalloc include/linux/slab.h:458
region_chg+0x2d4/0x6b0 mm/hugetlb.c:398
__vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791
vma_needs_reservation mm/hugetlb.c:1813
alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845
hugetlb_no_page mm/hugetlb.c:3543
hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717
follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880
__get_user_pages+0x542/0xf30 mm/gup.c:497
populate_vma_page_range+0xde/0x110 mm/gup.c:919
__mm_populate+0x1c7/0x310 mm/gup.c:969
do_mlock+0x291/0x360 mm/mlock.c:637
SYSC_mlock2 mm/mlock.c:658
SyS_mlock2+0x4b/0x70 mm/mlock.c:648
Dmitry identified a potential memory leak in the routine region_chg,
where a region descriptor is not free'ed on an error path.
However, the root cause for the above memory leak resides in region_del.
In this specific case, a "placeholder" entry is created in region_chg.
The associated page allocation fails, and the placeholder entry is left
in the reserve map. This is "by design" as the entry should be deleted
when the map is released. The bug is in the region_del routine which is
used to delete entries within a specific range (and when the map is
released). region_del did not handle the case where a placeholder entry
exactly matched the start of the range range to be deleted. In this
case, the entry would not be deleted and leaked. The fix is to take
these special placeholder entries into account in region_del.
The region_chg error path leak is also fixed.
Fixes:
feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 11 Dec 2015 21:40:49 +0000 (13:40 -0800)]
mm: hugetlb: call huge_pte_alloc() only if ptep is null
Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
and check whether the obtained *ptep is a migration/hwpoison entry or
not. And if not, then we get to call huge_pte_alloc(). This is racy
because the *ptep could turn into migration/hwpoison entry after the
huge_pte_offset() check. This race results in BUG_ON in
huge_pte_alloc().
We don't have to call huge_pte_alloc() when the huge_pte_offset()
returns non-NULL, so let's fix this bug with moving the code into else
block.
Note that the *ptep could turn into a migration/hwpoison entry after
this block, but that's not a problem because we have another
!pte_present check later (we never go into hugetlb_no_page() in that
case.)
Fixes:
290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org> [2.6.36+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Wilson [Fri, 11 Dec 2015 21:40:46 +0000 (13:40 -0800)]
kernel: remove stop_machine() Kconfig dependency
Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable. This
leads to configurations where stop_machine() is broken as it will then
only run the callback on the local CPU with irqs disabled, and not stop
the other CPUs or run the callback on them.
For example, this breaks MTRR setup on x86 in certain configs since
ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and
text_poke_smp_batch() functions") as the MTRR is only established on the
boot CPU.
This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine() for
the architecture, removing the false dependency on MODULE_UNLOAD in the
process.
Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Iooss [Fri, 11 Dec 2015 21:40:43 +0000 (13:40 -0800)]
mm: kmemleak: mark kmemleak_init prototype as __init
The kmemleak_init() definition in mm/kmemleak.c is marked __init but its
prototype in include/linux/kmemleak.h is marked __ref since commit
a6186d89c913 ("kmemleak: Mark the early log buffer as __initdata").
This causes a section mismatch which is reported as a warning when
building with clang -Wsection, because kmemleak_init() is declared in
section .ref.text but defined in .init.text.
Fix this by marking kmemleak_init() prototype __init.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 11 Dec 2015 21:40:40 +0000 (13:40 -0800)]
mm: fix kerneldoc on mem_cgroup_replace_page
Whoops, I missed removing the kerneldoc comment of the lrucare arg
removed from mem_cgroup_replace_page; but it's a good comment, keep it.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 11 Dec 2015 21:40:38 +0000 (13:40 -0800)]
osd fs: __r4w_get_page rely on PageUptodate for uptodate
Commit
42cb14b110a5 ("mm: migrate dirty page without
clear_page_dirty_for_io etc") simplified the migration of a PageDirty
pagecache page: one stat needs moving from zone to zone and that's about
all.
It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate. This is all done while both pages are
isolated and locked, just as before; and just as before, there's a
moment when the new page is visible in the radix_tree, but not yet
PageUptodate. What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.
When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 11 Dec 2015 21:40:35 +0000 (13:40 -0800)]
MAINTAINERS: make Vladimir co-maintainer of the memory controller
Vladimir architected and authored much of the current state of the
memcg's slab memory accounting and tracking. Make sure he gets CC'd on
bug reports ;-)
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 11 Dec 2015 21:40:32 +0000 (13:40 -0800)]
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.
The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context. If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stalled numbers so
it will consider a zone reclaimable even though it has been scanned way
too much. WQ concurrency logic will not consider this situation as a
congested workqueue because it relies that worker would have to sleep in
such a situation. This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if the one is assigned to the
queue.
In order to fix this issue we need to do two things. First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep. In order to prevent from issues handled by
0e093d99763e
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of the interest anyway.
The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Fri, 11 Dec 2015 21:40:29 +0000 (13:40 -0800)]
mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
Commit
016c13daa5c9 ("mm, page_alloc: use masks and shifts when
converting GFP flags to migrate types") has swapped MIGRATE_MOVABLE and
MIGRATE_RECLAIMABLE in the enum definition. However, migratetype_names
wasn't updated to reflect that.
As a result, the file /proc/pagetypeinfo shows the counts for Movable as
Reclaimable and vice versa.
Additionally, commit
0aaa29a56e4f ("mm, page_alloc: reserve pageblocks
for high-order atomic allocations on demand") introduced
MIGRATE_HIGHATOMIC, but did not add a letter to distinguish it into
show_migration_types(), so it doesn't appear in the listing of free
areas during page alloc failures or oom kills.
This patch fixes both problems. The atomic reserves will show with a
letter 'H' in the free areas listings.
Fixes:
016c13daa5c9 ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types")
Fixes:
0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 11 Dec 2015 21:40:24 +0000 (13:40 -0800)]
memcg: fix memory.high target
When the memory.high threshold is exceeded, try_charge() schedules a
task_work to reclaim the excess. The reclaim target is set to the
number of pages requested by try_charge().
This is wrong, because try_charge() usually charges more pages than
requested (batch > nr_pages) in order to refill per cpu stocks. As a
result, a process in a cgroup can easily exceed memory.high
significantly when doing a lot of charges w/o returning to userspace
(e.g. reading a file in big chunks).
Fix this issue by assuring that when exceeding memory.high a process
reclaims as many pages as were actually charged (i.e. batch).
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>