Linus Torvalds [Fri, 31 Oct 2014 22:00:48 +0000 (15:00 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc update from David Miller:
"Two changes:
1) It makes no sense to execute a VTOC partition table request in the
Sun virtual block device driver and fail to load if it doesn't
succeed because a) we don't use the result at all and b) it won't
succeed if there is an EFI partition on the disk, for example.
We read the partition table via the normal means in the block layer
anyways, so this is really completely useless, so just remove it.
From Dwight Engen.
2) Hook up new bpf system call"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sunvdc: don't call VD_OP_GET_VTOC
sparc: Hook up bpf system call.
Linus Torvalds [Fri, 31 Oct 2014 21:43:42 +0000 (14:43 -0700)]
Merge tag 'microblaze-3.18-rc3' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze updates from Michal Simek:
- wire-up new bpf syscall
- fix PCI bug
- fix Kconfig warning
* tag 'microblaze-3.18-rc3' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Wire up bpf syscall
microblaze: Fix IO space breakage after of_pci_range_to_resource() change
microblaze: Fix missing NR_CPUS in menuconfig
Linus Torvalds [Fri, 31 Oct 2014 21:30:16 +0000 (14:30 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fixes from all around the place:
- hyper-V 32-bit PAE guest kernel fix
- two IRQ allocation fixes on certain x86 boards
- intel-mid boot crash fix
- intel-quark quirk
- /proc/interrupts duplicate irq chip name fix
- cma boot crash fix
- syscall audit fix
- boot crash fix with certain TSC configurations (seen on Qemu)
- smpboot.c build warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
x86, intel-mid: Create IRQs for APB timers and RTC timers
x86: Don't enable F00F workaround on Intel Quark processors
x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
x86, cma: Reserve DMA contiguous area after initmem_init()
i386/audit: stop scribbling on the stack frame
x86, apic: Handle a bad TSC more gracefully
x86: ACPI: Do not translate GSI number if IOAPIC is disabled
x86/smpboot: Move data structure to its primary usage scope
Linus Torvalds [Fri, 31 Oct 2014 21:05:35 +0000 (14:05 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Various scheduler fixes all over the place: three SCHED_DL fixes,
three sched/numa fixes, two generic race fixes and a comment fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/dl: Fix preemption checks
sched: Update comments for CLONE_NEWNS
sched: stop the unbound recursion in preempt_schedule_context()
sched/fair: Fix division by zero sysctl_numa_balancing_scan_size
sched/fair: Care divide error in update_task_scan_period()
sched/numa: Fix unsafe get_task_struct() in task_numa_assign()
sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer()
sched/deadline: Don't replenish from a !SCHED_DEADLINE entity
sched: Fix race between task_group and sched_task_group
Linus Torvalds [Fri, 31 Oct 2014 21:01:47 +0000 (14:01 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus on the kernel side:
- a revert for a newly introduced PMU driver which isn't complete yet
and where we ran out of time with fixes (to be tried again in
v3.19) - this makes up for a large chunk of the diffstat.
- compilation warning fixes
- a printk message fix
- event_idx usage fixes/cleanups"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Trivial typo fix for --demangle
perf tools: Fix report -F dso_from for data without branch info
perf tools: Fix report -F dso_to for data without branch info
perf tools: Fix report -F symbol_from for data without branch info
perf tools: Fix report -F symbol_to for data without branch info
perf tools: Fix report -F mispredict for data without branch info
perf tools: Fix report -F in_tx for data without branch info
perf tools: Fix report -F abort for data without branch info
perf tools: Make CPUINFO_PROC an array to support different kernel versions
perf callchain: Use global caching provided by libunwind
perf/x86/intel: Revert incomplete and undocumented Broadwell client support
perf/x86: Fix compile warnings for intel_uncore
perf: Fix typos in sample code in the perf_event.h header
perf: Fix and clean up initialization of pmu::event_idx
perf: Fix bogus kernel printk
perf diff: Add missing hists__init() call at tool start
Linus Torvalds [Fri, 31 Oct 2014 20:57:45 +0000 (13:57 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull futex fixes from Ingo Molnar:
"This contains two futex fixes: one fixes a race condition, the other
clarifies shared/private futex comments"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Fix a race condition between REQUEUE_PI and task death
futex: Mention key referencing differences between shared and private futexes
Dwight Engen [Thu, 30 Oct 2014 19:55:35 +0000 (15:55 -0400)]
sunvdc: don't call VD_OP_GET_VTOC
The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
VTOC label, otherwise it will fail. In particular, it will return error
48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
handled by directly reading the disk in block/partitions/sun.c (enabled by
CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
unused in the driver, remove the call and the field.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 31 Oct 2014 19:43:52 +0000 (12:43 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"The tree contains two RCU fixes and a compiler quirk comment fix"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Make rcu_barrier() understand about missing rcuo kthreads
compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles
rcu: More on deadlock between CPU hotplug and expedited grace periods
Linus Torvalds [Fri, 31 Oct 2014 19:33:05 +0000 (12:33 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"As you requested in the rc2 release mail the timer department serves
you a few real bug fixes:
- Fix the probe logic of the architected arm/arm64 timer
- Plug a stack info leak in posix-timers
- Prevent a shift out of bounds issue in the clockevents core"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM/ARM64: arch-timer: fix arch_timer_probed logic
clockevents: Prevent shift out of bounds
posix-timers: Fix stack info leak in timer_create()
Linus Torvalds [Fri, 31 Oct 2014 19:28:38 +0000 (12:28 -0700)]
Merge tag 'trace-fixes-v3.18-rc1-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"ARM has system calls outside the NR_syscalls range, and the generic
tracing system does not support that and without checks, it can cause
an oops to be reported.
Rabin Vincent added checks in the return code on syscall events to
make sure that the system call number is within the range that tracing
knows about, and if not, simply ignores the system call.
The system call tracing infrastructure needs to be rewritten to handle
these cases better, but for now, to keep from oopsing, this patch will
do"
* tag 'trace-fixes-v3.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/syscalls: Ignore numbers outside NR_syscalls' range
Linus Torvalds [Fri, 31 Oct 2014 18:55:40 +0000 (11:55 -0700)]
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation fixes from Jonathan Corbet:
"So this is my first pull request since I rashly agreed to look after
the documentation subtree. It contains some typo fixes, a few minor
documentation improvements, and, most importantly, fixes for a couple
of build problems in various bits of sample code.
I fully intend to start sending pull requests with signed tags.
However, due to poor planning on my part and the general obnoxiousness
of life, I'm 2000 miles away from my private key which is sitting on a
powered-down machine. This should be fixed before my next request.
Meanwhile git.lwn.net is a machine under my control, the patches are
all trivial, and all have done time in linux-next"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6:
Documentation/SubmittingPatches: Reported-by tags and permission
Documentation: remove outdated references to the linux-next wiki
Documentation: Restrict TSC test code to x86
doc: kernel-parameters.txt: Add ide-generic.probe-mask
vdso: don't require 64-bit math in standalone test
Documentation: Add CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF case
Documentation: Add default kmemleak off case in kernel-parameters.txt
Docs: Document that the sticky bit is understood by hugetlbfs
DocBook: Reduce noise from make cleandocs
Documentation: fix vdso_standalone_test_x86 on 32-bit
Documentation: dt-bindings: Explain order in patch series
Documentation/ABI/testing/sysfs-ibft: fix a typo
Rabin Vincent [Wed, 29 Oct 2014 22:06:58 +0000 (23:06 +0100)]
tracing/syscalls: Ignore numbers outside NR_syscalls' range
ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls. If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.
# trace-cmd record -e raw_syscalls:* true && trace-cmd report
...
true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3,
4000022,
ffffffff, 0)
true-653 [000] 384.675812: sys_exit: NR 192 =
1995915264
true-653 [000] 384.675971: sys_enter: NR 983045 (
76f74480,
76f74000,
76f74b28,
76f74480,
76f76f74, 1)
true-653 [000] 384.675988: sys_exit: NR 983045 = 0
...
# trace-cmd record -e syscalls:* true
[ 17.289329] Unable to handle kernel paging request at virtual address
aaaaaace
[ 17.289590] pgd =
9e71c000
[ 17.289696] [
aaaaaace] *pgd=
00000000
[ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 17.290169] Modules linked in:
[ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
[ 17.290585] task:
9f4dab00 ti:
9e710000 task.ti:
9e710000
[ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
[ 17.290866] LR is at syscall_trace_enter+0x124/0x184
Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
Commit
cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.
Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
Fixes:
cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Eric Rannaud [Thu, 30 Oct 2014 08:51:01 +0000 (01:51 -0700)]
fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:
Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.
The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.
O_TMPFILE, however, behaves differently:
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);
For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}
But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().
This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().
A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523
The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.
On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.
But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.
Signed-off-by: Eric Rannaud <e@nanocritical.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 30 Oct 2014 16:34:35 +0000 (09:34 -0700)]
Merge tag 'fbdev-fixes-3.18' of git://git./linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
- fix fb console option parsing
- fixes for OMAPDSS/OMAPFB crashes related to module unloading and
device/driver binding & unbinding.
- fix for OMAP HDMI PLL locking failing in certain cases
- misc minor fixes for atmel lcdfb and OMAP
* tag 'fbdev-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
omap: dss: connector-analog-tv: Add missing module device table
OMAPDSS: DSI: Fix PLL_SELFEQDCO field width
OMAPDSS: fix dispc register dump for preload & mflag
OMAPDSS: DISPC: fix mflag offset
OMAPDSS: HDMI: fix regsd write
OMAPDSS: HDMI: fix PLL GO bit handling
OMAPFB: fix releasing overlays
OMAPFB: fix overlay disable when freeing resources.
OMAPDSS: apply: wait pending updates on manager disable
OMAPFB: remove __exit annotation
OMAPDSS: set suppress_bind_attrs
OMAPFB: add missing MODULE_ALIAS()
drivers: video: fbdev: atmel_lcdfb.c: remove unnecessary header
video/console: Resolve several shadow warnings
fbcon: Fix option parsing control flow in fb_console_setup
Linus Torvalds [Thu, 30 Oct 2014 16:11:38 +0000 (09:11 -0700)]
Merge tag 'sound-3.18-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Although the diffstat looks scary, it's just because of the removal of
the dead code (s6000), thus it must not affect anything serious.
Other than that, all small fixes. The only core fix is zero-clear for
a PCM compat ioctl. The rest are driver-specific, bebob, sgtl500,
adau1761, intel-sst, ad1889 and a few HD-audio quirks as usual"
* tag 'sound-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add workaround for CMI8888 snoop behavior
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
ALSA: bebob: Uninitialized id returned by saffirepro_both_clk_src_get
ALSA: hda/realtek - New SSID for Headset quirk
ALSA: ad1889: Fix probable mask then right shift defects
ALSA: bebob: fix wrong decoding of clock information for Terratec PHASE 88 Rack FW
ALSA: hda/realtek - Update restore default value for ALC283
ALSA: hda/realtek - Update restore default value for ALC282
ASoC: fsl: use strncpy() to prevent copying of over-long names
ASoC: adau1761: Fix input PGA volume
ASoC: s6000: remove driver
ASoC: Intel: HSW/BDW only support S16 and S24 formats.
ASoC: sgtl500: Document the required supplies
Tomi Valkeinen [Thu, 30 Oct 2014 12:53:49 +0000 (14:53 +0200)]
Merge branch '3.18/omapdss-fixes' into 3.18/fbdev-fixes
Marek Belisko [Mon, 27 Oct 2014 20:24:03 +0000 (21:24 +0100)]
omap: dss: connector-analog-tv: Add missing module device table
Without that fix connector-analog-tv driver isn't probed when compiled
as module.
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Ingo Molnar [Thu, 30 Oct 2014 06:37:37 +0000 (07:37 +0100)]
Merge branch 'urgent-for-mingo' of git://git./linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull two RCU fixes from Paul E. McKenney:
" - Complete the work of commit
dd56af42bd82 (rcu: Eliminate deadlock
between CPU hotplug and expedited grace periods), which was
intended to allow synchronize_sched_expedited() to be safely
used when holding locks acquired by CPU-hotplug notifiers.
This commit makes the put_online_cpus() avoid the deadlock
instead of just handling the get_online_cpus().
- Complete the work of commit
35ce7f29a44a (rcu: Create rcuo
kthreads only for onlined CPUs), which was intended to allow
RCU to avoid allocating unneeded kthreads on systems where the
firmware says that there are more CPUs than are really present.
This commit makes rcu_barrier() aware of the mismatch, so that
it doesn't hang waiting for non-existent CPUs. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 30 Oct 2014 06:32:34 +0000 (07:32 +0100)]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix report -F (abort, in_tx, mispredict, etc) segfaults for sample.data files
without branch info (Jiri Olsa)
- Add patch that should have went in a previous patchkit to use global cache
provided by libunwind (Namhyung Kim)
- Make CPUINFO_PROC an array to support different kernels, problem
detected when the information reported via /proc/cpuinfo changed on ARM (Wang Nan)
- 'perf probe' --demangle typo fix and a new --quiet option (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 29 Oct 2014 23:38:48 +0000 (16:38 -0700)]
Merge branch 'akpm' (incoming from Andrew Morton)
Merge misc fixes from Andrew Morton:
"21 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm/balloon_compaction: fix deflation when compaction is disabled
sh: fix sh770x SCIF memory regions
zram: avoid NULL pointer access in concurrent situation
mm/slab_common: don't check for duplicate cache names
ocfs2: fix d_splice_alias() return code checking
mm: rmap: split out page_remove_file_rmap()
mm: memcontrol: fix missed end-writeback page accounting
mm: page-writeback: inline account_page_dirtied() into single caller
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
drivers/rtc/rtc-bq32k.c: fix register value
memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
kernel/kmod: fix use-after-free of the sub_info structure
drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
mm, thp: fix collapsing of hugepages on madvise
drivers: of: add return value to of_reserved_mem_device_init()
mm: free compound page with correct order
gcov: add ARM64 to GCOV_PROFILE_ALL
fsnotify: next_i is freed during fsnotify_unmount_inodes.
mm/compaction.c: avoid premature range skip in isolate_migratepages_range
...
Konstantin Khlebnikov [Wed, 29 Oct 2014 21:51:02 +0000 (14:51 -0700)]
mm/balloon_compaction: fix deflation when compaction is disabled
If CONFIG_BALLOON_COMPACTION=n balloon_page_insert() does not link pages
with balloon and doesn't set PagePrivate flag, as a result
balloon_page_dequeue() cannot get any pages because it thinks that all
of them are isolated. Without balloon compaction nobody can isolate
ballooned pages. It's safe to remove this check.
Fixes:
d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management").
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Reported-by: Matt Mullins <mmullins@mmlx.us>
Cc: <stable@vger.kernel.org> [3.17]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andriy Skulysh [Wed, 29 Oct 2014 21:50:59 +0000 (14:50 -0700)]
sh: fix sh770x SCIF memory regions
Resources scif1_resources & scif2_resources overlap. Actual SCIF region
size is 0x10.
This is regression from commit
d850acf975be ("sh: Declare SCIF register
base and IRQ as resources")
Signed-off-by: Andriy Skulysh <askulysh@gmail.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weijie Yang [Wed, 29 Oct 2014 21:50:57 +0000 (14:50 -0700)]
zram: avoid NULL pointer access in concurrent situation
There is a rare NULL pointer bug in mem_used_total_show() and
mem_used_max_store() in concurrent situation, like this:
zram is not initialized, process A is a mem_used_total reader which runs
periodically, while process B try to init zram.
process A process B
access meta, get a NULL value
init zram, done
init_done() is true
access meta->mem_pool, get a NULL pointer BUG
This patch fixes this issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka [Wed, 29 Oct 2014 21:50:55 +0000 (14:50 -0700)]
mm/slab_common: don't check for duplicate cache names
The SLUB cache merges caches with the same size and alignment and there
was long standing bug with this behavior:
- create the cache named "foo"
- create the cache named "bar" (which is merged with "foo")
- delete the cache named "foo" (but it stays allocated because "bar"
uses it)
- create the cache named "foo" again - it fails because the name "foo"
is already used
That bug was fixed in commit
694617474e33 ("slab_common: fix the check
for duplicate slab names") by not warning on duplicate cache names when
the SLUB subsystem is used.
Recently, cache merging was implemented the with SLAB subsystem too, in
12220dea07f1 ("mm/slab: support slab merge")). Therefore we need stop
checking for duplicate names even for the SLAB subsystem.
This patch fixes the bug by removing the check.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Wed, 29 Oct 2014 21:50:53 +0000 (14:50 -0700)]
ocfs2: fix d_splice_alias() return code checking
d_splice_alias() can return a valid dentry, NULL or an ERR_PTR.
Currently the code checks not for ERR_PTR and will cuase an oops in
ocfs2_dentry_attach_lock(). Fix this by using IS_ERR_OR_NULL().
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 29 Oct 2014 21:50:51 +0000 (14:50 -0700)]
mm: rmap: split out page_remove_file_rmap()
page_remove_rmap() has too many branches on PageAnon() and is hard to
follow. Move the file part into a separate function.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 29 Oct 2014 21:50:48 +0000 (14:50 -0700)]
mm: memcontrol: fix missed end-writeback page accounting
Commit
0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") changed
page migration to uncharge the old page right away. The page is locked,
unmapped, truncated, and off the LRU, but it could race with writeback
ending, which then doesn't unaccount the page properly:
test_clear_page_writeback() migration
wait_on_page_writeback()
TestClearPageWriteback()
mem_cgroup_migrate()
clear PCG_USED
mem_cgroup_update_page_stat()
if (PageCgroupUsed(pc))
decrease memcg pages under writeback
release pc->mem_cgroup->move_lock
The per-page statistics interface is heavily optimized to avoid a
function call and a lookup_page_cgroup() in the file unmap fast path,
which means it doesn't verify whether a page is still charged before
clearing PageWriteback() and it has to do it in the stat update later.
Rework it so that it looks up the page's memcg once at the beginning of
the transaction and then uses it throughout. The charge will be
verified before clearing PageWriteback() and migration can't uncharge
the page as long as that is still set. The RCU lock will protect the
memcg past uncharge.
As far as losing the optimization goes, the following test results are
from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
three times in a nested fashion, so that there are two negative passes
that don't account but still go through the new transaction overhead.
There is no actual difference:
old: 33.
195102545 seconds time elapsed ( +- 0.01% )
new: 33.
199231369 seconds time elapsed ( +- 0.03% )
The time spent in page_remove_rmap()'s callees still adds up to the
same, but the time spent in the function itself seems reduced:
# Children Self Command Shared Object Symbol
old: 0.12% 0.11% filemapstress [kernel.kallsyms] [k] page_remove_rmap
new: 0.12% 0.08% filemapstress [kernel.kallsyms] [k] page_remove_rmap
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org> [3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 29 Oct 2014 21:50:46 +0000 (14:50 -0700)]
mm: page-writeback: inline account_page_dirtied() into single caller
A follow-up patch would have changed the call signature. To save the
trouble, just fold it instead.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org> [3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 29 Oct 2014 21:50:44 +0000 (14:50 -0700)]
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by
a multiple of BITS_PER_LONG, they will try to shift a long value by
BITS_PER_LONG bits which is undefined. Change the functions to avoid
the undefined shift.
Coverity id:
1192175
Coverity id:
1192174
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Machek [Wed, 29 Oct 2014 21:50:42 +0000 (14:50 -0700)]
drivers/rtc/rtc-bq32k.c: fix register value
Fix register value in bq32000 trickle charging.
Mike reported that I'm using wrong value in one trickle-charging case,
and after checking docs, I must admit he's right.
Signed-off-by: Pavel Machek <pavel@denx.de>
Reported-by: Mike Bremford <mike@bfo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yasuaki Ishimatsu [Wed, 29 Oct 2014 21:50:40 +0000 (14:50 -0700)]
memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
When hot adding the same memory after hot removal, the following
messages are shown:
WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426()
...
Call Trace:
dump_stack+0x46/0x58
warn_slowpath_common+0x81/0xa0
warn_slowpath_null+0x1a/0x20
free_area_init_node+0x3fe/0x426
hotadd_new_pgdat+0x90/0x110
add_memory+0xd4/0x200
acpi_memory_device_add+0x1aa/0x289
acpi_bus_attach+0xfd/0x204
acpi_bus_attach+0x178/0x204
acpi_bus_scan+0x6a/0x90
acpi_device_hotplug+0xe8/0x418
acpi_hotplug_work_fn+0x1f/0x2b
process_one_work+0x14e/0x3f0
worker_thread+0x11b/0x510
kthread+0xe1/0x100
ret_from_fork+0x7c/0xb0
The detaled explanation is as follows:
When hot removing memory, pgdat is set to 0 in try_offline_node(). But
if the pgdat is allocated by bootmem allocator, the clearing step is
skipped.
And when hot adding the same memory, the uninitialized pgdat is reused.
But free_area_init_node() checks wether pgdat is set to zero. As a
result, free_area_init_node() hits WARN_ON().
This patch clears pgdat which is allocated by bootmem allocator in
try_offline_node().
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marek Szyprowski [Wed, 29 Oct 2014 21:50:38 +0000 (14:50 -0700)]
drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
Fix unconditional initialization failure on non-exynos3250 SoCs.
Commit
df9e26d093d3 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
introduced rtc source clock support, but also added initialization
failure on SoCs, which doesn't need such clock.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Martin Schwidefsky [Wed, 29 Oct 2014 21:50:35 +0000 (14:50 -0700)]
kernel/kmod: fix use-after-free of the sub_info structure
Found this in the message log on a s390 system:
BUG kmalloc-192 (Not tainted): Poison overwritten
Disabling lock debugging due to kernel taint
INFO: 0x00000000684761f4-0x00000000684761f7. First byte 0xff instead of 0x6b
INFO: Allocated in call_usermodehelper_setup+0x70/0x128 age=71 cpu=2 pid=648
__slab_alloc.isra.47.constprop.56+0x5f6/0x658
kmem_cache_alloc_trace+0x106/0x408
call_usermodehelper_setup+0x70/0x128
call_usermodehelper+0x62/0x90
cgroup_release_agent+0x178/0x1c0
process_one_work+0x36e/0x680
worker_thread+0x2f0/0x4f8
kthread+0x10a/0x120
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
INFO: Freed in call_usermodehelper_exec+0x110/0x1b8 age=71 cpu=2 pid=648
__slab_free+0x94/0x560
kfree+0x364/0x3e0
call_usermodehelper_exec+0x110/0x1b8
cgroup_release_agent+0x178/0x1c0
process_one_work+0x36e/0x680
worker_thread+0x2f0/0x4f8
kthread+0x10a/0x120
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
There is a use-after-free bug on the subprocess_info structure allocated
by the user mode helper. In case do_execve() returns with an error
____call_usermodehelper() stores the error code to sub_info->retval, but
sub_info can already have been freed.
Regarding UMH_NO_WAIT, the sub_info structure can be freed by
__call_usermodehelper() before the worker thread returns from
do_execve(), allowing memory corruption when do_execve() failed after
exec_mmap() is called.
Regarding UMH_WAIT_EXEC, the call to umh_complete() allows
call_usermodehelper_exec() to continue which then frees sub_info.
To fix this race the code needs to make sure that the call to
call_usermodehelper_freeinfo() is always done after the last store to
sub_info->retval.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stanimir Varbanov [Wed, 29 Oct 2014 21:50:33 +0000 (14:50 -0700)]
drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
Adds support for RTC device inside PM8941 PMIC. The RTC in this PMIC
have two register spaces. Thus the rtc-pm8xxx is slightly reworked to
reflect these differences.
The register set for different PMIC chips are selected on DT compatible
string base.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: simplify and fix locking in pm8xxx_rtc_set_time()]
Signed-off-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Josh Cartwright <joshc@codeaurora.org>
Cc: Stanimir Varbanov <svarbanov@mm-sol.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 29 Oct 2014 21:50:31 +0000 (14:50 -0700)]
mm, thp: fix collapsing of hugepages on madvise
If an anonymous mapping is not allowed to fault thp memory and then
madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
collapse this memory into thp memory.
This occurs because the madvise(2) handler for thp, hugepage_madvise(),
clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
until the final action of madvise_behavior(). This causes the
khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
the vma had previously had VM_NOHUGEPAGE set.
Fix this by passing the correct vma flags to the khugepaged mm slot
handler. There's no chance khugepaged can run on this vma until after
madvise_behavior() returns since we hold mm->mmap_sem.
It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
in hugepage_advise(), but I didn't want to introduce special case
behavior into madvise_behavior(). I think it's best to just let it
always set vma->vm_flags itself.
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marek Szyprowski [Wed, 29 Oct 2014 21:50:29 +0000 (14:50 -0700)]
drivers: of: add return value to of_reserved_mem_device_init()
Driver calling of_reserved_mem_device_init() might be interested if the
initialization has been successful or not, so add support for returning
error code.
This fixes a build warining caused by commit
7bfa5ab6fa1b ("drivers:
dma-coherent: add initialization from device tree"), which has been
merged without this change and without fixing function return value.
Fixes:
7bfa5ab6fa1b1 ("drivers: dma-coherent: add initialization from device tree")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Josh Cartwright <joshc@codeaurora.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yu Zhao [Wed, 29 Oct 2014 21:50:26 +0000 (14:50 -0700)]
mm: free compound page with correct order
Compound page should be freed by put_page() or free_pages() with correct
order. Not doing so will cause tail pages leaked.
The compound order can be obtained by compound_order() or use
HPAGE_PMD_ORDER in our case. Some people would argue the latter is
faster but I prefer the former which is more general.
This bug was observed not just on our servers (the worst case we saw is
11G leaked on a 48G machine) but also on our workstations running Ubuntu
based distro.
$ cat /proc/vmstat | grep thp_zero_page_alloc
thp_zero_page_alloc 55
thp_zero_page_alloc_failed 0
This means there is (thp_zero_page_alloc - 1) * (2M - 4K) memory leaked.
Fixes:
97ae17497e99 ("thp: implement refcounting for huge zero page")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: <stable@vger.kernel.org> [3.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Riku Voipio [Wed, 29 Oct 2014 21:50:24 +0000 (14:50 -0700)]
gcov: add ARM64 to GCOV_PROFILE_ALL
Following up the arm testing of gcov, turns out gcov on ARM64 works fine
as well. Only change needed is adding ARM64 to Kconfig depends.
Tested with qemu and mach-virt
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jerry Hoemann [Wed, 29 Oct 2014 21:50:22 +0000 (14:50 -0700)]
fsnotify: next_i is freed during fsnotify_unmount_inodes.
During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.
Multiple crash dumps showed:
The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
back at itself. As this is not the value of list upon entry to the
function, the kernel never exits the loop.
To help narrow down problem, the call to list_del_init in
inode_sb_list_del was changed to list_del. This poisons the pointers in
the i_sb_list and causes a kernel to panic if it transverse a freed
inode.
Subsequent stress testing paniced in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop showing next_i had become
free.
We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.
The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget. However, the code doesn't do the
__iget call on next_i
if i_count == 0 or
if i_state & (I_FREEING | I_WILL_FREE)
The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list. This makes the handling of next_i more closely match the
handling of the variable "inode."
The time to reproduce the hang is highly variable (from hours to days.) We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.
During list_for_each_entry_safe, next_i is becoming free causing
the loop to never terminate. Advance next_i in those cases where
__iget is not done.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ken Helias <kenhelias@firemail.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Wed, 29 Oct 2014 21:50:20 +0000 (14:50 -0700)]
mm/compaction.c: avoid premature range skip in isolate_migratepages_range
Commit
edc2ca612496 ("mm, compaction: move pageblock checks up from
isolate_migratepages_range()") commonizes isolate_migratepages variants
and make them use isolate_migratepages_block().
isolate_migratepages_block() could stop the execution when enough pages
are isolated, but, there is no code in isolate_migratepages_range() to
handle this case. In the result, even if isolate_migratepages_block()
returns prematurely without checking all pages in the range,
isolate_migratepages_block() is called repeately on the following
pageblock and some pages in the previous range are skipped to check.
Then, CMA is failed frequently due to this fact.
To fix this problem, this patch let isolate_migratepages_range() know
the situation that enough pages are isolated and stop the isolation in
that case.
Note that isolate_migratepages() has no such problem, because, it always
stops the isolation after just one call of isolate_migratepages_block().
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wang Nan [Wed, 29 Oct 2014 21:50:18 +0000 (14:50 -0700)]
cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
Commit
ff7ee93f4715 ("cgroup/kmemleak: Annotate alloc_page() for cgroup
allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
corresponding kmemleak_free() is missing, which makes kmemleak be
wrongly disabled after memory offlining. Log is pasted at the end of
this commit message.
This patch add kmemleak_free() into free_page_cgroup(). During page
offlining, this patch removes corresponding entries in kmemleak rbtree.
After that, the freed memory can be allocated again by other subsystems
without killing kmemleak.
bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak
Offlined Pages 32768
kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing)
CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x46/0x58
create_object+0x266/0x2c0
kmemleak_alloc+0x26/0x50
kmem_cache_alloc+0xd3/0x160
__sigqueue_alloc+0x49/0xd0
__send_signal+0xcb/0x410
send_signal+0x45/0x90
__group_send_sig_info+0x13/0x20
do_notify_parent+0x1bb/0x260
do_exit+0x767/0xa40
do_group_exit+0x44/0xa0
SyS_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff880016900000 (size 524288):
kmemleak: comm "swapper/0", pid 0, jiffies
4294667296
kmemleak: min_count = 0
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
log_early+0x63/0x77
kmemleak_alloc+0x4b/0x50
init_section_page_cgroup+0x7f/0xf5
page_cgroup_init+0xc5/0xd0
start_kernel+0x333/0x408
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0xf5/0xfc
Fixes:
ff7ee93f4715 (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org> [3.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 29 Oct 2014 18:57:10 +0000 (11:57 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current kernel. This contains:
- Two error handling fixes from Jan Kara. One for null_blk on
failure to add a device, and the other for the block/scsi_ioctl
SCSI_IOCTL_SEND_COMMAND fixing up the error jump point.
- A commit added in the merge window for the bio integrity bits
unfortunately disabled merging for all requests if
CONFIG_BLK_DEV_INTEGRITY wasn't set. Reverse the logic, so that
integrity checking wont disallow merges when not enabled.
- A fix from Ming Lei for merging and generating too many segments.
This caused a BUG in virtio_blk.
- Two error handling printk() fixups from Robert Elliott, improving
the information given when we rate limit.
- Error handling fixup on elevator_init() failure from Sudip
Mukherjee.
- A fix from Tony Battersby, fixing up a memory leak in the
scatterlist handling with scsi-mq"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined
lib/scatterlist: fix memory leak with scsi-mq
block: fix wrong error return in elevator_init()
scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
null_blk: Cleanup error recovery in null_add_dev()
blk-merge: recaculate segment if it isn't less than max segments
fs: clarify rate limit suppressed buffer I/O errors
fs: merge I/O error prints into one line
Linus Torvalds [Wed, 29 Oct 2014 18:52:35 +0000 (11:52 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- workarounds for a couple of misbehaving Elan Touchscreens, by Adel
Gadllah
- fix for TransducerSerialNumber field implementation, by Jason Gerecke
- a couple of new HID usages (added by HUT), by Olivier Gay
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: input: Fix TransducerSerialNumber implementation
HID: add keyboard input assist hid usages
HID: usbhid: enable always-poll quirk for Elan Touchscreen 016f
HID: usbhid: enable always-poll quirk for Elan Touchscreen 009b
Linus Torvalds [Wed, 29 Oct 2014 18:47:42 +0000 (11:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull Integrity subsystem fix from James Morris:
"These changes fix a bug in xattr handling, where the evm and ima
inode_setxattr() functions do not check for empty xattrs being passed
from userspace (leading to user-triggerable null pointer
dereferences)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
evm: check xattr value length and type in evm_inode_setxattr()
ima: check xattr value length and type in the ima_inode_setxattr()
Linus Torvalds [Wed, 29 Oct 2014 18:11:44 +0000 (11:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman:
"There's some bug fixes or cleanups to facilitate fixes, a MAINTAINERS
update, and a new syscall (bpf)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update
powerpc/numa: use cached value of update->cpu in update_cpu_topology
cxl: Fix PSL error due to duplicate segment table entries
powerpc/mm: Use appropriate ESID mask in copro_calculate_slb()
cxl: Refactor cxl_load_segment() and find_free_sste()
cxl: Disable secondary hash in segment table
Revert "powerpc/powernv: Fix endian bug in LPC bus debugfs accessors"
powernv: Use _GLOBAL_TOC for opal wrappers
powerpc: Wire up sys_bpf() syscall
MAINTAINERS: nx-842 driver maintainer change
powerpc/mm: Remove redundant #if case
powerpc/mm: Fix build error with hugetlfs disabled
Takashi Iwai [Wed, 29 Oct 2014 15:13:05 +0000 (16:13 +0100)]
ALSA: hda - Add workaround for CMI8888 snoop behavior
CMI8888 shows the stuttering playback when the snooping is disabled
on the audio buffer. Meanwhile, we've got reports that CORB/RIRB
doesn't work in the snooped mode. So, as a compromise, disable the
snoop only for CORB/RIRB and enable the snoop for the stream buffers.
The resultant patch became a bit ugly, unfortunately, but we still can
live with it.
Reported-and-tested-by: Geoffrey McRae <geoff@spacevs.com>
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dan Carpenter [Wed, 29 Oct 2014 10:01:36 +0000 (13:01 +0300)]
Documentation/SubmittingPatches: Reported-by tags and permission
The reported-by text says you have to ask for permission, but that
should only be if the bug was reported in private. These days the
standard is to always give reported-by credit or it's considered a bit
rude.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Masami Hiramatsu [Mon, 27 Oct 2014 20:31:24 +0000 (16:31 -0400)]
perf probe: Trivial typo fix for --demangle
Replace "Disable" with "Enable", since --demangle option enables symbol
demangling, not disable it.
perf probe has --demangle and --no-demangle options, but the
command-line help (--help) shows only --demangle option. So it should
explain about --demangle.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20141027203124.21219.68278.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 16 Oct 2014 14:07:07 +0000 (16:07 +0200)]
perf tools: Fix report -F dso_from for data without branch info
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F dso_from
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-8-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 16 Oct 2014 14:07:06 +0000 (16:07 +0200)]
perf tools: Fix report -F dso_to for data without branch info
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F dso_to
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 16 Oct 2014 14:07:05 +0000 (16:07 +0200)]
perf tools: Fix report -F symbol_from for data without branch info
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F symbol_from
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 16 Oct 2014 14:07:04 +0000 (16:07 +0200)]
perf tools: Fix report -F symbol_to for data without branch info
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F symbol_to
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 16 Oct 2014 14:07:03 +0000 (16:07 +0200)]
perf tools: Fix report -F mispredict for data without branch info
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F mispredict
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 16 Oct 2014 14:07:02 +0000 (16:07 +0200)]
perf tools: Fix report -F in_tx for data without branch info
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F in_tx
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 16 Oct 2014 14:07:01 +0000 (16:07 +0200)]
perf tools: Fix report -F abort for data without branch info
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F abort
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Fri, 24 Oct 2014 01:45:26 +0000 (09:45 +0800)]
perf tools: Make CPUINFO_PROC an array to support different kernel versions
After kernel 3.7 (commit
b4b8f770eb10a1bccaf8aa0ec1956e2dd7ed1e0a),
/proc/cpuinfo replaces 'Processor' to 'model name'.
This patch makes CPUINFO_PROC to an array and provides two choices for
ARM, makes it compatible for different kernel version.
v1 -> v2: minor changes as suggested by Namhyung Kim:
- Doesn't pass @h and @evlist to __write_cpudesc;
- Coding style fix.
v2 -> v3:
- Rebase:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1414115126-7479-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 6 Oct 2014 00:46:01 +0000 (09:46 +0900)]
perf callchain: Use global caching provided by libunwind
The libunwind provides two caching policy which are global and
per-thread. As perf unwinds callchains in a single thread, it'd
sufficient to use global caching.
This speeds up my perf report from 14s to 7s on a ~260MB data file.
Although the output sometimes contains a slight difference (~0.01% in
terms of number of lines printed) on callchains which were not resolved.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1412556363-26229-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Wed, 29 Oct 2014 09:18:17 +0000 (10:18 +0100)]
perf/x86/intel: Revert incomplete and undocumented Broadwell client support
These patches:
86a349a28b24 ("perf/x86/intel: Add Broadwell core support")
c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds")
fdda3c4aacec ("perf/x86/intel: Use Broadwell cache event list for Haswell")
introduced magic constants and unexplained changes:
https://lkml.org/lkml/2014/10/28/1128
https://lkml.org/lkml/2014/10/27/325
https://lkml.org/lkml/2014/8/27/546
https://lkml.org/lkml/2014/10/28/546
Peter Zijlstra has attempted to help out, to clean up the mess:
https://lkml.org/lkml/2014/10/28/543
But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.
Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.
That kind of behavior is not acceptable.
There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:
e735b9db12d7 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
https://lkml.org/lkml/2014/10/22/730
Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.
As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.
( If the Haswell-EP crash is not resolved via a simple fix then we'll have
to revert the Haswell-EP uncore driver as well. )
The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jason Gerecke [Tue, 23 Sep 2014 18:09:28 +0000 (11:09 -0700)]
HID: input: Fix TransducerSerialNumber implementation
The commit which introduced TransducerSerialNumber (
368c966) is missing
two crucial implementation details. Firstly, the commit does not set the
type/code/bit/max fields as expected later down the code which can cause
the driver to crash when a tablet with this usage is connected. Secondly,
the call to 'set_bit' causes MSC_PULSELED to be sent instead of the
expected MSC_SERIAL. This commit addreses both issues.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Dexuan Cui [Wed, 29 Oct 2014 10:53:37 +0000 (03:53 -0700)]
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long <<
PAGE_SHIFT" will overflow for PFNs above 4GB.
Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V,
with 5GB memory assigned, can't load the netvsc driver successfully and
hence the synthetic network device can't work (we can use the kernel parameter
mem=3000M to work around the issue).
Cast pte_pfn() to phys_addr_t before shifting.
Fixes: "commit
d76565344512: x86, mm: Create slow_virt_to_phys()"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: linux-mm@kvack.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: jasowang@redhat.com
Cc: dave.hansen@intel.com
Cc: riel@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414580017-27444-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jiang Liu [Mon, 27 Oct 2014 05:21:33 +0000 (13:21 +0800)]
ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
Function mp_register_gsi() returns blindly the GSI number for the ACPI
SCI interrupt. That causes a regression when the GSI for ACPI SCI is
shared with other devices.
The regression was caused by commit
84245af7297ced9e8fe "x86, irq, ACPI:
Change __acpi_register_gsi to return IRQ number instead of GSI" and
exposed on a SuperMicro system, which shares one GSI between ACPI SCI
and PCI device, with following failure:
http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low
level)
[ 2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt
20
Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number.
Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiang Liu [Mon, 27 Oct 2014 05:21:32 +0000 (13:21 +0800)]
x86, intel-mid: Create IRQs for APB timers and RTC timers
Intel MID platforms has no legacy interrupts, so no IRQ descriptors
preallocated. We need to call mp_map_gsi_to_irq() to create IRQ
descriptors for APB timers and RTC timers, otherwise it may cause
invalid memory access as:
[ 0.116839] BUG: unable to handle kernel NULL pointer dereference at
0000003a
[ 0.123803] IP: [<
c1071c0e>] setup_irq+0xf/0x4d
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Jones [Tue, 28 Oct 2014 17:57:53 +0000 (13:57 -0400)]
x86: Don't enable F00F workaround on Intel Quark processors
The Intel Quark processor is a part of family 5, but does not have the
F00F bug present in Pentiums of the same family.
Pentiums were models 0 through 8, Quark is model 9.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
James Morris [Wed, 29 Oct 2014 04:03:54 +0000 (15:03 +1100)]
Merge branch 'for-linus' of git://git./linux/kernel/git/zohar/linux-integrity into for-linus
Martin K. Petersen [Wed, 29 Oct 2014 02:27:43 +0000 (20:27 -0600)]
block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined
Commit
4eaf99beadce switched to returning bool and as a result reversed
the logic of the integrity merge checks. However, the empty stubs used
when the block integrity code is compiled out were still returning
0. Make these stubs return "true".
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Nishanth Aravamudan [Sat, 18 Oct 2014 00:50:40 +0000 (17:50 -0700)]
powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update
We received a report of warning in kernel/sched/core.c where the sched
group was NULL on an LPAR after a topology update. This seems to occur
because after the topology update has moved the CPUs, cpu_to_node is
returning the old value still, which ends up breaking the consistency of
the NUMA topology in the per-cpu maps. Ensure that we update the per-cpu
fields when we re-map CPUs.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nishanth Aravamudan [Sat, 18 Oct 2014 00:49:44 +0000 (17:49 -0700)]
powerpc/numa: use cached value of update->cpu in update_cpu_topology
There isn't any need to keep referring to update->cpu, as we've already
checked cpu == update->cpu at this point.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Tue, 28 Oct 2014 20:32:06 +0000 (13:32 -0700)]
Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux
Pull two nfsd fixes from Bruce Fields:
"One regression from the 3.16 xdr rewrite, one an older bug exposed by
a separate bug in the client's new SEEK code"
* 'for-3.18' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix crash on unknown operation number
nfsd4: fix response size estimation for OP_SEQUENCE
Linus Torvalds [Tue, 28 Oct 2014 20:27:19 +0000 (13:27 -0700)]
Merge tag 'trace-fixes-v3.18-rc1' of git://git./linux/kernel/git/rostedt/linux-trace
Pull ftrace trampoline accounting fixes from Steven Rostedt:
"Adding the new code for 3.19, I discovered a couple of minor bugs with
the accounting of the ftrace_ops trampoline logic.
One was that the old hash was not updated before calling the modify
code for an ftrace_ops. The second bug was what let the first bug go
unnoticed, as the update would check the current hash for all
ftrace_ops (where it should only check the old hash for modified
ones). This let things work when only one ftrace_ops was registered
to a function, but could break if more than one was registered
depending on the order of the look ups.
The worse thing that can happen if this bug triggers is that the
ftrace self checks would find an anomaly and shut itself down"
* tag 'trace-fixes-v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix checking of trampoline ftrace_ops in finding trampoline
ftrace: Set ops->old_hash on modifying what an ops hooks to
Paul E. McKenney [Mon, 27 Oct 2014 16:15:54 +0000 (09:15 -0700)]
rcu: Make rcu_barrier() understand about missing rcuo kthreads
Commit
35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit
35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.
It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.
It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.
Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().
So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
Reported-by: Yanko Kaneti <yaneti@declera.com>
Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Eric B Munson <emunson@akamai.com>
Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Tested-by: Yanko Kaneti <yaneti@declera.com>
Tested-by: Kevin Fenzi <kevin@scrye.com>
Tested-by: Meelis Roos <mroos@linux.ee>
Linus Torvalds [Tue, 28 Oct 2014 20:17:11 +0000 (13:17 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A couple of ARM fixes.
We fix some printk formats for ptrdiff_t quantities which cause GCC
4.9 to complain, and we also blacklist known buggy GCC 4.8.x compilers
as their miscompilation is serious enough to cause filesystem
corruption, even through many distros have fixed their versions"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: fix some printk formats
ARM: Blacklist GCC 4.8.0 to GCC 4.8.2 - PR58854
Will Deacon [Tue, 28 Oct 2014 20:16:28 +0000 (13:16 -0700)]
zap_pte_range: update addr when forcing flush after TLB batching faiure
When unmapping a range of pages in zap_pte_range, the page being
unmapped is added to an mmu_gather_batch structure for asynchronous
freeing. If we run out of space in the batch structure before the range
has been completely unmapped, then we break out of the loop, force a
TLB flush and free the pages that we have batched so far. If there are
further pages to unmap, then we resume the loop where we left off.
Unfortunately, we forget to update addr when we break out of the loop,
which causes us to truncate the range being invalidated as the end
address is exclusive. When we re-enter the loop at the same address, the
page has already been freed and the pte_present test will fail, meaning
that we do not reconsider the address for invalidation.
This patch fixes the problem by incrementing addr by the PAGE_SIZE
before breaking out of the loop on batch failure.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 28 Oct 2014 18:30:43 +0000 (11:30 -0700)]
sparc: Hook up bpf system call.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Battersby [Thu, 23 Oct 2014 19:10:21 +0000 (15:10 -0400)]
lib/scatterlist: fix memory leak with scsi-mq
Fix a memory leak with scsi-mq triggered by commands with large data
transfer length.
Fixes:
c53c6d6a68b1 ("scatterlist: allow chaining to preallocated chunks")
Cc: <stable@vger.kernel.org> # 3.17.x
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Dmitry Kasatkin [Tue, 28 Oct 2014 12:28:49 +0000 (14:28 +0200)]
evm: check xattr value length and type in evm_inode_setxattr()
evm_inode_setxattr() can be called with no value. The function does not
check the length so that following command can be used to produce the
kernel oops: setfattr -n security.evm FOO. This patch fixes it.
Changes in v3:
* there is no reason to return different error codes for EVM_XATTR_HMAC
and non EVM_XATTR_HMAC. Remove unnecessary test then.
Changes in v2:
* testing for validity of xattr type
[ 1106.396921] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1106.398192] IP: [<
ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.399244] PGD
29048067 PUD
290d7067 PMD 0
[ 1106.399953] Oops: 0000 [#1] SMP
[ 1106.400020] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse
[ 1106.400020] CPU: 0 PID: 3635 Comm: setxattr Not tainted 3.16.0-kds+ #2936
[ 1106.400020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1106.400020] task:
ffff8800291a0000 ti:
ffff88002917c000 task.ti:
ffff88002917c000
[ 1106.400020] RIP: 0010:[<
ffffffff812af7b8>] [<
ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020] RSP: 0018:
ffff88002917fd50 EFLAGS:
00010246
[ 1106.400020] RAX:
0000000000000000 RBX:
ffff88002917fdf8 RCX:
0000000000000000
[ 1106.400020] RDX:
0000000000000000 RSI:
ffffffff818136d3 RDI:
ffff88002917fdf8
[ 1106.400020] RBP:
ffff88002917fd68 R08:
0000000000000000 R09:
00000000003ec1df
[ 1106.400020] R10:
0000000000000000 R11:
0000000000000000 R12:
ffff8800438a0a00
[ 1106.400020] R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
[ 1106.400020] FS:
00007f7dfa7d7740(0000) GS:
ffff88005da00000(0000) knlGS:
0000000000000000
[ 1106.400020] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1106.400020] CR2:
0000000000000000 CR3:
000000003763e000 CR4:
00000000000006f0
[ 1106.400020] Stack:
[ 1106.400020]
ffff8800438a0a00 ffff88002917fdf8 0000000000000000 ffff88002917fd98
[ 1106.400020]
ffffffff812a1030 ffff8800438a0a00 ffff88002917fdf8 0000000000000000
[ 1106.400020]
0000000000000000 ffff88002917fde0 ffffffff8116d08a ffff88002917fdc8
[ 1106.400020] Call Trace:
[ 1106.400020] [<
ffffffff812a1030>] security_inode_setxattr+0x5d/0x6a
[ 1106.400020] [<
ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f
[ 1106.400020] [<
ffffffff8116d1e0>] setxattr+0x122/0x16c
[ 1106.400020] [<
ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020] [<
ffffffff8114d011>] ? __sb_start_write+0x10f/0x143
[ 1106.400020] [<
ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020] [<
ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f
[ 1106.400020] [<
ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0
[ 1106.400020] [<
ffffffff81529da9>] system_call_fastpath+0x16/0x1b
[ 1106.400020] Code: c3 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 41 54 49 89 fc 53 48 89 f3 48 c7 c6 d3 36 81 81 48 89 df e8 18 22 04 00 85 c0 75 07 <41> 80 7d 00 02 74 0d 48 89 de 4c 89 e7 e8 5a fe ff ff eb 03 83
[ 1106.400020] RIP [<
ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020] RSP <
ffff88002917fd50>
[ 1106.400020] CR2:
0000000000000000
[ 1106.428061] ---[ end trace
ae08331628ba3050 ]---
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Dmitry Kasatkin [Tue, 28 Oct 2014 11:31:22 +0000 (13:31 +0200)]
ima: check xattr value length and type in the ima_inode_setxattr()
ima_inode_setxattr() can be called with no value. Function does not
check the length so that following command can be used to produce
kernel oops: setfattr -n security.ima FOO. This patch fixes it.
Changes in v3:
* for stable reverted "allow setting hash only in fix or log mode"
It will be a separate patch.
Changes in v2:
* testing validity of xattr type
* allow setting hash only in fix or log mode (Mimi)
[ 261.562522] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 261.564109] IP: [<
ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a
[ 261.564109] PGD
3112f067 PUD
42965067 PMD 0
[ 261.564109] Oops: 0000 [#1] SMP
[ 261.564109] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse
[ 261.564109] CPU: 0 PID: 3299 Comm: setxattr Not tainted 3.16.0-kds+ #2924
[ 261.564109] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 261.564109] task:
ffff8800428c2430 ti:
ffff880042be0000 task.ti:
ffff880042be0000
[ 261.564109] RIP: 0010:[<
ffffffff812af272>] [<
ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a
[ 261.564109] RSP: 0018:
ffff880042be3d50 EFLAGS:
00010246
[ 261.564109] RAX:
0000000000000001 RBX:
0000000000000000 RCX:
0000000000000015
[ 261.564109] RDX:
0000001500000000 RSI:
0000000000000000 RDI:
ffff8800375cc600
[ 261.564109] RBP:
ffff880042be3d68 R08:
0000000000000000 R09:
00000000004d6256
[ 261.564109] R10:
0000000000000000 R11:
0000000000000000 R12:
ffff88002149ba00
[ 261.564109] R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
[ 261.564109] FS:
00007f6c1e219740(0000) GS:
ffff88005da00000(0000) knlGS:
0000000000000000
[ 261.564109] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 261.564109] CR2:
0000000000000000 CR3:
000000003b35a000 CR4:
00000000000006f0
[ 261.564109] Stack:
[ 261.564109]
ffff88002149ba00 ffff880042be3df8 0000000000000000 ffff880042be3d98
[ 261.564109]
ffffffff812a101b ffff88002149ba00 ffff880042be3df8 0000000000000000
[ 261.564109]
0000000000000000 ffff880042be3de0 ffffffff8116d08a ffff880042be3dc8
[ 261.564109] Call Trace:
[ 261.564109] [<
ffffffff812a101b>] security_inode_setxattr+0x48/0x6a
[ 261.564109] [<
ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f
[ 261.564109] [<
ffffffff8116d1e0>] setxattr+0x122/0x16c
[ 261.564109] [<
ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 261.564109] [<
ffffffff8114d011>] ? __sb_start_write+0x10f/0x143
[ 261.564109] [<
ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 261.564109] [<
ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f
[ 261.564109] [<
ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0
[ 261.564109] [<
ffffffff81529da9>] system_call_fastpath+0x16/0x1b
[ 261.564109] Code: 48 89 f7 48 c7 c6 58 36 81 81 53 31 db e8 73 27 04 00 85 c0 75 28 bf 15 00 00 00 e8 8a a5 d9 ff 84 c0 75 05 83 cb ff eb 15 31 f6 <41> 80 7d 00 03 49 8b 7c 24 68 40 0f 94 c6 e8 e1 f9 ff ff 89 d8
[ 261.564109] RIP [<
ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a
[ 261.564109] RSP <
ffff880042be3d50>
[ 261.564109] CR2:
0000000000000000
[ 261.599998] ---[ end trace
39a89a3fc267e652 ]---
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Jim Davis [Wed, 20 Aug 2014 20:29:41 +0000 (13:29 -0700)]
Documentation: remove outdated references to the linux-next wiki
The linux-next wiki at http://linux.f-seidel.de/linux-next/pmwiki has
been gone for several months now.
Signed-off-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Alexander Graf [Tue, 28 Oct 2014 00:03:59 +0000 (01:03 +0100)]
Documentation: Restrict TSC test code to x86
The prctl test code in Documentation/ tries to show how to
use a call that only makes sense on x86. Restrict it there
so that other platforms don't try to call asm("rdtsc").
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Takashi Iwai [Tue, 28 Oct 2014 11:42:19 +0000 (12:42 +0100)]
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
In compat mode, we copy each field of snd_pcm_status struct but don't
touch the reserved fields, and this leaves uninitialized values
there. Meanwhile the native ioctl does zero-clear the whole
structure, so we should follow the same rule in compat mode, too.
Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Maciej W. Rozycki [Sun, 26 Oct 2014 16:06:28 +0000 (16:06 +0000)]
x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems
that make use of 8259A Programmable Interrupt Controllers.
Specifically convert output like this:
CPU0
0: 76573 XT-PIC-XT-PIC timer
1: 11 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
4: 8 XT-PIC-XT-PIC serial
6: 3 XT-PIC-XT-PIC floppy
7: 0 XT-PIC-XT-PIC parport0
8: 1 XT-PIC-XT-PIC rtc0
10: 448 XT-PIC-XT-PIC fddi0
12: 23 XT-PIC-XT-PIC eth0
14: 2464 XT-PIC-XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
to one like this:
CPU0
0: 122033 XT-PIC timer
1: 11 XT-PIC i8042
2: 0 XT-PIC cascade
4: 8 XT-PIC serial
6: 3 XT-PIC floppy
7: 0 XT-PIC parport0
8: 1 XT-PIC rtc0
10: 145 XT-PIC fddi0
12: 31 XT-PIC eth0
14: 2245 XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
that is one like we used to have from ~2.2 till it was changed
sometime.
The rationale is there is no value in this duplicate
information, it merely clutters output and looks ugly. We only
have one handler for 8259A interrupts so there is no need to
give it a name separate from the name already given to
irq_chip.
We could define meaningful names for handlers based on bits in
the ELCR register on systems that have it or the value of the
LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with
MCA support gone), to tell edge-triggered and level-triggered
inputs apart. While that information does not affect 8259A
interrupt handlers it could help people determine which lines
are shareable and which are not. That is material for a
separate change though.
Any tools that parse /proc/interrupts are supposed not to be
affected since it was many years we used the format this change
converts back to.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Steven Noonan [Sat, 25 Oct 2014 22:09:42 +0000 (15:09 -0700)]
compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles
The bug referenced by the comment in this commit was not
completely fixed in GCC 4.8.2, as I mentioned in a thread back
in February:
https://lkml.org/lkml/2014/2/12/797
The conclusion at that time was to make the quirk unconditional
until the bug could be found and fixed in GCC. Unfortunately,
when I submitted the patch (commit
a9f18034) I left a comment
in that claimed the bug was fixed in GCC 4.8.2+.
This comment is inaccurate, and should be removed.
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-steven@uplinklabs.net
Cc: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 24 Oct 2014 07:12:35 +0000 (09:12 +0200)]
perf/x86: Fix compile warnings for intel_uncore
The uncore drivers require PCI and generate compile time warnings when
!CONFIG_PCI.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Fri, 3 Oct 2014 05:16:36 +0000 (22:16 -0700)]
perf: Fix typos in sample code in the perf_event.h header
struct perf_event_mmap_page has members called "index" and
"cap_user_rdpmc". Spell them correctly in the examples.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/320ba26391a8123cc16e5f02d24d34bd404332fd.1412313343.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 21 Oct 2014 09:10:21 +0000 (11:10 +0200)]
perf: Fix and clean up initialization of pmu::event_idx
Andy reported that the current state of event_idx is rather confused.
So remove all but the x86_pmu implementation and change the default to
return 0 (the safe option).
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Himangi Saraogi <himangi774@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com>
Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux390@de.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra (Intel) [Tue, 7 Oct 2014 17:07:33 +0000 (19:07 +0200)]
perf: Fix bogus kernel printk
Andy spotted the fail in what was intended as a conditional printk level.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Fixes:
cc6cd47e7395 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kirill Tkhai [Tue, 21 Oct 2014 16:35:56 +0000 (20:35 +0400)]
sched/dl: Fix preemption checks
1) switched_to_dl() check is wrong. We reschedule only
if rq->curr is deadline task, and we do not reschedule
if it's a lower priority task. But we must always
preempt a task of other classes.
2) dl_task_timer():
Policy does not change in case of priority inheritance.
rt_mutex_setprio() changes prio, while policy remains old.
So we lose some balancing logic in dl_task_timer() and
switched_to_dl() when we check policy instead of priority. Boosted
task may be rq->curr.
(I didn't change switched_from_dl() because no check is necessary
there at all).
I've looked at this place(switched_to_dl) several times and even fixed
this function, but found just now... I suppose some performance tests
may work better after this.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1413909356.19914.128.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Chen Hanxiao [Tue, 7 Oct 2014 09:29:07 +0000 (17:29 +0800)]
sched: Update comments for CLONE_NEWNS
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1412674147-8941-1-git-send-email-chenhanxiao@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 5 Oct 2014 20:23:22 +0000 (22:23 +0200)]
sched: stop the unbound recursion in preempt_schedule_context()
preempt_schedule_context() does preempt_enable_notrace() at the end
and this can call the same function again; exception_exit() is heavy
and it is quite possible that need-resched is true again.
1. Change this code to dec preempt_count() and check need_resched()
by hand.
2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
the enable/disable dance around __schedule(). But in this case
we need to move into sched/core.c.
3. Cosmetic, but x86 forgets to declare this function. This doesn't
really matter because it is only called by asm helpers, still it
make sense to add the declaration into asm/preempt.h to match
preempt_schedule().
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kirill Tkhai [Thu, 16 Oct 2014 10:39:37 +0000 (14:39 +0400)]
sched/fair: Fix division by zero sysctl_numa_balancing_scan_size
File /proc/sys/kernel/numa_balancing_scan_size_mb allows writing of zero.
This bash command reproduces problem:
$ while :; do echo 0 > /proc/sys/kernel/numa_balancing_scan_size_mb; \
echo 256 > /proc/sys/kernel/numa_balancing_scan_size_mb; done
divide error: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 24112 Comm: bash Not tainted 3.17.0+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task:
ffff88013c852600 ti:
ffff880037a68000 task.ti:
ffff880037a68000
RIP: 0010:[<
ffffffff81074191>] [<
ffffffff81074191>] task_scan_min+0x21/0x50
RSP: 0000:
ffff880037a6bce0 EFLAGS:
00010246
RAX:
0000000000000a00 RBX:
00000000000003e8 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffff88013c852600
RBP:
ffff880037a6bcf0 R08:
0000000000000001 R09:
0000000000015c90
R10:
ffff880239bf6c00 R11:
0000000000000016 R12:
0000000000003fff
R13:
ffff88013c852600 R14:
ffffea0008d1b000 R15:
0000000000000003
FS:
00007f12bb048700(0000) GS:
ffff88007da00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
0000000001505678 CR3:
0000000234770000 CR4:
00000000000006f0
Stack:
ffff88013c852600 0000000000003fff ffff880037a6bd18 ffffffff810741d1
ffff88013c852600 0000000000003fff 000000000002bfff ffff880037a6bda8
ffffffff81077ef7 ffffea0008a56d40 0000000000000001 0000000000000001
Call Trace:
[<
ffffffff810741d1>] task_scan_max+0x11/0x40
[<
ffffffff81077ef7>] task_numa_fault+0x1f7/0xae0
[<
ffffffff8115a896>] ? migrate_misplaced_page+0x276/0x300
[<
ffffffff81134a4d>] handle_mm_fault+0x62d/0xba0
[<
ffffffff8103e2f1>] __do_page_fault+0x191/0x510
[<
ffffffff81030122>] ? native_smp_send_reschedule+0x42/0x60
[<
ffffffff8106dc00>] ? check_preempt_curr+0x80/0xa0
[<
ffffffff8107092c>] ? wake_up_new_task+0x11c/0x1a0
[<
ffffffff8104887d>] ? do_fork+0x14d/0x340
[<
ffffffff811799bb>] ? get_unused_fd_flags+0x2b/0x30
[<
ffffffff811799df>] ? __fd_install+0x1f/0x60
[<
ffffffff8103e67c>] do_page_fault+0xc/0x10
[<
ffffffff8150d322>] page_fault+0x22/0x30
RIP [<
ffffffff81074191>] task_scan_min+0x21/0x50
RSP <
ffff880037a6bce0>
---[ end trace
9a826d16936c04de ]---
Also fix race in task_scan_min (it depends on compiler behaviour).
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/1413455977.24793.78.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yasuaki Ishimatsu [Wed, 22 Oct 2014 07:04:35 +0000 (16:04 +0900)]
sched/fair: Care divide error in update_task_scan_period()
While offling node by hot removing memory, the following divide error
occurs:
divide error: 0000 [#1] SMP
[...]
Call Trace:
[...] handle_mm_fault
[...] ? try_to_wake_up
[...] ? wake_up_state
[...] __do_page_fault
[...] ? do_futex
[...] ? put_prev_entity
[...] ? __switch_to
[...] do_page_fault
[...] page_fault
[...]
RIP [<
ffffffff810a7081>] task_numa_fault
RSP <
ffff88084eb2bcb0>
The issue occurs as follows:
1. When page fault occurs and page is allocated from node 1,
task_struct->numa_faults_buffer_memory[] of node 1 is
incremented and p->numa_faults_locality[] is also incremented
as follows:
o numa_faults_buffer_memory[] o numa_faults_locality[]
NR_NUMA_HINT_FAULT_TYPES
| 0 | 1 |
---------------------------------- ----------------------
node 0 | 0 | 0 | remote | 0 |
node 1 | 0 | 1 | locale | 1 |
---------------------------------- ----------------------
2. node 1 is offlined by hot removing memory.
3. When page fault occurs, fault_types[] is calculated by using
p->numa_faults_buffer_memory[] of all online nodes in
task_numa_placement(). But node 1 was offline by step 2. So
the fault_types[] is calculated by using only
p->numa_faults_buffer_memory[] of node 0. So both of fault_types[]
are set to 0.
4. The values(0) of fault_types[] pass to update_task_scan_period().
5. numa_faults_locality[1] is set to 1. So the following division is
calculated.
static void update_task_scan_period(struct task_struct *p,
unsigned long shared, unsigned long private){
...
ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
}
6. But both of private and shared are set to 0. So divide error
occurs here.
The divide error is rare case because the trigger is node offline.
This patch always increments denominator for avoiding divide error.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/54475703.8000505@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kirill Tkhai [Wed, 22 Oct 2014 07:17:11 +0000 (11:17 +0400)]
sched/numa: Fix unsafe get_task_struct() in task_numa_assign()
Unlocked access to dst_rq->curr in task_numa_compare() is racy.
If curr task is exiting this may be a reason of use-after-free:
task_numa_compare() do_exit()
... current->flags |= PF_EXITING;
... release_task()
... ~~delayed_put_task_struct()~~
... schedule()
rcu_read_lock() ...
cur = ACCESS_ONCE(dst_rq->curr) ...
... rq->curr = next;
... context_switch()
... finish_task_switch()
... put_task_struct()
... __put_task_struct()
... free_task_struct()
task_numa_assign() ...
get_task_struct() ...
As noted by Oleg:
<<The lockless get_task_struct(tsk) is only safe if tsk == current
and didn't pass exit_notify(), or if this tsk was found on a rcu
protected list (say, for_each_process() or find_task_by_vpid()).
IOW, it is only safe if release_task() was not called before we
take rcu_read_lock(), in this case we can rely on the fact that
delayed_put_pid() can not drop the (potentially) last reference
until rcu_read_unlock().
And as Kirill pointed out task_numa_compare()->task_numa_assign()
path does get_task_struct(dst_rq->curr) and this is not safe. The
task_struct itself can't go away, but rcu_read_lock() can't save
us from the final put_task_struct() in finish_task_switch(); this
reference goes away without rcu gp>>
The patch provides simple check of PF_EXITING flag. If it's not set,
this guarantees that call_rcu() of delayed_put_task_struct() callback
hasn't happened yet, so we can safely do get_task_struct() in
task_numa_assign().
Locked dst_rq->lock protects from concurrency with the last schedule().
Reusing or unmapping of cur's memory may happen without it.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1413962231.19914.130.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Juri Lelli [Fri, 24 Oct 2014 09:16:38 +0000 (10:16 +0100)]
sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer()
dl_task_timer() is racy against several paths. Daniel noticed that
the replenishment timer may experience a race condition against an
enqueue_dl_entity() called from rt_mutex_setprio(). With his own
words:
rt_mutex_setprio() resets p->dl.dl_throttled. So the pattern is:
start_dl_timer() throttled = 1, rt_mutex_setprio() throlled = 0,
sched_switch() -> enqueue_task(), dl_task_timer-> enqueue_task()
throttled is 0
=> BUG_ON(on_dl_rq(dl_se)) fires as the scheduling entity is already
enqueued on the -deadline runqueue.
As we do for the other races, we just bail out in the replenishment
timer code.
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: vincent@legout.info
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414142198-18552-5-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Juri Lelli [Fri, 24 Oct 2014 09:16:37 +0000 (10:16 +0100)]
sched/deadline: Don't replenish from a !SCHED_DEADLINE entity
In the deboost path, right after the dl_boosted flag has been
reset, we can currently end up replenishing using -deadline
parameters of a !SCHED_DEADLINE entity. This of course causes
a bug, as those parameters are empty.
In the case depicted above it is safe to simply bail out, as
the deboosted task is going to be back to its original scheduling
class anyway.
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: vincent@legout.info
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Link: http://lkml.kernel.org/r/1414142198-18552-4-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kirill Tkhai [Mon, 27 Oct 2014 10:18:25 +0000 (14:18 +0400)]
sched: Fix race between task_group and sched_task_group
The race may happen when somebody is changing task_group of a forking task.
Child's cgroup is the same as parent's after dup_task_struct() (there just
memory copying). Also, cfs_rq and rt_rq are the same as parent's.
But if parent changes its task_group before it's called cgroup_post_fork(),
we do not reflect this situation on child. Child's cfs_rq and rt_rq remain
the same, while child's task_group changes in cgroup_post_fork().
To fix this we introduce fork() method, which calls sched_move_task() directly.
This function changes sched_task_group on appropriate (also its logic has
no problem with freshly created tasks, so we shouldn't introduce something
special; we are able just to use it).
Possibly, this decides the Burke Libbey's problem: https://lkml.org/lkml/2014/10/24/456
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414405105.19914.169.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ian Munsie [Tue, 28 Oct 2014 03:25:30 +0000 (14:25 +1100)]
cxl: Fix PSL error due to duplicate segment table entries
In certain circumstances the PSL (Power Service Layer, which provides
translation services for CXL hardware) can send an interrupt for a
segment miss that the kernel has already handled. This can happen if
multiple translations for the same segment are queued in the PSL before
the kernel has restarted the first translation.
The CXL driver does not expect this situation and does not check if a
segment had already been handled. This could cause a duplicate segment
table entry which in turn caused a PSL error taking down the card.
This patch fixes the issue by checking for existing entries in the
segment table that match the segment we are trying to insert, so as to
avoid inserting duplicate entries.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Tue, 28 Oct 2014 03:25:29 +0000 (14:25 +1100)]
powerpc/mm: Use appropriate ESID mask in copro_calculate_slb()
This patch makes copro_calculate_slb() mask the ESID by the correct mask
for 1T vs 256M segments.
This has no effect by itself as the extra bits were ignored, but it
makes debugging the segment table entries easier and means that we can
directly compare the ESID values for duplicates without needing to worry
about masking in the comparison.
This will be used to simplify a comparison in the following patch.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Tue, 28 Oct 2014 03:25:28 +0000 (14:25 +1100)]
cxl: Refactor cxl_load_segment() and find_free_sste()
This moves the segment table hash calculation from cxl_load_segment()
into find_free_sste() since that is the only place it is actually used.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Tue, 28 Oct 2014 03:25:27 +0000 (14:25 +1100)]
cxl: Disable secondary hash in segment table
This patch simplifies the process of finding a free segment table entry
by disabling the secondary hash. This reduces the number of possible
entries in the segment table for a given address from 16 to 8.
Due to the large segment sizes we use it is extremely unlikely that the
secondary hash would ever have been used in practice, so this should not
have any negative impacts and may even improve performance due to the
reduced number of comparisons that software & hardware need to perform.
This patch clears the SC bit in the hardware's state register
(CXL_PSL_SR_An) to disable the secondary hash in the hardware since we
can no longer fill out entries using it.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Weijie Yang [Fri, 24 Oct 2014 09:00:34 +0000 (17:00 +0800)]
x86, cma: Reserve DMA contiguous area after initmem_init()
Fengguang Wu reported a boot crash on the x86 platform
via the 0-day Linux Kernel Performance Test:
cma: dma_contiguous_reserve: reserving 31 MiB for global area
BUG: Int 6: CR2 (null)
[<
41850786>] dump_stack+0x16/0x18
[<
41d2b1db>] early_idt_handler+0x6b/0x6b
[<
41072227>] ? __phys_addr+0x2e/0xca
[<
41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7
[<
41d6d359>] dma_contiguous_reserve_area+0x27/0x47
[<
41d6d4d1>] dma_contiguous_reserve+0x158/0x163
[<
41d33e0f>] setup_arch+0x79b/0xc68
[<
41d2b7cf>] start_kernel+0x9c/0x456
[<
41d2b2ca>] i386_start_kernel+0x79/0x7d
(See details at: https://lkml.org/lkml/2014/10/8/708)
It is because dma_contiguous_reserve() is called before
initmem_init() in x86, the variable high_memory is not
initialized but accessed by __pa(high_memory) in
dma_contiguous_reserve().
This patch moves dma_contiguous_reserve() after initmem_init()
so that high_memory is initialized before accessed.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: iamjoonsoo.kim@lge.com
Cc: 'Linux-MM' <linux-mm@kvack.org>
Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com>
Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Michael Ellerman [Tue, 28 Oct 2014 01:13:32 +0000 (12:13 +1100)]
Revert "powerpc/powernv: Fix endian bug in LPC bus debugfs accessors"
This reverts commit
bf7588a0859580a45c63cb082825d77c13eca357.
Ben says although the code is not correct "[this] fix was completely
wrong and does more damages than it fixes things."
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>