Linus Torvalds [Thu, 16 Jun 2011 07:35:09 +0000 (00:35 -0700)]
mm: get rid of the most spurious find_vma_prev() users
We have some users of this function that date back to before the vma
list was doubly linked, and just are silly. These days, you can find
the previous vma by just following the vma->vm_prev pointer.
In some cases you don't need any find_vma() lookup at all, and in other
cases you're better off with the regular "find_vma()" that uses the vma
cache front-end lookup.
Some "find_vma_prev()" users are still valid, though. For example, in
the case of a stack that grows up, it can be the case that we don't find
any 'vma' at all (because we're looking up an address that is past the
last vma), and that the stack that we want to grow is the 'prev' vma.
But that kind of special case aside, we generally should prefer to use
'find_vma()'.
Noticed due to a totally unrelated POWER memory corruption bug that just
happened to hit in 'find_vma_prev()' and made me go "Hmm - why are we
using that function here?".
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 16 Jun 2011 05:01:36 +0000 (22:01 -0700)]
Merge branch 'fixes' of /home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: footbridge: fix clock event support
ARM: footbridge: fix debug macros
ARM: initrd: disable initrds outside of memory
ARM: extend Code: line by one 16-bit quantity for Thumb instructions
ARM: 6955/1: cmpxchg syscall should data abort if page not write
ARM: 6954/1: zImage: fix Thumb2 breakage
ARM: 6953/1: DT: don't try to access physical address zero
ARM: 6949/2: mach-u300: fix compilaton warning in IO accessors
Revert "ARM: 6944/1: mm: allow ASID 0 to be allocated to tasks"
Revert "ARM: 6943/1: mm: use TTBR1 instead of reserved context ID"
davinci: make PCM platform devices static
arm: davinci: Fix fallout from generic irq chip conversion
ARM: 6894/1: mmci: trigger card detect IRQs on falling and rising edges
ARM: 6952/1: fix lockdep warning of "unannotated irqs-off"
ARM: 6951/1: include .bss in memory layout information
ARM: 6948/1: Fix .size directives for __arm{7,9}tdmi_proc_info
ARM: 6947/2: mach-u300: fix compilation error in timer
ARM: 6946/1: vexpress: move v2m clock init to init_early
ARM: mx51/sdma: Check the chip revision in run-time
arm: mxs: include asm/processor.h for cpu_relax()
Linus Torvalds [Thu, 16 Jun 2011 04:53:52 +0000 (21:53 -0700)]
Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP"
This reverts commit
7f81c8890c15a10f5220bebae3b6dfae4961962a.
It turns out that it's not actually a build-time check on x86-64 UML,
which does some seriously crazy stuff with VM_STACK_FLAGS.
The VM_STACK_FLAGS define depends on the arch-supplied
VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have
arch/um/sys-x86_64/shared/sysdep/vm-flags.h:
#define VM_STACK_DEFAULT_FLAGS \
(test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)
#define VM_STACK_DEFAULT_FLAGS vm_stack_flags
(yes, seriously: two different #define's for that thing, with the first
one being inside an "#ifdef TIF_IA32")
It's possible that it is UML that should just be fixed in this area, but
for now let's just undo the (very small) optimization.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jörg Sommer [Wed, 15 Jun 2011 20:00:47 +0000 (13:00 -0700)]
Documentation: fix cgroup typos and formatting
Fix format and spelling.
Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jörg Sommer [Wed, 15 Jun 2011 19:59:45 +0000 (12:59 -0700)]
Documentation: update cgroupfs mount point
According to commit
676db4af0430 ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup. Hence, this should be used in the documentation.
Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maxin B. John [Wed, 15 Jun 2011 19:58:29 +0000 (12:58 -0700)]
Documentation: update kmemleak supported archs
Instead of listing the architectures that are supported by
kmemleak in Documentation/kmemleak.txt, just refer people to
the list of supported architecutures in lib/Kconfig.debug so
that Documentation/kmemleak.txt does not need more updates
for this.
Signed-off-by: Maxin B. John <maxin.john@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Murray [Wed, 15 Jun 2011 19:57:09 +0000 (12:57 -0700)]
Documentation: update printk-formats.txt
This patch updates the incomplete documentation concerning the printk
extended format specifiers.
Signed-off-by: Andrew Murray <amurray@mpc-data.co.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 16 Jun 2011 04:45:18 +0000 (21:45 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Check if lowest_mask is initialized in find_lowest_rq()
sched: Fix need_resched() when checking peempt
Dan Rosenberg [Wed, 15 Jun 2011 22:09:01 +0000 (15:09 -0700)]
alpha: fix several security issues
Fix several security issues in Alpha-specific syscalls. Untested, but
mostly trivial.
1. Signedness issue in osf_getdomainname allows copying out-of-bounds
kernel memory to userland.
2. Signedness issue in osf_sysinfo allows copying large amounts of
kernel memory to userland.
3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy
size, allowing copying large amounts of kernel memory to userland.
4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows
privilege escalation via writing return value of sys_wait4 to kernel
memory.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Wed, 15 Jun 2011 22:08:59 +0000 (15:08 -0700)]
drivers/misc/apds990x.c: apds990x_chip_on() should depend on CONFIG_PM || CONFIG_PM_RUNTIME
Fixes this warning:
drivers/misc/apds990x.c: At top level:
drivers/misc/apds990x.c:613: warning: `apds990x_chip_on' defined but not used
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 15 Jun 2011 22:08:58 +0000 (15:08 -0700)]
ksm: fix NULL pointer dereference in scan_get_next_rmap_item()
Andrea Righi reported a case where an exiting task can race against
ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742) easily
triggering a NULL pointer dereference in ksmd.
ksm_scan.mm_slot == &ksm_mm_head with only one registered mm
CPU 1 (__ksm_exit) CPU 2 (scan_get_next_rmap_item)
list_empty() is false
lock slot == &ksm_mm_head
list_del(slot->mm_list)
(list now empty)
unlock
lock
slot = list_entry(slot->mm_list.next)
(list is empty, so slot is still ksm_mm_head)
unlock
slot->mm == NULL ... Oops
Close this race by revalidating that the new slot is not simply the list
head again.
Andrea's test case:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#define BUFSIZE getpagesize()
int main(int argc, char **argv)
{
void *ptr;
if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
perror("posix_memalign");
exit(1);
}
if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
perror("madvise");
exit(1);
}
*(char *)NULL = 0;
return 0;
}
Reported-by: Andrea Righi <andrea@betterlinux.com>
Tested-by: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanlong Gao [Wed, 15 Jun 2011 22:08:56 +0000 (15:08 -0700)]
rtc: fix build warnings in defconfigs
RTC_CLASS is changed to bool, so 'm' is invalid.
Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Stein [Wed, 15 Jun 2011 22:08:55 +0000 (15:08 -0700)]
drivers/tty/serial/pch_uart.c: don't oops if dmi_get_system_info returns NULL
If dmi_get_system_info() returns NULL, pch_uart_init_port() will
dereferencea a zero pointer.
This oops was observed on an Atom based board which has no BIOS, but
a bootloder which doesn't provide DMI data.
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Greg KH <gregkh@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nils Carlson [Wed, 15 Jun 2011 22:08:54 +0000 (15:08 -0700)]
drivers/char/hpet.c: fix periodic-emulation for delayed interrupts
When interrupts are delayed due to interrupt masking or due to other
interrupts being serviced the HPET periodic-emuation would fail. This
happened because given an interval t and a time for the current interrupt
m we would compute the next time as t + m. This works until we are
delayed for > t, in which case we would be writing a new value which is in
fact in the past.
This can be solved by computing the next time instead as (k * t) + m where
k is large enough to be in the future. The exact computation of k is
described in a comment to the code.
More detail:
Assuming an interval of 5 between each expected interrupt we have a normal
case of
t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t5: interrupt, read t5 from comparator, set next interrupt t5 + 5
t10: interrupt, read t10 from comparator, set next interrupt t10 + 5
...
So, what happens when the interrupt is serviced too late?
t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t11: delayed interrupt serviced, read t5 from comparator, set next
interrupt t5 + 5, which is in the past!
... counter loops ...
t10: Much much later, get the next interrupt.
This can happen either because we have interrupts masked for too long
(some stupid driver goes on a printk rampage) or just because we are
pushing the limits of the interval (too small a period), or both most
probably.
My solution is to read the main counter as well and set the next interrupt
to occur at the right interval, for example:
t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t11: delayed interrupt serviced, read t5 from comparator, set next
interrupt t15 as t10 has been missed.
t15: back on track.
Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
akpm@linux-foundation.org [Wed, 15 Jun 2011 22:08:52 +0000 (15:08 -0700)]
Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt
Commit
a77aea92010acf ("cgroup: remove the ns_cgroup") removed the
ns_cgroup but it forgot to remove the related doc in
feature-removal-schedule.txt.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Wed, 15 Jun 2011 22:08:52 +0000 (15:08 -0700)]
mm: compaction: abort compaction if too many pages are isolated and caller is asynchronous V2
Asynchronous compaction is used when promoting to huge pages. This is all
very nice but if there are a number of processes in compacting memory, a
large number of pages can be isolated. An "asynchronous" process can
stall for long periods of time as a result with a user reporting that
firefox can stall for 10s of seconds. This patch aborts asynchronous
compaction if too many pages are isolated as it's better to fail a
hugepage promotion than stall a process.
[minchan.kim@gmail.com: return COMPACT_PARTIAL for abort]
Reported-and-tested-by: Ury Stankevich <urykhy@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 15 Jun 2011 22:08:51 +0000 (15:08 -0700)]
mm: vmscan: do not use page_count without a page pin
It is unsafe to run page_count during the physical pfn scan because
compound_head could trip on a dangling pointer when reading
page->first_page if the compound page is being freed by another CPU.
[mgorman@suse.de: split out patch]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Wed, 15 Jun 2011 22:08:50 +0000 (15:08 -0700)]
mm: compaction: ensure that the compaction free scanner does not move to the next zone
Compaction works with two scanners, a migration and a free scanner. When
the scanners crossover, migration within the zone is complete. The
location of the scanner is recorded on each cycle to avoid excesive
scanning.
When a zone is small and mostly reserved, it's very easy for the migration
scanner to be close to the end of the zone. Then the following situation
can occurs
o migration scanner isolates some pages near the end of the zone
o free scanner starts at the end of the zone but finds that the
migration scanner is already there
o free scanner gets reinitialised for the next cycle as
cc->migrate_pfn + pageblock_nr_pages
moving the free scanner into the next zone
o migration scanner moves into the next zone
When this happens, NR_ISOLATED accounting goes haywire because some of the
accounting happens against the wrong zone. One zones counter remains
positive while the other goes negative even though the overall global
count is accurate. This was reported on X86-32 with !SMP because !SMP
allows the negative counters to be visible. The fact that it is the bug
should theoritically be possible there.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Wed, 15 Jun 2011 22:08:49 +0000 (15:08 -0700)]
compaction: checks correct fragmentation index
fragmentation_index() returns -1000 when the allocation might succeed
This doesn't match the comment and code in compaction_suitable(). I
thought compaction_suitable should return COMPACT_PARTIAL in -1000
case, because in this case allocation could succeed depending on
watermarks.
The impact of this is that compaction starts and compact_finished() is
called which rechecks the watermarks and the free lists. It should have
the same result in that compaction should not start but is more expensive.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Wed, 15 Jun 2011 22:08:48 +0000 (15:08 -0700)]
mm/memory-failure.c: fix page isolated count mismatch
Pages isolated for migration are accounted with the vmstat counters
NR_ISOLATE_[ANON|FILE]. Callers of migrate_pages() are expected to
increment these counters when pages are isolated from the LRU. Once the
pages have been migrated, they are put back on the LRU or freed and the
isolated count is decremented.
Memory failure is not properly accounting for pages it isolates causing
the NR_ISOLATED counters to be negative. On SMP builds, this goes
unnoticed as negative counters are treated as 0 due to expected per-cpu
drift. On UP builds, the counter is treated by too_many_isolated() as a
large value causing processes to enter D state during page reclaim or
compaction. This patch accounts for pages isolated by memory failure
correctly.
[mel@csn.ul.ie: rewrote changelog]
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Triplett [Wed, 15 Jun 2011 22:08:47 +0000 (15:08 -0700)]
gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL
CONFIG_CONSTRUCTORS controls support for running constructor functions at
kernel init time. According to commit
b99b87f70c7785ab ("kernel:
constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
CONFIG_GCOV_KERNEL select it, so that the normal case of
CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.
Observed in the short list of =y values in a minimal kernel configuration.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Wed, 15 Jun 2011 22:08:46 +0000 (15:08 -0700)]
MAINTAINERS: add entry for legacy eeprom driver
I shall maintain the legacy eeprom driver, until we finally get rid of it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:46 +0000 (15:08 -0700)]
memcg: avoid percpu cached charge draining at softlimit
Based on Michal Hocko's comment.
We are not draining per cpu cached charges during soft limit reclaim
because background reclaim doesn't care about charges. It tries to free
some memory and charges will not give any.
Cached charges might influence only selection of the biggest soft limit
offender but as the call is done only after the selection has been already
done it makes no change.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:45 +0000 (15:08 -0700)]
memcg: fix percpu cached charge draining frequency
For performance, memory cgroup caches some "charge" from res_counter into
per cpu cache. This works well but because it's cache, it needs to be
flushed in some cases. Typical cases are
1. when someone hit limit.
2. when rmdir() is called and need to charges to be 0.
But "1" has problem.
Recently, with large SMP machines, we see many kworker runs because of
flushing memcg's cache. Bad things in implementation are that even if a
cpu contains a cache for memcg not related to a memcg which hits limit,
drain code is called.
This patch does
A) check percpu cache contains a useful data or not.
B) check other asynchronous percpu draining doesn't run.
C) don't call local cpu callback.
(*)This patch avoid changing the calling condition with hard-limit.
When I run "cat 1Gfile > /dev/null" under 300M limit memcg,
[Before]
13767 kamezawa 20 0 98.6m 424 416 D 10.0 0.0 0:00.61 cat
58 root 20 0 0 0 0 S 0.6 0.0 0:00.09 kworker/2:1
60 root 20 0 0 0 0 S 0.6 0.0 0:00.08 kworker/4:1
4 root 20 0 0 0 0 S 0.3 0.0 0:00.02 kworker/0:0
57 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/1:1
61 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/5:1
62 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/6:1
63 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/7:1
[After]
2676 root 20 0 98.6m 416 416 D 9.3 0.0 0:00.87 cat
2626 kamezawa 20 0 15192 1312 920 R 0.3 0.0 0:00.28 top
1 root 20 0 19384 1496 1204 S 0.0 0.0 0:00.66 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0
[akpm@linux-foundation.org: make percpu_charge_mutex static, tweak comments]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:44 +0000 (15:08 -0700)]
memcg: fix wrong check of noswap with softlimit
Hierarchical reclaim doesn't swap out if memsw and resource limits are
thye same (memsw_is_minimum == true) because we would hit mem+swap limit
anyway (during hard limit reclaim).
If it comes to the soft limit we shouldn't consider memsw_is_minimum at
all because it doesn't make much sense. Either the soft limit is bellow
the hard limit and then we cannot hit mem+swap limit or the direct reclaim
takes a precedence.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:43 +0000 (15:08 -0700)]
memcg: clear mm->owner when last possible owner leaves
The following crash was reported:
> Call Trace:
> [<
ffffffff81139792>] mem_cgroup_from_task+0x15/0x17
> [<
ffffffff8113a75a>] __mem_cgroup_try_charge+0x148/0x4b4
> [<
ffffffff810493f3>] ? need_resched+0x23/0x2d
> [<
ffffffff814cbf43>] ? preempt_schedule+0x46/0x4f
> [<
ffffffff8113afe8>] mem_cgroup_charge_common+0x9a/0xce
> [<
ffffffff8113b6d1>] mem_cgroup_newpage_charge+0x5d/0x5f
> [<
ffffffff81134024>] khugepaged+0x5da/0xfaf
> [<
ffffffff81078ea0>] ? __init_waitqueue_head+0x4b/0x4b
> [<
ffffffff81133a4a>] ? add_mm_counter.constprop.5+0x13/0x13
> [<
ffffffff81078625>] kthread+0xa8/0xb0
> [<
ffffffff814d13e8>] ? sub_preempt_count+0xa1/0xb4
> [<
ffffffff814d5664>] kernel_thread_helper+0x4/0x10
> [<
ffffffff814ce858>] ? retint_restore_args+0x13/0x13
> [<
ffffffff8107857d>] ? __init_kthread_worker+0x5a/0x5a
What happens is that khugepaged tries to charge a huge page against an mm
whose last possible owner has already exited, and the memory controller
crashes when the stale mm->owner is used to look up the cgroup to charge.
mm->owner has never been set to NULL with the last owner going away, but
nobody cared until khugepaged came along.
Even then it wasn't a problem because the final mmput() on an mm was
forced to acquire and release mmap_sem in write-mode, preventing an
exiting owner to go away while the mmap_sem was held, and until "
692e0b3
mm: thp: optimize memcg charge in khugepaged", the memory cgroup charge
was protected by mmap_sem in read-mode.
Instead of going back to relying on the mmap_sem to enforce lifetime of a
task, this patch ensures that mm->owner is properly set to NULL when the
last possible owner is exiting, which the memory controller can handle
just fine.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:42 +0000 (15:08 -0700)]
memcg: fix init_page_cgroup nid with sparsemem
Commit
21a3c9646873 ("memcg: allocate memory cgroup structures in local
nodes") makes page_cgroup allocation as NUMA aware. But that caused a
problem https://bugzilla.kernel.org/show_bug.cgi?id=36192.
The problem was getting a NID from invalid struct pages, which was not
initialized because it was out-of-node, out of [node_start_pfn,
node_end_pfn)
Now, with sparsemem, page_cgroup_init scans pfn from 0 to max_pfn. But
this may scan a pfn which is not on any node and can access memmap which
is not initialized.
This makes page_cgroup_init() for SPARSEMEM node aware and remove a code
to get nid from page->flags. (Then, we'll use valid NID always.)
[akpm@linux-foundation.org: try to fix up comments]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:41 +0000 (15:08 -0700)]
mm: memory.numa_stat: fix file permission
Commit
406eb0c9ba76 ("memcg: add memory.numastat api for numa
statistics") adds memory.numa_stat file for memory cgroup. But the file
permissions are wrong.
[kamezawa@bluextal linux-2.6]$ ls -l /cgroup/memory/A/memory.numa_stat
---------- 1 root root 0 Jun 9 18:36 /cgroup/memory/A/memory.numa_stat
This patch fixes the permission as
[root@bluextal kamezawa]# ls -l /cgroup/memory/A/memory.numa_stat
-r--r--r-- 1 root root 0 Jun 10 16:49 /cgroup/memory/A/memory.numa_stat
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Miao [Wed, 15 Jun 2011 22:08:40 +0000 (15:08 -0700)]
leds: fix the incorrect display in menuconfig
Seems when a config option does not have a dependency of the menuconfig,
it messes the display of the rest configs, even if it's a hidden one.
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael Aquini [Wed, 15 Jun 2011 22:08:39 +0000 (15:08 -0700)]
mm: fix negative commitlimit when gigantic hugepages are allocated
When 1GB hugepages are allocated on a system, free(1) reports less
available memory than what really is installed in the box. Also, if the
total size of hugepages allocated on a system is over half of the total
memory size, CommitLimit becomes a negative number.
The problem is that gigantic hugepages (order > MAX_ORDER) can only be
allocated at boot with bootmem, thus its frames are not accounted to
'totalram_pages'. However, they are accounted to hugetlb_total_pages()
What happens to turn CommitLimit into a negative number is this
calculation, in fs/proc/meminfo.c:
allowed = ((totalram_pages - hugetlb_total_pages())
* sysctl_overcommit_ratio / 100) + total_swap_pages;
A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.
Also, every vm statistic which depends on 'totalram_pages' will render
confusing values, as if system were 'missing' some part of its memory.
Impact of this bug:
When gigantic hugepages are allocated and sysctl_overcommit_memory ==
OVERCOMMIT_NEVER. In a such situation, __vm_enough_memory() goes through
the mentioned 'allowed' calculation and might end up mistakenly returning
-ENOMEM, thus forcing the system to start reclaiming pages earlier than it
would be ususal, and this could cause detrimental impact to overall
system's performance, depending on the workload.
Besides the aforementioned scenario, I can only think of this causing
annoyances with memory reports from /proc/meminfo and free(1).
[akpm@linux-foundation.org: standardize comment layout]
Reported-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Rafael Aquini <aquini@linux.com>
Acked-by: Russ Anderson <rja@sgi.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:38 +0000 (15:08 -0700)]
mm/memory_hotplug.c: fix building of node hotplug zonelist
During memory hotplug we refresh zonelists when we online a page in a new
zone. It means that the node's zonelist is not initialized until pages
are onlined. So for example, "nid" passed by MEM_GOING_ONLINE notifier
will point to NODE_DATA(nid) which has no zone fallback list. Moreover,
if we hot-add cpu-only nodes, alloc_pages() will do no fallback.
This patch makes a zonelist when a new pgdata is available.
Note: in production, at fujitsu, memory should be onlined before cpu
and our server didn't have any memory-less nodes and had no problems.
But recent changes in MEM_GOING_ONLINE+page_cgroup
will access not initialized zonelist of node.
Anyway, there are memory-less node and we need some care.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Wed, 15 Jun 2011 22:08:37 +0000 (15:08 -0700)]
init/calibrate.c: remove annoying printk
Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
calculation as it appears when booting every core on setups with
'ignore_loglevel' which dmesg people scan for possible issues. As the
message doesn't show very useful information to the widest audience of
kernel boot message gazers, it should be removed.
Introduced by commit
d2b463135f84 ("init/calibrate.c: fix for critical
bogoMIPS intermittent calculation failure").
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andrew Worsley <amworsley@gmail.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Wed, 15 Jun 2011 22:08:35 +0000 (15:08 -0700)]
w1: W1_MASTER_DS1WM should depend on GENERIC_HARDIRQS
On m68k (which doesn't support generic hardirqs yet):
drivers/w1/masters/ds1wm.c: In function `ds1wm_probe':
drivers/w1/masters/ds1wm.c: error: implicit declaration of function `irq_set_irq_type'
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Jean-Franois Dagenais <dagenaisj@sonatest.com>
Cc: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Kaiser [Wed, 15 Jun 2011 22:08:34 +0000 (15:08 -0700)]
include/asm-generic/pgtable.h: fix unbalanced parenthesis
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pawel Osciak [Wed, 15 Jun 2011 22:08:32 +0000 (15:08 -0700)]
MAINTAINERS: add videobuf2 maintainers
Add maintainers for the videobuf2 V4L2 driver framework.
Signed-off-by: Pawel Osciak <pawel@osciak.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Wed, 15 Jun 2011 22:08:31 +0000 (15:08 -0700)]
leds: move LEDS_GPIO_REGISTER out of menuconfig NEW_LEDS
Commit
4440673a95e6 ("leds: provide helper to register "leds-gpio"
devices") broke the display of the NEW_LEDS menu as it didn't depend on
NEW_LEDS and so made "LED drivers" and "LED Triggers" appear at the same
level as "LED Support" instead of below it as it was before
4440673a.
Moving LEDS_GPIO_REGISTER out of the menuconfig NEW_LEDS fixes this
unintended side effect.
Reported-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Wed, 15 Jun 2011 22:08:31 +0000 (15:08 -0700)]
drivers/leds/leds-asic3: make LEDS_ASIC3 depend on LEDS_CLASS
We call led_classdev_unregister/led_classdev_register in
asic3_led_remove/asic3_led_probe, thus make LEDS_ASIC3 depend on
LEDS_CLASS.
This patch fixes below build error if LEDS_CLASS is not configured.
LD .tmp_vmlinux1
drivers/built-in.o: In function `asic3_led_remove':
clkdev.c:(.devexit.text+0x1860): undefined reference to `led_classdev_unregister'
drivers/built-in.o: In function `asic3_led_probe':
clkdev.c:(.devinit.text+0xcee8): undefined reference to `led_classdev_register'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Paul Parsons <lost.distance@yahoo.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Balbir Singh [Wed, 15 Jun 2011 22:08:30 +0000 (15:08 -0700)]
MAINTAINERS: Balbir has moved
Update my email address. Email will start to the old address bouncing
soon
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Triplett [Wed, 15 Jun 2011 22:08:28 +0000 (15:08 -0700)]
uts: make default hostname configurable, rather than always using "(none)"
The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist. Distribution init scripts have the same
fallback. However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs. Furthermore, "(none)" doesn't typically resolve to anything
useful.
Make the default hostname configurable. This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration. Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Miller <davem@davemloft.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Kel Modderman <kel@otaku42.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dr. David Alan Gilbert [Wed, 15 Jun 2011 22:08:27 +0000 (15:08 -0700)]
BUILD_BUG_ON_ZERO: fix sparse breakage
BUILD_BUG_ON_ZERO and BUILD_BUG_ON_NULL must return values, even in the
CHECKER case otherwise various users of it become syntactically invalid.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Wed, 15 Jun 2011 22:08:25 +0000 (15:08 -0700)]
mm: compaction: fix special case -1 order checks
Commit
56de7263fcf3 ("mm: compaction: direct compact when a high-order
allocation fails") introduced a check for cc->order == -1 in
compact_finished. We should continue compacting in that case because
the request came from userspace and there is no particular order to
compact for. Similar check has been added by
82478fb7 (mm: compaction:
prevent division-by-zero during user-requested compaction) for
compaction_suitable.
The check is, however, done after zone_watermark_ok which uses order as a
right hand argument for shifts. Not only watermark check is pointless if
we can break out without it but it also uses 1 << -1 which is not well
defined (at least from C standard). Let's move the -1 check above
zone_watermark_ok.
[minchan.kim@gmail.com> - caught compaction_suitable]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Wed, 15 Jun 2011 22:08:23 +0000 (15:08 -0700)]
mm: fix wrong kunmap_atomic() pointer
Running a ktest.pl test, I hit the following bug on x86_32:
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:81 __kunmap_atomic+0x64/0xc1()
Hardware name:
Modules linked in:
Pid: 93, comm: sh Not tainted 2.6.39-test+ #1
Call Trace:
[<
c04450da>] warn_slowpath_common+0x7c/0x91
[<
c042f5df>] ? __kunmap_atomic+0x64/0xc1
[<
c042f5df>] ? __kunmap_atomic+0x64/0xc1^M
[<
c0445111>] warn_slowpath_null+0x22/0x24
[<
c042f5df>] __kunmap_atomic+0x64/0xc1
[<
c04d4a22>] unmap_vmas+0x43a/0x4e0
[<
c04d9065>] exit_mmap+0x91/0xd2
[<
c0443057>] mmput+0x43/0xad
[<
c0448358>] exit_mm+0x111/0x119
[<
c044855f>] do_exit+0x1ff/0x5fa
[<
c0454ea2>] ? set_current_blocked+0x3c/0x40
[<
c0454f24>] ? sigprocmask+0x7e/0x8e
[<
c0448b55>] do_group_exit+0x65/0x88
[<
c0448b90>] sys_exit_group+0x18/0x1c
[<
c0c3915f>] sysenter_do_call+0x12/0x38
---[ end trace
8055f74ea3c0eb62 ]---
Running a ktest.pl git bisect, found the culprit: commit
e303297e6c3a
("mm: extended batches for generic mmu_gather")
But although this was the commit triggering the bug, it was not the one
originally responsible for the bug. That was commit
d16dfc550f53 ("mm:
mmu_gather rework").
The code in zap_pte_range() has something that looks like the following:
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
do {
[...]
} while (pte++, addr += PAGE_SIZE, addr != end);
pte_unmap_unlock(pte - 1, ptl);
The pte starts off pointing at the first element in the page table
directory that was returned by the pte_offset_map_lock(). When it's done
with the page, pte will be pointing to anything between the next entry and
the first entry of the next page inclusive. By doing a pte - 1, this puts
the pte back onto the original page, which is all that pte_unmap_unlock()
needs.
In most archs (64 bit), this is not an issue as the pte is ignored in the
pte_unmap_unlock(). But on 32 bit archs, where things may be kmapped, it
is essential that the pte passed to pte_unmap_unlock() resides on the same
page that was given by pte_offest_map_lock().
The problem came in
d16dfc55 ("mm: mmu_gather rework") where it introduced
a "break;" from the while loop. This alone did not seem to easily trigger
the bug. But the modifications made by
e303297e6 caused that "break;" to
be hit on the first iteration, before the pte++.
The pte not being incremented will now cause pte_unmap_unlock(pte - 1) to
be pointing to the previous page. This will cause the wrong page to be
unmapped, and also trigger the warning above.
The simple solution is to just save the pointer given by
pte_offset_map_lock() and use it in the unlock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christian Gmeiner [Wed, 15 Jun 2011 22:08:22 +0000 (15:08 -0700)]
drivers/misc/cs5535-mfgpt.c: fix wrong if condition
Fix the wrong `if' condition for the check if the requested timer is
available.
The bitmap avail is used to store if a timer is used already. test_bit()
is used to check if the requested timer is available. If a bit in the
avail bitmap is set it means that the timer is available.
The runtime effect would be that allocating a specific timer always fails
(versus telling cs5535_mfgpt_alloc_timer to allocate the first available
timer, which works).
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Wed, 15 Jun 2011 22:08:21 +0000 (15:08 -0700)]
drivers/misc/spear13xx_pcie_gadget.c: fix a memory leak in spear_pcie_gadget_probe error path
In the case of goto err_kzalloc, we should kfree target.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Pratyush Anand <pratyush.anand@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:20 +0000 (15:08 -0700)]
mm: increase RECLAIM_DISTANCE to 30
Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
Xeon E5520 + Intel S5520UR MB). He is using Cyrus IMAPd and it's built on
a very traditional single-process model.
* a master process which reads config files and manages the other
process
* multiple imapd processes, one per connection
* multiple pop3d processes, one per connection
* multiple lmtpd processes, one per connection
* periodical "cleanup" processes.
There are thousands of independent processes. The problem is, recent
Intel motherboard turn on zone_reclaim_mode by default and traditional
prefork model software don't work well on it. Unfortunatelly, such models
are still typical even in the 21st century. We can't ignore them.
This patch raises the zone_reclaim_mode threshold to 30. 30 doesn't have
any specific meaning. but 20 means that one-hop QPI/Hypertransport and
such relatively cheap 2-4 socket machine are often used for traditional
servers as above. The intention is that these machines don't use
zone_reclaim_mode.
Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
This patch doesn't change such high-end NUMA machine behavior.
Dave Hansen said:
: I know specifically of pieces of x86 hardware that set the information
: in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
: behavior which that implies.
:
: They've done performance testing and run very large and scary benchmarks
: to make sure that they _want_ this turned on. What this means for them
: is that they'll probably be de-optimized, at least on newer versions of
: the kernel.
:
: If you want to do this for particular systems, maybe _that_'s what we
: should do. Have a list of specific configurations that need the
: defaults overridden either because they're buggy, or they have an
: unusual hardware configuration not really reflected in the distance
: table.
And later said:
: The original change in the hardware tables was for the benefit of a
: benchmark. Said benchmark isn't going to get run on mainline until the
: next batch of enterprise distros drops, at which point the hardware where
: this was done will be irrelevant for the benchmark. I'm sure any new
: hardware will just set this distance to another yet arbitrary value to
: make the kernel do what it wants. :)
:
: Also, when the hardware got _set_ to this initially, I complained. So, I
: guess I'm getting my way now, with this patch. I'm cool with it.
Reported-by: Robert Mueller <robm@fastmail.fm>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Wed, 15 Jun 2011 22:08:17 +0000 (15:08 -0700)]
checkpatch: add warning for uses of printk_ratelimit
Warn about uses of printk_ratelimit() because it uses a global state and
can hide subsequent useful messages.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 15 Jun 2011 22:08:17 +0000 (15:08 -0700)]
kmsg_dump.h: fix build when CONFIG_PRINTK is disabled
Fix <linux/kmsg_dump.h> when CONFIG_PRINTK is not enabled:
include/linux/kmsg_dump.h:56: error: 'EINVAL' undeclared (first use in this function)
include/linux/kmsg_dump.h:61: error: 'EINVAL' undeclared (first use in this function)
Looks like commit
595dd3d8bf95 ("kmsg_dump: fix build for
CONFIG_PRINTK=n") uses EINVAL without having the needed header file(s),
but I'm sure that I build tested that patch also. oh well.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ying Han [Wed, 15 Jun 2011 22:08:16 +0000 (15:08 -0700)]
memcg: add documentation for the memory.numastat API
[akpm@linux-foundation.org: rework text, fit it into 80-cols]
Signed-off-by: Ying Han <yinghan@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:15 +0000 (15:08 -0700)]
vmscan: implement swap token priority aging
While testing for memcg aware swap token, I observed a swap token was
often grabbed an intermittent running process (eg init, auditd) and they
never release a token.
Why?
Some processes (eg init, auditd, audispd) wake up when a process exiting.
And swap token can be get first page-in process when a process exiting
makes no swap token owner. Thus such above intermittent running process
often get a token.
And currently, swap token priority is only decreased at page fault path.
Then, if the process sleep immediately after to grab swap token, the swap
token priority never be decreased. That's obviously undesirable.
This patch implement very poor (and lightweight) priority aging. It only
be affect to the above corner case and doesn't change swap tendency
workload performance (eg multi process qsbench load)
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:14 +0000 (15:08 -0700)]
vmscan: implement swap token trace
This is useful for observing swap token activity.
example output:
zsh-1845 [000] 598.962716: update_swap_token_priority:
mm=
ffff88015eaf7700 old_prio=1 new_prio=0
memtoy-1830 [001] 602.033900: update_swap_token_priority:
mm=
ffff880037a45880 old_prio=947 new_prio=949
memtoy-1830 [000] 602.041509: update_swap_token_priority:
mm=
ffff880037a45880 old_prio=949 new_prio=951
memtoy-1830 [000] 602.051959: update_swap_token_priority:
mm=
ffff880037a45880 old_prio=951 new_prio=953
memtoy-1830 [000] 602.052188: update_swap_token_priority:
mm=
ffff880037a45880 old_prio=953 new_prio=955
memtoy-1830 [001] 602.427184: put_swap_token:
token_mm=
ffff880037a45880
zsh-1789 [000] 602.427281: replace_swap_token:
old_token_mm= (null) old_prio=0 new_token_mm=
ffff88015eaf7018
new_prio=2
zsh-1789 [001] 602.433456: update_swap_token_priority:
mm=
ffff88015eaf7018 old_prio=2 new_prio=4
zsh-1789 [000] 602.437613: update_swap_token_priority:
mm=
ffff88015eaf7018 old_prio=4 new_prio=6
zsh-1789 [000] 602.443924: update_swap_token_priority:
mm=
ffff88015eaf7018 old_prio=6 new_prio=8
zsh-1789 [000] 602.451873: update_swap_token_priority:
mm=
ffff88015eaf7018 old_prio=8 new_prio=10
zsh-1789 [001] 602.462639: update_swap_token_priority:
mm=
ffff88015eaf7018 old_prio=10 new_prio=12
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel<riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:13 +0000 (15:08 -0700)]
vmscan,memcg: memcg aware swap token
Currently, memcg reclaim can disable swap token even if the swap token mm
doesn't belong in its memory cgroup. It's slightly risky. If an admin
creates very small mem-cgroup and silly guy runs contentious heavy memory
pressure workload, every tasks are going to lose swap token and then
system may become unresponsive. That's bad.
This patch adds 'memcg' parameter into disable_swap_token(). and if the
parameter doesn't match swap token, VM doesn't disable it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel<riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 15 Jun 2011 22:08:12 +0000 (15:08 -0700)]
drivers/video/backlight/adp8870_bl.c: add missed props.type conversion
Cc: Michael Hennerich <michael.hennerich@analog.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Hennerich [Wed, 15 Jun 2011 22:08:11 +0000 (15:08 -0700)]
backlight: new driver for the ADP8870 backlight devices
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Wed, 15 Jun 2011 22:08:11 +0000 (15:08 -0700)]
fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP
Commit
a8bef8ff6ea1 ("mm: migration: avoid race between shift_arg_pages()
and rmap_walk() during migration by not migrating temporary stacks")
introduced a BUG_ON() to ensure that VM_STACK_FLAGS and
VM_STACK_INCOMPLETE_SETUP do not overlap. The check is a compile time
one, so BUILD_BUG_ON is more appropriate.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 15 Jun 2011 22:08:10 +0000 (15:08 -0700)]
lib/bitmap.c: fix kernel-doc notation
Fix new kernel-doc warnings in lib/bitmap.c:
Warning(lib/bitmap.c:596): No description found for parameter 'buf'
Warning(lib/bitmap.c:596): Excess function parameter 'bp' description in '__bitmap_parselist'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 15 Jun 2011 22:08:09 +0000 (15:08 -0700)]
mm/memory.c: fix kernel-doc notation
Fix new kernel-doc warnings in mm/memory.c:
Warning(mm/memory.c:1327): No description found for parameter 'tlb'
Warning(mm/memory.c:1327): Excess function parameter 'tlbp' description in 'unmap_vmas'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 15 Jun 2011 22:08:08 +0000 (15:08 -0700)]
mm: remove khugepaged double thp vmstat update with CONFIG_NUMA=n
Johannes noticed the vmstat update is already taken care of by
khugepaged_alloc_hugepage() internally. The only places that are required
to update the vmstat are the callers of alloc_hugepage (callers of
khugepaged_alloc_hugepage aren't).
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Tue, 14 Jun 2011 22:36:25 +0000 (18:36 -0400)]
sched: Check if lowest_mask is initialized in find_lowest_rq()
On system boot up, the lowest_mask is initialized with an
early_initcall(). But RT tasks may wake up on other
early_initcall() callers before the lowest_mask is initialized,
causing a system crash.
Commit "
d72bce0e67 rcu: Cure load woes" was the first commit
to wake up RT tasks in early init. Before this commit this bug
should not happen.
Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
Tested-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110614223657.824872966@goodmis.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Hillf Danton [Tue, 14 Jun 2011 22:36:24 +0000 (18:36 -0400)]
sched: Fix need_resched() when checking peempt
The RT preempt check tests the wrong task if NEED_RESCHED is
set. It currently checks the local CPU task. It is supposed to
check the task that is running on the runqueue we are about to
wake another task on.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110614223657.450239027@goodmis.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Randy Dunlap [Tue, 14 Jun 2011 22:50:11 +0000 (15:50 -0700)]
signal.c: fix kernel-doc notation
Fix kernel-doc warnings in signal.c:
Warning(kernel/signal.c:2374): No description found for parameter 'nset'
Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 14 Jun 2011 19:45:10 +0000 (12:45 -0700)]
x86 idle: APM requires pm_idle/default_idle unconditionally when a module
[ Also from Ben Hutchings <ben@decadent.org.uk> and Vitaliy Ivanov
<vitalivanov@gmail.com> ]
Commit
06ae40ce073d ("x86 idle: EXPORT_SYMBOL(default_idle, pm_idle)
only when APM demands it") removed the export for pm_idle/default_idle
unless the apm module was modularised and CONFIG_APM_CPU_IDLE was set.
But the apm module uses pm_idle/default_idle unconditionally,
CONFIG_APM_CPU_IDLE only affects the bios idle threshold. Adjust the
export accordingly.
[ Used #ifdef instead of #if defined() as it's shorter, and what both
Ben and Vitaliy used.. Andy, you're out-voted ;) - Linus ]
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 14 Jun 2011 18:28:54 +0000 (11:28 -0700)]
Merge branch 'for-linus-fixes' of git://git./linux/kernel/git/gerg/m68knommu
* 'for-linus-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: use kernel processor defines for conditional optimizations
m68knommu: create config options for CPU classes
m68knommu: fix linker script exported name sections
Linus Torvalds [Tue, 14 Jun 2011 18:28:38 +0000 (11:28 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
TOMOYO: Fix oops in tomoyo_mount_acl().
Linus Torvalds [Tue, 14 Jun 2011 18:25:56 +0000 (11:25 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/egtvedt/avr32-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/avr32-2.6:
avr32, exec: remove redundant set_fs(USER_DS)
avr32: make intc_resume() return void to conform to syscore_ops
avr32: add some more at91 to cpu.h definition
avr32: set CONFIG_CC_OPTIMIZE_FOR_SIZE=y for all defconfigs
avr32/at32ap: fix mapping of platform device id for USART
avr32: fix use of non-existing portnr variable in at32_map_usart()
Linus Torvalds [Tue, 14 Jun 2011 18:25:32 +0000 (11:25 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: Compare only lower 32 bits of framebuffer map offsets
drm/i915: Don't leak in i915_gem_shmem_pread_slow()
drm/radeon/kms: do bounds checking for 3D_LOAD_VBPNTR and bump array limit
drm/radeon/kms: fix mac g5 quirk
x86/uv/x2apic: update for change in pci bridge handling.
alpha, drm: Remove obsolete Alpha support in MGA DRM code
alpha/drm: Cleanup Alpha support in DRM generic code
savage: remove unnecessary if statement
drm/radeon: fix GUI idle IH debug statements
drm/radeon/kms: check modes against max pixel clock
drm: fix fbs in DRM_IOCTL_MODE_GETRESOURCES ioctl
Linus Torvalds [Tue, 14 Jun 2011 18:24:40 +0000 (11:24 -0700)]
Merge git://git./linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] update cifs version to 1.73
[CIFS] trivial cleanup fscache cFYI and cERROR messages
cifs: correctly handle NULL tcon pointer in CIFSTCon
cifs: show sec= option in /proc/mounts
cifs: don't allow cifs_reconnect to exit with NULL socket pointer
CIFS: Fix sparse error
Linus Torvalds [Tue, 14 Jun 2011 18:21:21 +0000 (11:21 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md/raid5: remove unusual use of bio_iovec_idx()
md/raid5: fix FUA request handling in ops_run_io()
md/raid5: fix raid5_set_bi_hw_segments
md:Documentation/md.txt - fix typo
md/bitmap: remove unused fields from struct bitmap
md/bitmap: use proper accessor macro
md: check ->hot_remove_disk when removing disk
md: Using poll /proc/mdstat can monitor the events of adding a spare disks
MD: use is_power_of_2 macro
MD: raid5 do not set fullsync
MD: support initial bitmap creation in-kernel
MD: add sync_super to mddev_t struct
MD: raid1 changes to allow use by device mapper
MD: move thread wakeups into resume
MD: possible typo
MD: no sync IO while suspended
MD: no integrity register if no gendisk
Linus Torvalds [Tue, 14 Jun 2011 18:19:27 +0000 (11:19 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Remove cpufreq_stats sysfs entries on module unload.
MAINTAINERS: Update CPU FREQUENCY patterns
Steve French [Tue, 14 Jun 2011 16:19:54 +0000 (16:19 +0000)]
[CIFS] update cifs version to 1.73
Signed-off-by: Steve French <sfrench@us.ibm.com>
Steve French [Tue, 14 Jun 2011 15:51:18 +0000 (15:51 +0000)]
[CIFS] trivial cleanup fscache cFYI and cERROR messages
... for uniformity and cleaner debug logs.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Mathias Krause [Fri, 10 Jun 2011 13:09:05 +0000 (15:09 +0200)]
avr32, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Hans-Christian Egtvedt [Mon, 6 Jun 2011 16:19:20 +0000 (18:19 +0200)]
avr32: make intc_resume() return void to conform to syscore_ops
This patch removes the unneeded, and now wrong, return 0 from intc_resume() and
lets the function return void instead. This matches the resume callback in
struct syscore_ops.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Nicolas Ferre [Sat, 14 May 2011 22:23:32 +0000 (00:23 +0200)]
avr32: add some more at91 to cpu.h definition
Somme common drivers will need those at91 cpu_is_xxx() definitions.
Those definitions are already in Linus' tree so if we want to use them
in common drivers, we will need them in AVR32 cpu.h file.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Hans-Christian Egtvedt [Wed, 1 Jun 2011 13:10:49 +0000 (15:10 +0200)]
avr32: set CONFIG_CC_OPTIMIZE_FOR_SIZE=y for all defconfigs
This patch makes sure the kconfig option CC_OPTIMIZE_FOR_SIZE is set to yes for
all default configuration files. This ensures the kernel is optimized for size,
and avoids potential relocation truncated to fit problems.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Hans-Christian Egtvedt [Wed, 8 Jun 2011 08:47:25 +0000 (10:47 +0200)]
avr32/at32ap: fix mapping of platform device id for USART
This patch will fix the mapping of the platform device id when mapping USART
peripheral ID to UART platform device id. Not setting the platform device id
will in most cases (when you map USART > 0 to UART 0) make the console not
available.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Hans-Christian Egtvedt [Wed, 1 Jun 2011 09:08:01 +0000 (11:08 +0200)]
avr32: fix use of non-existing portnr variable in at32_map_usart()
This patch fixes the use of the non-existing portnr variable in
at32_map_usart() to use the provided line number instead. Typo was introduced
in commit
2b348e2f82f532e3aff8e0ce9293033b3294c1e0.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Tetsuo Handa [Mon, 13 Jun 2011 04:49:11 +0000 (13:49 +0900)]
TOMOYO: Fix oops in tomoyo_mount_acl().
In tomoyo_mount_acl() since 2.6.36, kern_path() was called without checking
dev_name != NULL. As a result, an unprivileged user can trigger oops by issuing
mount(NULL, "/", "ext3", 0, NULL) request.
Fix this by checking dev_name != NULL before calling kern_path(dev_name).
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Namhyung Kim [Tue, 14 Jun 2011 04:23:57 +0000 (14:23 +1000)]
md/raid5: remove unusual use of bio_iovec_idx()
In the bio_for_each_segment loop, bvl always points current
bio_vec, so the same as bio_iovec_idx(, i). Let's get rid of
it.
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Namhyung Kim [Tue, 14 Jun 2011 04:20:19 +0000 (14:20 +1000)]
md/raid5: fix FUA request handling in ops_run_io()
Commit
e9c7469bb4f5 ("md: implment REQ_FLUSH/FUA support")
introduced R5_WantFUA flag and set rw to WRITE_FUA in that case.
However remaining code still checks whether rw is exactly same
as WRITE or not, so FUAed-write ends up with being treated as
READ. Fix it.
This bug has been present since 2.6.37 and the fix is suitable for any
-stable kernel since then. It is not clear why this has not caused
more problems.
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Namhyung Kim [Mon, 13 Jun 2011 05:48:22 +0000 (14:48 +0900)]
md/raid5: fix raid5_set_bi_hw_segments
The @bio->bi_phys_segments consists of active stripes count in the
lower 16 bits and processed stripes count in the upper 16 bits. So
logical-OR operator should be bitwise one.
This bug has been present since 2.6.27 and the fix is suitable for any
-stable kernel since then. Fortunately the bad code is only used on
error paths and is relatively unlikely to be hit.
Cc: stable@kernel.org
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Greg Ungerer [Thu, 2 Jun 2011 06:07:33 +0000 (16:07 +1000)]
m68k: use kernel processor defines for conditional optimizations
Older m68k-linux compilers will include pre-defined symbols that
confuse what processor it is being targeted for. For example gcc-4.1.2
will pre-define __mc68020__ even if you specify the target processor
as -m68000 on the gcc command line. Newer versions of gcc have this
corrected.
In a few places the m68k code uses defined(__mc68020__) for optimizations
that include instructions that are specific to the CPU 68020 and above.
When compiling with older compilers this will be true even when we have
selected to compile for the older 68000 processors.
Switch to using the kernel processor defines, CONFIG_M68020 and friends.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Greg Ungerer [Thu, 2 Jun 2011 05:50:48 +0000 (15:50 +1000)]
m68knommu: create config options for CPU classes
There are 3 families of CPU core types that we support in the m68knommu
architecture branch. They are
. traditional 68000
. CPU32 (a 68020 core derivative without MMU or bitfield instructions)
. ColdFire
It will be useful going forward to have a CONFIG_ option defined for
each type. We already have one for ColdFire (CONFIG_COLDFIRE), so add
for the other 2 families, CONFIG_M68000 and CONFIG_MCPU32.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Greg Ungerer [Thu, 2 Jun 2011 04:09:32 +0000 (14:09 +1000)]
m68knommu: fix linker script exported name sections
The recent commit titled "module: Sort exported symbols" (
f02e8a65)
changed the exported symbol name sections. Bring the m68knommu linker
script into line with those changes - including the sorting of the
symbol names.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Tormod Volden [Mon, 30 May 2011 19:45:43 +0000 (19:45 +0000)]
drm: Compare only lower 32 bits of framebuffer map offsets
Drivers using multiple framebuffers got broken by commit
41c2e75e60200a860a74b7c84a6375c105e7437f which ignored the framebuffer
(or register) map offset when looking for existing maps. The rationale
was that the kernel-userspace ABI is fixed at a 32-bit offset, so the
real offsets could not always be handed over for comparison.
Instead of ignoring the offset we will compare the lower 32 bit. Drivers
using multiple framebuffers should just make sure that the lower 32 bit
are different. The existing drivers in question are practically limited
to 32-bit systems so that should be fine for them.
It is assumed that current drivers always specify a correct framebuffer
map offset, even if this offset was ignored since above commit. So this
patch should not change anything for drivers using only one framebuffer.
Drivers needing multiple framebuffers with 64-bit map offsets will need
to cook up something, for instance keeping an ID in the lower bit which
is to be aligned away when it comes to using the offset.
All of above applies to _DRM_REGISTERS as well.
Signed-off-by: Tormod Volden <debian.tormod@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jesper Juhl [Sun, 12 Jun 2011 20:53:44 +0000 (20:53 +0000)]
drm/i915: Don't leak in i915_gem_shmem_pread_slow()
It seems to me that we are leaking 'user_pages' in
drivers/gpu/drm/i915/i915_gem.c::i915_gem_shmem_pread_slow() if
read_cache_page_gfp() fails.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Fri, 10 Jun 2011 14:41:26 +0000 (14:41 +0000)]
drm/radeon/kms: do bounds checking for 3D_LOAD_VBPNTR and bump array limit
To my knowledge, the limit is 16 on r300.
(the docs don't say what the limit is)
The lack of bounds checking can be abused to do all sorts of things
(from bypassing parts of the CS checker to crashing the kernel).
Bugzilla:
https://bugs.freedesktop.org/show_bug.cgi?id=36745
Cc: stable@kernel.org
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Mon, 6 Jun 2011 16:53:30 +0000 (12:53 -0400)]
drm/radeon/kms: fix mac g5 quirk
Apple uses the same subsystem pci ids for lots of
hardware much of which is wired up differently. In
this case, the G5 imac and the G5 tower.
Only apply the quirk configuration to G5 towers.
Reported-by: Joachim Henke <j-o@users.sourceforge.net>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: Joachim Henke <j-o@users.sourceforge.net>
Cc: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 25 May 2011 04:00:49 +0000 (14:00 +1000)]
x86/uv/x2apic: update for change in pci bridge handling.
When I added
3448a19da479b6bd1e28e2a2be9fa16c6a6feb39
I forgot about the special uv handling code for this, so this
patch fixes it up.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ingo Molnar
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jay Estabrook [Thu, 9 Jun 2011 22:19:12 +0000 (18:19 -0400)]
alpha, drm: Remove obsolete Alpha support in MGA DRM code
Remove an obsolete Alpha adjustment in the drm for MGA on Alpha.
Signed-off-by: Jay Estabrook <jay.estabrook@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jay Estabrook [Thu, 9 Jun 2011 22:18:39 +0000 (18:18 -0400)]
alpha/drm: Cleanup Alpha support in DRM generic code
Remove an obsolete Alpha adjustment, and modify another,
to go with the current Alpha architecture support.
Signed-off-by: Jay Estabrook <jay.estabrook@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Greg Dietsche [Mon, 13 Jun 2011 14:40:38 +0000 (09:40 -0500)]
savage: remove unnecessary if statement
the code always returns ret regardless, so if(ret) check is unnecessary.
v2: fixed up the spelling.
Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Reviewed-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Mon, 13 Jun 2011 22:29:59 +0000 (15:29 -0700)]
Linux 3.0-rc3
Jeff Layton [Sun, 12 Jun 2011 01:17:10 +0000 (21:17 -0400)]
cifs: correctly handle NULL tcon pointer in CIFSTCon
Long ago (in commit
00e485b0), I added some code to handle share-level
passwords in CIFSTCon. That code ignored the fact that it's legit to
pass in a NULL tcon pointer when connecting to the IPC$ share on the
server.
This wasn't really a problem until recently as we only called CIFSTCon
this way when the server returned -EREMOTE. With the introduction of
commit
c1508ca2 however, it gets called this way on every mount, causing
an oops when share-level security is in effect.
Fix this by simply treating a NULL tcon pointer as if user-level
security were in effect. I'm not aware of any servers that protect the
IPC$ share with a specific password anyway. Also, add a comment to the
top of CIFSTCon to ensure that we don't make the same mistake again.
Cc: <stable@kernel.org>
Reported-by: Martijn Uffing <mp3project@sarijopen.student.utwente.nl>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Jeff Layton [Mon, 13 Jun 2011 15:50:41 +0000 (11:50 -0400)]
cifs: show sec= option in /proc/mounts
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Jeff Layton [Fri, 10 Jun 2011 20:14:57 +0000 (16:14 -0400)]
cifs: don't allow cifs_reconnect to exit with NULL socket pointer
It's possible for the following set of events to happen:
cifsd calls cifs_reconnect which reconnects the socket. A userspace
process then calls cifs_negotiate_protocol to handle the NEGOTIATE and
gets a reply. But, while processing the reply, cifsd calls
cifs_reconnect again. Eventually the GlobalMid_Lock is dropped and the
reply from the earlier NEGOTIATE completes and the tcpStatus is set to
CifsGood. cifs_reconnect then goes through and closes the socket and sets the
pointer to zero, but because the status is now CifsGood, the new socket
is not created and cifs_reconnect exits with the socket pointer set to
NULL.
Fix this by only setting the tcpStatus to CifsGood if the tcpStatus is
CifsNeedNegotiate, and by making sure that generic_ip_connect is always
called at least once in cifs_reconnect.
Note that this is not a perfect fix for this issue. It's still possible
that the NEGOTIATE reply is handled after the socket has been closed and
reconnected. In that case, the socket state will look correct but it no
NEGOTIATE was performed on it be for the wrong socket. In that situation
though the server should just shut down the socket on the next attempted
send, rather than causing the oops that occurs today.
Cc: <stable@kernel.org> # .38.x: fd88ce9: [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood
Reported-and-Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Pavel Shilovsky [Thu, 9 Jun 2011 08:58:53 +0000 (12:58 +0400)]
CIFS: Fix sparse error
cifs_sb_master_tlink was declared as inline, but without a definition.
Remove the declaration and move the definition up.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Linus Torvalds [Mon, 13 Jun 2011 20:00:53 +0000 (13:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
SLAB: Record actual last user of freed objects.
slub: always align cpu_slab to honor cmpxchg_double requirement
Linus Torvalds [Mon, 13 Jun 2011 18:21:50 +0000 (11:21 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: unwind canceled flock state
ceph: fix ENOENT logic in striped_read
ceph: fix short sync reads from the OSD
ceph: fix sync vs canceled write
ceph: use ihold when we already have an inode ref
Linus Torvalds [Mon, 13 Jun 2011 17:47:04 +0000 (10:47 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rtc: Staticize non-exported __rtc_set_alarm()
rtc: Fix ioctl error path return
ptp: Fix some locking bugs in ptp_read()
ptp: Return -EFAULT on copy_to_user() errors
Linus Torvalds [Mon, 13 Jun 2011 17:45:49 +0000 (10:45 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ftrace: Revert
8ab2b7efd ftrace: Remove unnecessary disabling of irqs
kprobes/trace: Fix kprobe selftest for gcc 4.6
ftrace: Fix possible undefined return code
oprofile, dcookies: Fix possible circular locking dependency
oprofile: Fix locking dependency in sync_start()
oprofile: Free potentially owned tasks in case of errors
oprofile, x86: Add comments to IBS LVT offset initialization