Izik Eidus [Tue, 22 Sep 2009 00:02:07 +0000 (17:02 -0700)]
ksm: change ksm nice level to be 5
ksm should try not to disturb other tasks as much as possible.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Izik Eidus [Tue, 22 Sep 2009 00:02:06 +0000 (17:02 -0700)]
ksm: change copyright message
Adding Hugh Dickins into the authors list.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 22 Sep 2009 00:02:05 +0000 (17:02 -0700)]
ksm: prevent mremap move poisoning
KSM's scan allows for user pages to be COWed or unmapped at any time,
without requiring any notification. But its stable tree does assume that
when it finds a KSM page where it placed a KSM page, then it is the same
KSM page that it placed there.
mremap move could break that assumption: if an area containing a KSM page
was unmapped, then an area containing a different KSM page was moved with
mremap into the place of the original, before KSM's scan came around to
notice. That could then poison a node of the stable tree, so that memcmps
would "lie" and upset the ordering of the tree.
Probably noone will ever need mremap move on a VM_MERGEABLE area; except
that prohibiting it would make trouble for schemes in which we try making
everything VM_MERGEABLE e.g. for testing: an mremap which normally works
would then fail mysteriously.
There's no need to go to any trouble, such as re-sorting KSM's list of
rmap_items to match the new layout: simply unmerge the area to COW all its
KSM pages before moving, but leave VM_MERGEABLE on so that they're
remerged later.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Izik Eidus [Tue, 22 Sep 2009 00:02:03 +0000 (17:02 -0700)]
ksm: Kernel SamePage Merging
Ksm is code that allows merging of identical pages between one or more
applications, in a way invisible to the applications that use it. Pages
that are merged are marked as read-only, then COWed when any application
tries to change them.
Whereas fork() allows sharing anonymous pages between parent and child,
ksm can share anonymous pages between unrelated processes.
Ksm works by walking over the memory pages of the applications it scans,
in order to find identical pages. It uses two sorted data structures,
called the stable and unstable trees, to locate identical pages in an
effective way.
When ksm finds two identical pages, it marks them as readonly and merges
them into a single page. After the pages have been marked as readonly and
merged into one, Linux treats them as normal copy-on-write pages, copying
to a fresh anonymous page if write access is required later.
Ksm scans and merges anonymous pages only in those memory areas that have
been registered with it by madvise(addr, length, MADV_MERGEABLE).
The ksm scanner is controlled by sysfs files in /sys/kernel/mm/ksm/:
max_kernel_pages - the maximum number of unswappable kernel pages
which may be allocated by ksm (0 for unlimited).
kernel_pages_allocated - how many ksm pages are currently allocated,
sharing identical content between different
processes (pages unswappable in this release).
pages_shared - how many pages have been saved by sharing with ksm pages
(kernel_pages_allocated being excluded from this count).
pages_to_scan - how many pages ksm should scan before sleeping.
sleep_millisecs - how many milliseconds ksm should sleep between scans.
run - write 0 to disable ksm, read 0 while ksm is disabled (default),
write 1 to run ksm, read 1 while ksm is running,
write 2 to disable ksm and unmerge all its pages.
Includes contributions by Andrea Arcangeli Chris Wright and Hugh Dickins.
[hugh.dickins@tiscali.co.uk: fix rare page leak]
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 22 Sep 2009 00:02:01 +0000 (17:02 -0700)]
ksm: identify PageKsm pages
KSM will need to identify its kernel merged pages unambiguously, and
/proc/kpageflags will probably like to do so too.
Since KSM will only be substituting anonymous pages, statistics are best
preserved by making a PageKsm page a special PageAnon page: one with no
anon_vma.
But KSM then needs its own page_add_ksm_rmap() - keep it in ksm.h near
PageKsm; and do_wp_page() must COW them, unlike singly mapped PageAnons.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 22 Sep 2009 00:01:59 +0000 (17:01 -0700)]
ksm: no debug in page_dup_rmap()
page_dup_rmap(), used on each mapped page when forking, was originally
just an inline atomic_inc of mapcount. 2.6.22 added CONFIG_DEBUG_VM
out-of-line checks to it, which would need to be ever-so-slightly
complicated to allow for the PageKsm() we're about to define.
But I think these checks never caught anything. And if it's coding errors
we're worried about, such checks should be in page_remove_rmap() too, not
just when forking; whereas if it's pagetable corruption we're worried
about, then they shouldn't be limited to CONFIG_DEBUG_VM.
Oh, just revert page_dup_rmap() to an inline atomic_inc of mapcount.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 22 Sep 2009 00:01:57 +0000 (17:01 -0700)]
ksm: the mm interface to ksm
This patch presents the mm interface to a dummy version of ksm.c, for
better scrutiny of that interface: the real ksm.c follows later.
When CONFIG_KSM is not set, madvise(2) reject MADV_MERGEABLE and
MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
pretending that they can be serviced. But when CONFIG_KSM=y, accept them
even if KSM is not currently running, and even on areas which KSM will not
touch (e.g. hugetlb or shared file or special driver mappings).
Like other madvices, report ENOMEM despite success if any area in the
range is unmapped, and use EAGAIN to report out of memory.
Define vma flag VM_MERGEABLE to identify an area on which KSM may try
merging pages: leave it to ksm_madvise() to decide whether to set it.
Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
VM_MERGEABLE areas, to minimize callouts when forking or exiting.
Based upon earlier patches by Chris Wright and Izik Eidus.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 22 Sep 2009 00:01:53 +0000 (17:01 -0700)]
ksm: define MADV_MERGEABLE and MADV_UNMERGEABLE
The out-of-tree KSM used ioctls on fds cloned from /dev/ksm to register a
memory area for merging: we prefer now to use an madvise(2) interface.
This patch just defines MADV_MERGEABLE (to tell KSM it may merge pages in
this area found identical to pages in other mergeable areas) and
MADV_UNMERGEABLE (to undo that).
Most architectures use asm-generic, but alpha, mips, parisc, xtensa need
their own definitions: included here for mmotm convenience, but we'll
probably want to split this and feed pieces to arch maintainers.
Based upon earlier patches by Chris Wright and Izik Eidus.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 22 Sep 2009 00:01:52 +0000 (17:01 -0700)]
ksm: first tidy up madvise_vma()
madvise.c has several levels of switch statements, what to do in which?
Move MADV_DOFORK code down from madvise_vma() to madvise_behavior(), so
madvise_vma() can be a simple router, to madvise_behavior() by default.
vma->vm_flags is an unsigned long so use the same type for new_flags. Add
missing comment lines to describe MADV_DONTFORK and MADV_DOFORK.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Izik Eidus [Tue, 22 Sep 2009 00:01:51 +0000 (17:01 -0700)]
ksm: add mmu_notifier set_pte_at_notify()
KSM is a linux driver that allows dynamicly sharing identical memory pages
between one or more processes.
Unlike tradtional page sharing that is made at the allocation of the
memory, ksm do it dynamicly after the memory was created. Memory is
periodically scanned; identical pages are identified and merged.
The sharing is made in a transparent way to the processes that use it.
Ksm is highly important for hypervisors (kvm), where in production
enviorments there might be many copys of the same data data among the host
memory. This kind of data can be: similar kernels, librarys, cache, and
so on.
Even that ksm was wrote for kvm, any userspace application that want to
use it to share its data can try it.
Ksm may be useful for any application that might have similar (page
aligment) data strctures among the memory, ksm will find this data merge
it to one copy, and even if it will be changed and thereforew copy on
writed, ksm will merge it again as soon as it will be identical again.
Another reason to consider using ksm is the fact that it might simplify
alot the userspace code of application that want to use shared private
data, instead that the application will mange shared area, ksm will do
this for the application, and even write to this data will be allowed
without any synchinization acts from the application.
Ksm was designed to be a loadable module that doesn't change the VM code
of linux.
This patch:
The set_pte_at_notify() macro allows setting a pte in the shadow page
table directly, instead of flushing the shadow page table entry and then
getting vmexit to set it. It uses a new change_pte() callback to do so.
set_pte_at_notify() is an optimization for kvm, and other users of
mmu_notifiers, for COW pages. It is useful for kvm when ksm is used,
because it allows kvm not to have to receive vmexit and only then map the
ksm page into the shadow page table, but instead map it directly at the
same time as Linux maps the page into the host page table.
Users of mmu_notifiers who don't implement new mmu_notifier_change_pte()
callback will just receive the mmu_notifier_invalidate_page() callback.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Tue, 22 Sep 2009 00:01:48 +0000 (17:01 -0700)]
mm: perform non-atomic test-clear of PG_mlocked on free
By the time PG_mlocked is cleared in the page freeing path, nobody else is
looking at our page->flags anymore.
It is thus safe to make the test-and-clear non-atomic and thereby removing
an unnecessary and expensive operation from a hotpath.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Figo.zhang [Tue, 22 Sep 2009 00:01:47 +0000 (17:01 -0700)]
vmalloc.c: fix double error checking
There is no need for double error checking.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 22 Sep 2009 00:01:47 +0000 (17:01 -0700)]
mm: add gfp mask checking for __get_free_pages()
__get_free_pages() with __GFP_HIGHMEM is not safe because the return
address cannot represent a highmem page. get_zeroed_page() already has
such a debug checking.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:46 +0000 (17:01 -0700)]
vmscan: kill unnecessary prefetch
The pages in the list passed move_active_pages_to_lru() are already
touched by shrink_active_list(). IOW the prefetch in
move_active_pages_to_lru() don't populate any cache. it's pointless.
This patch remove it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:45 +0000 (17:01 -0700)]
vmscan: kill unnecessary page flag test
The page_lru() already evaluate PageActive() and PageSwapBacked(). We
don't need to re-evaluate it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:44 +0000 (17:01 -0700)]
vmscan: move ClearPageActive from move_active_pages() to shrink_active_list()
The move_active_pages_to_lru() function is called under irq disabled and
ClearPageActive() doesn't need irq disabling.
Then, this patch move it into shrink_active_list().
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Tue, 22 Sep 2009 00:01:43 +0000 (17:01 -0700)]
vmscan: don't attempt to reclaim anon page in lumpy reclaim when no swap space is available
The VM already avoids attempting to reclaim anon pages in various places,
But it doesn't avoid it for lumpy reclaim.
It shuffles lru list unnecessary so that it is pointless.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 22 Sep 2009 00:01:42 +0000 (17:01 -0700)]
mm: count only reclaimable lru pages
global_lru_pages() / zone_lru_pages() can be used in two ways:
- to estimate max reclaimable pages in determine_dirtyable_memory()
- to calculate the slab scan ratio
When swap is full or not present, the anon lru lists are not reclaimable
and also won't be scanned. So the anon pages shall not be counted in both
usage scenarios. Also rename to _reclaimable_pages: now they are counting
the possibly reclaimable lru pages.
It can greatly (and correctly) increase the slab scan rate under high
memory pressure (when most file pages have been reclaimed and swap is
full/absent), thus reduce false OOM kills.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Li, Ming Chun" <macli@brc.ubc.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 22 Sep 2009 00:01:40 +0000 (17:01 -0700)]
vm: document that setting vfs_cache_pressure to 0 isn't a good idea
Reported-by: Christian Thaeter <ct@pipapo.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:39 +0000 (17:01 -0700)]
mm: remove __{add,sub}_zone_page_state()
__add_zone_page_state() and __sub_zone_page_state() are unused.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rik van Riel [Tue, 22 Sep 2009 00:01:38 +0000 (17:01 -0700)]
vmscan: throttle direct reclaim when too many pages are isolated already
When way too many processes go into direct reclaim, it is possible for all
of the pages to be taken off the LRU. One result of this is that the next
process in the page reclaim code thinks there are no reclaimable pages
left and triggers an out of memory kill.
One solution to this problem is to never let so many processes into the
page reclaim path that the entire LRU is emptied. Limiting the system to
only having half of each inactive list isolated for reclaim should be
safe.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:37 +0000 (17:01 -0700)]
mm: vmstat: add isolate pages
If the system is running a heavy load of processes then concurrent reclaim
can isolate a large number of pages from the LRU. /proc/vmstat and the
output generated for an OOM do not show how many pages were isolated.
This has been observed during process fork bomb testing (mstctl11 in LTP).
This patch shows the information about isolated pages.
Reproduced via:
-----------------------
% ./hackbench 140 process 1000
=> OOM occur
active_anon:146 inactive_anon:0 isolated_anon:49245
active_file:79 inactive_file:18 isolated_file:113
unevictable:0 dirty:0 writeback:0 unstable:0 buffer:39
free:370 slab_reclaimable:309 slab_unreclaimable:5492
mapped:53 shmem:15 pagetables:28140 bounce:0
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:36 +0000 (17:01 -0700)]
mm: shrink_inactive_list() nr_scan accounting fix fix
If sc->isolate_pages() return 0, we don't need to call shrink_page_list().
In past days, shrink_inactive_list() handled it properly.
But commit
fb8d14e1 (three years ago commit!) breaked it. current
shrink_inactive_list() always call shrink_page_list() although
isolate_pages() return 0.
This patch restore proper return value check.
Requirements:
o "nr_taken == 0" condition should stay before calling shrink_page_list().
o "nr_taken == 0" condition should stay after nr_scan related statistics
modification.
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:35 +0000 (17:01 -0700)]
mm: rename pgmoved variable in shrink_active_list()
Currently the pgmoved variable has two meanings. It causes harder
reviewing. This patch separates it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 22 Sep 2009 00:01:34 +0000 (17:01 -0700)]
mm: update alloc_flags after oom killer has been called
It is possible for the oom killer to select current as the task to kill.
When this happens, alloc_flags needs to be updated accordingly to set
ALLOC_NO_WATERMARKS so the subsequent allocation attempt may use memory
reserves as the result of its thread having TIF_MEMDIE set if the
allocation is not __GFP_NOMEMALLOC.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:33 +0000 (17:01 -0700)]
mm: oom analysis: add shmem vmstat
Recently we encountered OOM problems due to memory use of the GEM cache.
Generally a large amuont of Shmem/Tmpfs pages tend to create a memory
shortage problem.
We often use the following calculation to determine the amount of shmem
pages:
shmem = NR_ACTIVE_ANON + NR_INACTIVE_ANON - NR_ANON_PAGES
however the expression does not consider isolated and mlocked pages.
This patch adds explicit accounting for pages used by shmem and tmpfs.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:32 +0000 (17:01 -0700)]
mm: oom analysis: Show kernel stack usage in /proc/meminfo and OOM log output
The amount of memory allocated to kernel stacks can become significant and
cause OOM conditions. However, we do not display the amount of memory
consumed by stacks.
Add code to display the amount of memory used for stacks in /proc/meminfo.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:31 +0000 (17:01 -0700)]
mm: oom analysis: add buffer cache information to show_free_areas()
It is often useful to know the statistics for all pages that are handled
like page cache pages when looking at OOM log output.
Therefore show_free_areas() should also display buffer cache statistics.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:30 +0000 (17:01 -0700)]
mm: oom analysis: add per-zone statistics to show_free_areas()
show_free_areas() displays only a limited amount of zone counters. This
patch includes additional counters in the display to allow easier
debugging. This may be especially useful if an OOM is due to running out
of DMA memory.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andi Kleen [Tue, 22 Sep 2009 00:01:29 +0000 (17:01 -0700)]
Documentation/memory.txt: remove some very outdated recommendations
Remove some very outdated recommendations in Documentation/memory.txt
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:29 +0000 (17:01 -0700)]
mm: show_free_areas(): display slab pages in two separate fields
If an OOM happens, we really want to know the number of remaining
reclaimable pages. So the reclaimable slab and unreclaimable slab fields
should not be combined for display.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:01:28 +0000 (17:01 -0700)]
mm: clean up page_remove_rmap()
page_remove_rmap() has multiple PageAnon() tests and it has deep nesting.
Clean this up.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 22 Sep 2009 00:01:25 +0000 (17:01 -0700)]
hugetlb: restore interleaving of bootmem huge pages
I noticed that alloc_bootmem_huge_page() will only advance to the next
node on failure to allocate a huge page, potentially filling nodes with
huge-pages. I asked about this on linux-mm and linux-numa, cc'ing the
usual huge page suspects.
Mel Gorman responded:
I strongly suspect that the same node being used until allocation
failure instead of round-robin is an oversight and not deliberate
at all. It appears to be a side-effect of a fix made way back in
commit
63b4613c3f0d4b724ba259dc6c201bb68b884e1a ["hugetlb: fix
hugepage allocation with memoryless nodes"]. Prior to that patch
it looked like allocations would always round-robin even when
allocation was successful.
This patch--factored out of my "hugetlb mempolicy" series--moves the
advance of the hstate next node from which to allocate up before the test
for success of the attempted allocation.
Note that alloc_bootmem_huge_page() is only used for order > MAX_ORDER
huge pages.
I'll post a separate patch for mainline/stable, as the above mentioned
"balance freeing" series renamed the next node to alloc function.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andy Whitcroft <apw@canonical.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 22 Sep 2009 00:01:24 +0000 (17:01 -0700)]
hugetlb: clean up and update huge pages documentation
Attempt to clarify huge page administration and usage, and updates the
doucmentation to mention the balancing of huge pages across nodes when
allocating and freeing.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 22 Sep 2009 00:01:23 +0000 (17:01 -0700)]
hugetlb: use free_pool_huge_page() to return unused surplus pages
Use the [modified] free_pool_huge_page() function to return unused
surplus pages. This will help keep huge pages balanced across nodes
between freeing of unused surplus pages and freeing of persistent huge
pages [from set_max_huge_pages] by using the same node id "cursor". It
also eliminates some code duplication.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 22 Sep 2009 00:01:22 +0000 (17:01 -0700)]
hugetlb: balance freeing of huge pages across nodes
Free huges pages from nodes in round robin fashion in an attempt to keep
[persistent a.k.a static] hugepages balanced across nodes
New function free_pool_huge_page() is modeled on and performs roughly the
inverse of alloc_fresh_huge_page(). Replaces dequeue_huge_page() which
now has no callers, so this patch removes it.
Helper function hstate_next_node_to_free() uses new hstate member
next_to_free_nid to distribute "frees" across all nodes with huge pages.
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 22 Sep 2009 00:01:20 +0000 (17:01 -0700)]
page_alloc: fix kernel-doc warning
Ummark function as having kernel-doc notation, fixing the kernel-doc
warning.
Warning(mm/page_alloc.c:4519): No description found for parameter 'zone'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 22 Sep 2009 00:01:19 +0000 (17:01 -0700)]
memory hotplug: migrate swap cache page
In test, some pages in swap-cache can't be migrated, as they aren't rmap.
unmap_and_move() ignores swap-cache page which is just read in and hasn't
rmap (see the comments in the code), but swap_aops provides .migratepage.
Better to migrate such pages instead of ignore them.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 22 Sep 2009 00:01:19 +0000 (17:01 -0700)]
memory hotplug: alloc page from other node in memory online
To initialize hotadded node, some pages are allocated. At that time, the
node hasn't memory, this makes the allocation always fail. In such case,
let's allocate pages from other nodes.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Yakui Zhao <yakui.zhao@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 22 Sep 2009 00:01:18 +0000 (17:01 -0700)]
memory hotplug: make pages from movable zone always isolatable
Pages on movable zone have two types, MIGRATE_MOVABLE and MIGRATE_RESERVE,
both them can be movable, because only movable memory allocation can get
pages from movable zone. This makes pages in movable zone always be able
to migrate.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 22 Sep 2009 00:01:17 +0000 (17:01 -0700)]
memory hotplug: exclude isolated page from pco page alloc
Pages marked as isolated should not be allocated again. If such pages
reside in pcp list, they can be allocated too, so there is a ping-pong
memory offline frees some pages to pcp list and the pages get allocated
and then memory offline frees them again, this loop will happen again and
again.
This should have no impact in normal code path, because in normal code
path, pages in pcp list aren't isolated, and below loop will break in the
first entry.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 22 Sep 2009 00:01:16 +0000 (17:01 -0700)]
memory hotplug: update zone pcp at memory online
In my test, 128M memory is hot added, but zone's pcp batch is 0, which is
an obvious error. When pages are onlined, zone pcp should be updated
accordingly.
[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 22 Sep 2009 00:01:15 +0000 (17:01 -0700)]
mm: remove obsoleted alloc_pages cpuset comment
When a cpuset's nodemask is updated, all attached tasks have their cached
task->mems_allowed updated by a heap instead of requiring an explicit call
to cpuset_update_task_memory_state(), which has since been removed in
58568d2a8215cb6f55caf2332017d7bdff954e1c ("cpuset,mm: update tasks'
mems_allowed in time").
Remove the obsoleted comment from the page allocator.
Cc: Paul Menage <menage@google.com>
Acked-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Tue, 22 Sep 2009 00:01:13 +0000 (17:01 -0700)]
mm: make swap token dummies static inlines
Make use of the compiler's typechecking on !CONFIG_SWAP as well.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:13 +0000 (17:01 -0700)]
const: make block_device_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:12 +0000 (17:01 -0700)]
const: make lock_manager_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:11 +0000 (17:01 -0700)]
const: make file_lock_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:11 +0000 (17:01 -0700)]
const: mark remaining inode_operations as const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:10 +0000 (17:01 -0700)]
const: mark remaining address_space_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:10 +0000 (17:01 -0700)]
const: mark remaining export_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:09 +0000 (17:01 -0700)]
const: mark remaining super_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:09 +0000 (17:01 -0700)]
const: make struct super_block::s_qcop const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 22 Sep 2009 00:01:08 +0000 (17:01 -0700)]
const: make struct super_block::dq_op const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 22 Sep 2009 00:01:07 +0000 (17:01 -0700)]
drivers/mfd/ab3100-core.c: fix powerpc build error
drivers/mfd/ab3100-core.c:647: error: ab3100_init_settings causes a section type conflict
Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Tue, 22 Sep 2009 00:01:06 +0000 (17:01 -0700)]
proc: document `guest' column in /proc/stat
We added a new column in cpuX lines of /proc/stat, to show the amount of
time spent by a cpu servicing a guest, without updating
Documentation/filesystems/proc.txt
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 22 Sep 2009 00:01:06 +0000 (17:01 -0700)]
fs: make sure data stored into inode is properly seen before unlocking new inode
In theory it could happen that on one CPU we initialize a new inode but
clearing of I_NEW | I_LOCK gets reordered before some of the
initialization. Thus on another CPU we return not fully uptodate inode
from iget_locked().
This seems to fix a corruption issue on ext3 mounted over NFS.
[akpm@linux-foundation.org: add some commentary]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Ossman [Tue, 22 Sep 2009 00:00:59 +0000 (17:00 -0700)]
sdhci: orphan driver and list
Remove myself as maintainer from the sdhci driver and steer people
towards the new MMC list for discussing it.
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 22 Sep 2009 00:00:58 +0000 (17:00 -0700)]
drivers/media/dvb/pt1/pt1.c needs vmalloc.h
alpha:
drivers/media/dvb/pt1/pt1.c: In function 'pt1_cleanup_tables':
drivers/media/dvb/pt1/pt1.c:422: error: implicit declaration of function 'vfree'
drivers/media/dvb/pt1/pt1.c: In function 'pt1_init_tables':
drivers/media/dvb/pt1/pt1.c:431: error: implicit declaration of function 'vmalloc'
drivers/media/dvb/pt1/pt1.c:431: warning: assignment makes pointer from integer without a cast
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 21 Sep 2009 16:15:07 +0000 (09:15 -0700)]
Merge branch 'perfcounters-rename-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Tidy up after the big rename
perf: Do the big rename: Performance Counters -> Performance Events
perf_counter: Rename 'event' to event_id/hw_event
perf_counter: Rename list_entry -> group_entry, counter_list -> group_list
Manually resolved some fairly trivial conflicts with the tracing tree in
include/trace/ftrace.h and kernel/trace/trace_syscalls.c.
Linus Torvalds [Mon, 21 Sep 2009 16:06:52 +0000 (09:06 -0700)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Fix whitespace inconsistencies
rcu: Fix thinko, actually initialize full tree
rcu: Apply results of code inspection of kernel/rcutree_plugin.h
rcu: Add WARN_ON_ONCE() consistency checks covering state transitions
rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU
rcu: Simplify rcu_read_unlock_special() quiescent-state accounting
rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods
rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
rcutorture: Occasionally delay readers enough to make RCU force_quiescent_state
rcu: Initialize multi-level RCU grace periods holding locks
rcu: Need to update rnp->gpnum if preemptable RCU is to be reliable
Linus Torvalds [Mon, 21 Sep 2009 16:06:31 +0000 (09:06 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter, powerpc, sparc: Fix compilation after perf_counter_overflow() change
perf_counter: x86: Fix PMU resource leak
perf util: SVG performance improvements
perf util: Make the timechart SVG width dynamic
perf timechart: Show the duration of scheduler delays in the SVG
perf timechart: Show the name of the waker/wakee in timechart
Linus Torvalds [Mon, 21 Sep 2009 16:06:17 +0000 (09:06 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Simplify sys_sched_rr_get_interval() system call
sched: Fix potential NULL derference of doms_cur
sched: Fix raciness in runqueue_is_locked()
sched: Re-add lost cpu_allowed check to sched_fair.c::select_task_rq_fair()
sched: Remove unneeded indentation in sched_fair.c::place_entity()
Linus Torvalds [Mon, 21 Sep 2009 16:05:47 +0000 (09:05 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kernel/profile.c: Switch /proc/irq/prof_cpu_mask to seq_file
tracing: Export trace_profile_buf symbols
tracing/events: use list_for_entry_continue
tracing: remove max_tracer_type_len
function-graph: use ftrace_graph_funcs directly
tracing: Remove markers
tracing: Allocate the ftrace event profile buffer dynamically
tracing: Factorize the events profile accounting
Linus Torvalds [Mon, 21 Sep 2009 16:05:19 +0000 (09:05 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Print the hypervisor returned tsc_khz during boot
x86: Correct segment permission flags in 64-bit linker script
x86: cpuinit-annotate SMP boot trampolines properly
x86: Increase timeout for EHCI debug port reset completion in early printk
x86: Fix uaccess_32.h typo
x86: Trivial whitespace cleanups
x86, apic: Fix missed handling of discrete apics
x86/i386: Remove duplicated #include
x86, mtrr: Convert loop to a while based construct, avoid naked semicolon
Revert 'x86: Fix system crash when loading with "reservetop" parameter'
x86, mce: Fix compile warning in case of CONFIG_SMP=n
x86, apic: Use logical flat on intel with <= 8 logical cpus
x86: SGI UV: Map MMIO-High memory range
x86: SGI UV: Add volatile semantics to macros that access chipset registers
x86: SGI UV: Fix IPI macros
x86: apic: Convert BUG() to BUG_ON()
x86: Remove final bits of CONFIG_X86_OLD_MCE
Linus Torvalds [Mon, 21 Sep 2009 16:04:30 +0000 (09:04 -0700)]
Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
nfs: initialize the backing_dev_info when creating the server
writeback: make balance_dirty_pages() gradually back more off
writeback: don't use schedule_timeout() without setting runstate
nfs: nfs_kill_super() should call bdi_unregister() after killing super
Linus Torvalds [Mon, 21 Sep 2009 16:03:10 +0000 (09:03 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (222 commits)
V4L/DVB (13033): pt1: Don't use a deprecated DMA_BIT_MASK macro
V4L/DVB (13029): radio-si4713: remove #include <linux/version.h>
V4L/DVB (13027): go7007: convert printks to v4l2_info
V4L/DVB (13026): s2250-board: Implement brightness and contrast controls
V4L/DVB (13025): s2250-board: Fix memory leaks
V4L/DVB (13024): go7007: Implement vidioc_g_std and vidioc_querystd
V4L/DVB (13023): go7007: Merge struct gofh and go declarations
V4L/DVB (13022): go7007: Fix mpeg controls
V4L/DVB (13021): go7007: Fix whitespace and line lengths
V4L/DVB (13020): go7007: Updates to Kconfig and Makefile
V4L/DVB (13019): video: initial support for ADV7180
V4L/DVB (13018): kzalloc failure ignored in au8522_probe()
V4L/DVB (13017): gspca: kmalloc failure ignored in sd_start()
V4L/DVB (13016): kmalloc failure ignored in lgdt3304_attach() and s921_attach()
V4L/DVB (13015): kmalloc failure ignored in m920x_firmware_download()
V4L/DVB (13014): Add support for Compro VideoMate E800 (DVB-T part only)
V4L/DVB (13013): FM TX: si4713: Kconfig: Fixed two typos.
V4L/DVB (13012): uvc: introduce missing kfree
V4L/DVB (13011): Change tuner type of BeholdTV cards
V4L/DVB (13009): gspca - stv06xx-hdcs: Reduce exposure range
...
Linus Torvalds [Mon, 21 Sep 2009 15:15:18 +0000 (08:15 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix debugging dump
UBIFS: improve lprops dump
UBIFS: various minor commentary fixes
UBIFS: improve journal head debugging prints
UBIFS: define journal head numbers in ubifs-media.h
UBIFS: amend commentaries
UBIFS: check ubifs_scan error codes better
UBIFS: do not print scary error messages needlessly
UBIFS: add inode size debugging check
UBIFS: constify file and inode operations
UBIFS: remove unneeded call from ubifs_sync_fs
UBIFS: kill BKL
UBIFS: remove unused functions
UBIFS: suppress compilation warning
Linus Torvalds [Mon, 21 Sep 2009 15:13:55 +0000 (08:13 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6
* 'linux-next' of git://git.infradead.org/ubi-2.6:
UBI: improve NOR flash erasure quirk
UBI: introduce flash dump helper
UBI: eliminate possible undefined behaviour
UBI: print a warning if too many PEBs are corrupted
UBI: amend NOR flash pre-erase quirk
UBI: print a message if ECH is corrupted and VIDH is ok
Linus Torvalds [Mon, 21 Sep 2009 15:10:09 +0000 (08:10 -0700)]
Merge branch 'drm-linus' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (133 commits)
drm/vgaarb: add VGA arbitration support to the drm and kms.
drm/radeon: some r420s have a CP race with the DMA engine.
drm/radeon/r600/kms: rv670 is not DCE3
drm/radeon/kms: r420 idle after programming GA_ENHANCE
drm/radeon/kms: more fixes to rv770 suspend/resume path.
drm/radeon/kms: more alignment for rv770.c with r600.c
drm/radeon/kms: rv770 blit init called too late.
drm/radeon/kms: move around new init path code to avoid posting at init
drm/radeon/r600: fix some issues with suspend/resume.
drm/radeon/kms: disable VGA rendering engine before taking over VRAM
drm/radeon/kms: Move radeon_get_clock_info() call out of radeon_clocks_init().
drm/radeon/kms: add initial connector properties
drm/radeon/kms: Use surfaces for scanout / cursor byte swapping on big endian.
drm/radeon/kms: don't fail if we fail to init GPU acceleration
drm/r600/kms: fixup number of loops per blit calculation.
drm/radeon/kms: reprogram format in set base.
drm/radeon: avivo chips have no separate int bit for display
drm/radeon/r600: don't do interrupts
drm: fix _DRM_GEM addmap error message
drm: update crtc x/y when only fb changes
...
Fixed up trivial conflicts in firmware/Makefile due to network driver
(cxgb3) and drm (mga/r128/radeon) firmware being listed next to each
other.
Ingo Molnar [Mon, 21 Sep 2009 06:56:58 +0000 (08:56 +0200)]
Driver-Core: fix devnode callbacks for dabusb and industrialio
The build of the dabusb driver broke:
drivers/media/video/dabusb.c:758: error: unknown field 'nodename' specified in initializer
drivers/media/video/dabusb.c:758: warning: initialization from incompatible pointer type
make[3]: *** wait: No child processes. Stop.
Due to this commit:
e454cea: Driver-Core: extend devnode callbacks to provide permissions
Missing the dabusb driver's dabusb_nodename() callback.
Similar issues with the iio/industrialio driver in staging, pointed out
and patched by Jean Delvare.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Industrialio-parts-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Mon, 21 Sep 2009 07:59:39 +0000 (09:59 +0200)]
nfs: initialize the backing_dev_info when creating the server
NFS may free the server structure without ever having used the
bdi, so we either need to flag the bdi as being uninitialized or
initialize it up front. This does the latter.
This fixes a crash with mounting more than one NFS file system,
should people ever need that kind of obscure NFS functionality.
Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 17 Sep 2009 17:59:14 +0000 (19:59 +0200)]
writeback: make balance_dirty_pages() gradually back more off
Currently it just sleeps for a very short time, just 1 jiffy. If
we keep looping in there, continually delay for a little longer
of up to 100msec in total. That was the old limit for congestion
wait.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 17 Sep 2009 17:56:01 +0000 (19:56 +0200)]
writeback: don't use schedule_timeout() without setting runstate
Just use schedule_timeout_interruptible(), saves a call to
set_current_state().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 17 Sep 2009 12:51:44 +0000 (14:51 +0200)]
nfs: nfs_kill_super() should call bdi_unregister() after killing super
Otherwise we could be attempting to flush data for a writeback
thread and bdi that have already disappeared.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Ingo Molnar [Mon, 21 Sep 2009 10:20:38 +0000 (12:20 +0200)]
perf: Tidy up after the big rename
- provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
- provide courtesy copy of old perf_counter.h, for user-space projects
- small indentation fixups
- fix up MAINTAINERS
- fix small x86 printout fallout
- fix up small PowerPC comment fallout (use 'counter' as in register)
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 21 Sep 2009 10:02:48 +0000 (12:02 +0200)]
perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 21 Sep 2009 09:31:35 +0000 (11:31 +0200)]
perf_counter: Rename 'event' to event_id/hw_event
In preparation to the renames, to avoid a namespace clash.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 21 Sep 2009 08:18:27 +0000 (10:18 +0200)]
perf_counter: Rename list_entry -> group_entry, counter_list -> group_list
This is in preparation of the big rename, but also makes sense
in a standalone way: 'list_entry' is a bad name as we already
have a list_entry() in list.h.
Also, the 'counter list' is too vague, it doesnt tell us the
purpose of that list.
Clarify these names to show that it's all about the group
hiearchy.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 21 Sep 2009 10:51:27 +0000 (12:51 +0200)]
Merge branch 'linus' into perfcounters/rename
Merge reason: pull in all the latest code before doing the rename.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Artem Bityutskiy [Mon, 21 Sep 2009 09:09:22 +0000 (12:09 +0300)]
Merge branch 'master' of git://git./linux/kernel/git/torvalds/linux-2.6 into linux-next
Conflicts:
fs/ubifs/super.c
Merge the upstream tree in order to resolve a conflict with the
per-bdi writeback changes from the linux-2.6-block tree.
Peter Williams [Mon, 21 Sep 2009 01:31:53 +0000 (01:31 +0000)]
sched: Simplify sys_sched_rr_get_interval() system call
By removing the need for it to know details of scheduling classes.
This allows PlugSched to define orthogonal scheduling classes.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <
06d1b89ee15a0eef82d7.
1253496713@mudlark.pw.nest>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul Mackerras [Mon, 21 Sep 2009 06:44:32 +0000 (16:44 +1000)]
perf_counter, powerpc, sparc: Fix compilation after perf_counter_overflow() change
Commit
5622f295 ("x86, perf_counter, bts: Optimize BTS overflow
handling") removed the regs field from struct perf_sample_data and
added a regs parameter to perf_counter_overflow(). This breaks the
build on powerpc (and Sparc) as reported by Sachin Sant:
arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
arch/powerpc/kernel/perf_counter.c:1165: error: unknown field 'regs' specified in initializer
This adjusts arch/powerpc/kernel/perf_counter.c to correspond with the
new struct perf_sample_data and perf_counter_overflow().
[ v2: also fix Sparc, Markus Metzger <markus.t.metzger@intel.com> ]
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: benh@kernel.crashing.org
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19127.8400.376239.586120@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dave Airlie [Mon, 21 Sep 2009 04:33:58 +0000 (14:33 +1000)]
drm/vgaarb: add VGA arbitration support to the drm and kms.
VGA arb requires DRM support for non-kms drivers, to turn on/off
irqs when disabling the mem/io regions.
VGA arb requires KMS support for GPUs where we can turn off VGA
decoding. Currently we know how to do this for intel and radeon
kms drivers, which allows them to be removed from the arbiter.
This patch comes from Fedora rawhide kernel.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Mon, 21 Sep 2009 04:48:45 +0000 (14:48 +1000)]
drm/radeon: some r420s have a CP race with the DMA engine.
This patch makes sure the CP doesn't DMA do VRAM while 2D
is active by inserting a CP resync token.
todo: port to kms.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Fri, 18 Sep 2009 15:30:30 +0000 (11:30 -0400)]
drm/radeon/r600/kms: rv670 is not DCE3
RV670 was using the wrong modesetting code.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 21 Sep 2009 04:15:10 +0000 (14:15 +1000)]
drm/radeon/kms: r420 idle after programming GA_ENHANCE
https://bugs.freedesktop.org/show_bug.cgi?id=24041
The idle allows rs690 to startup properly.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 21 Sep 2009 04:06:30 +0000 (14:06 +1000)]
drm/radeon/kms: more fixes to rv770 suspend/resume path.
This resumes my
RV730PRO (4650)
RV770 (4850)
fine.
Still researching the RV4550 (RV710), resumes without X fine.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Sun, 20 Sep 2009 23:02:06 +0000 (16:02 -0700)]
Merge git://git./linux/kernel/git/jaswinder/linux-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6:
includecheck fix: x86, cpu/common.c
includecheck fix: kernel/trace, ring_buffer.c
includecheck fix: include/linux, ftrace.h
includecheck fix: include/linux, page_cgroup.h
includecheck fix: include/linux, aio.h
includecheck fix: include/drm, drm_memory.h
includecheck fix: include/acpi, acpi_bus.h
includecheck fix: drivers/xen, evtchn.c
includecheck fix: drivers/video, vgacon.c
includecheck fix: drivers/scsi, ibmvscsi.c
includecheck fix: drivers/scsi, libfcoe.c
includecheck fix: x86, shadow.c
includecheck fix: x86, traps.c
includecheck fix: um, helper.c
includecheck fix: s390, sys_s390.c
Linus Torvalds [Sun, 20 Sep 2009 22:57:28 +0000 (15:57 -0700)]
loongson: fix cut-and-paste mis-merge
Ingo points out that I screwed up when merging the 'timers-for-linus'
branch in commit
a03fdb7612874834d6847107198712d18b5242c7.
A bit too much copy-and-pasting caused the end result to have an
extraneous 'return' in the middle of an expression. That was obviously
bogus. Blush.
Reported-by-with-patch: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 20 Sep 2009 22:55:56 +0000 (15:55 -0700)]
Merge branch 'next-i2c' of git://aeryn.fluff.org.uk/bjdooks/linux
* 'next-i2c' of git://aeryn.fluff.org.uk/bjdooks/linux:
[PATCH] i2c-imx: make bus available early
i2c-mv64xxx: correct mv64xxx_i2c_intr() return type
Linus Torvalds [Sun, 20 Sep 2009 22:55:39 +0000 (15:55 -0700)]
Merge git://git./linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Driver-Core: extend devnode callbacks to provide permissions
Linus Torvalds [Sun, 20 Sep 2009 22:55:17 +0000 (15:55 -0700)]
Merge git://git./linux/kernel/git/gregkh/tty-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (79 commits)
USB serial: update the console driver
usb-serial: straighten out serial_open
usb-serial: add missing tests and debug lines
usb-serial: rename subroutines
usb-serial: fix termios initialization logic
usb-serial: acquire references when a new tty is installed
usb-serial: change logic of serial lookups
usb-serial: put subroutines in logical order
usb-serial: change referencing of port and serial structures
tty: Char: mxser, use THRE for ASPP_OQUEUE ioctl
tty: Char: mxser, add support for CP112UL
uartlite: support shared interrupt lines
tty: USB: serial/mct_u232, fix tty refcnt
tty: riscom8, fix tty refcnt
tty: riscom8, fix shutdown declaration
TTY: fix typos
tty: Power: fix suspend vt regression
tty: vt: use printk_once
tty: handle VT specific compat ioctls in vt driver
n_tty: move echoctl check and clean up logic
...
Linus Torvalds [Sun, 20 Sep 2009 22:54:37 +0000 (15:54 -0700)]
Merge branch 'perfcounters-core-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (58 commits)
perf_counter: Fix perf_copy_attr() pointer arithmetic
perf utils: Use a define for the maximum length of a trace event
perf: Add timechart help text and add timechart to "perf help"
tracing, x86, cpuidle: Move the end point of a C state in the power tracer
perf utils: Be consistent about minimum text size in the svghelper
perf timechart: Add "perf timechart record"
perf: Add the timechart tool
perf: Add a SVG helper library file
tracing, perf: Convert the power tracer into an event tracer
perf: Add a sample_event type to the event_union
perf: Allow perf utilities to have "callback" options without arguments
perf: Store trace event name/id pairs in perf.data
perf: Add a timestamp to fork events
sched_clock: Make it NMI safe
perf_counter: Fix up swcounter throttling
x86, perf_counter, bts: Optimize BTS overflow handling
perf sched: Add --input=file option to builtin-sched.c
perf trace: Sample timestamp and cpu when using record flag
perf tools: Increase MAX_EVENT_LENGTH
perf tools: Fix memory leak in read_ftrace_printk()
...
Alok Kataria [Fri, 4 Sep 2009 20:13:39 +0000 (13:13 -0700)]
x86: Print the hypervisor returned tsc_khz during boot
On an AMD-64 system the processor frequency that is printed during
system boot, may be different than the tsc frequency that was
returned by the hypervisor, due to the value returned from
calibrate_cpu.
For debugging timekeeping or other related issues it might be
better to get the tsc_khz value returned by the hypervisor.
The patch below now prints the tsc frequency that the VMware
hypervisor returned.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <
1252095219.12518.13.camel@ank32.eng.vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Sun, 20 Sep 2009 18:24:58 +0000 (20:24 +0200)]
Merge branch 'linus' into x86/urgent
Merge reason: Bring in changes that the next patch will depend on.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich [Fri, 4 Sep 2009 08:18:07 +0000 (09:18 +0100)]
x86: Correct segment permission flags in 64-bit linker script
While these don't get actively used (afaict), it still doesn't hurt
for them to properly reflect what how respective segments will get
mapped/ accessed.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <
4AA0E95F0200007800013707@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich [Fri, 4 Sep 2009 08:16:22 +0000 (09:16 +0100)]
x86: cpuinit-annotate SMP boot trampolines properly
Add missing annotations, and make use of include/linux/init.h's
macros.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <
4AA0E8F60200007800013703@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich [Fri, 4 Sep 2009 08:13:49 +0000 (09:13 +0100)]
x86: Increase timeout for EHCI debug port reset completion in early printk
On one of my systems, several thousand iterations are needed before
CMD_RESET can be observed clear after setting it. Using a much
higher value here obviously cannot hurt.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <
4AA0E85D02000078000136F9@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yong Zhang [Mon, 14 Sep 2009 12:20:16 +0000 (20:20 +0800)]
sched: Fix potential NULL derference of doms_cur
If CONFIG_CPUMASK_OFFSTACK is enabled but doms_cur alloc failed in
arch_init_sched_domains(), doms_cur will move back to
fallback_doms. But this time, fallback_doms has not been
initialized yet.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: a.p.zijlstra@chello.nl
LKML-Reference: <
1252930816-7672-1-git-send-email-yong.zhang0@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sergey Senozhatsky [Thu, 17 Sep 2009 12:54:01 +0000 (15:54 +0300)]
x86: Fix uaccess_32.h typo
Trivial: correct "that the we don't" typo.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
LKML-Reference: <
20090917125401.GU3717@localdomain.by>
Signed-off-by: Ingo Molnar <mingo@elte.hu>