Ingo Molnar [Tue, 11 Aug 2009 08:47:36 +0000 (10:47 +0200)]
perf_counter, x86: Fix lapic printk message
Instead of this garbled bootup on UP Pentium-M systems:
[ 0.015048] Performance Counters:
[ 0.016004] no Local APIC, try rebooting with lapicno PMU driver, software counters only.
Print:
[ 0.015050] Performance Counters:
[ 0.016004] no APIC, boot with the "lapic" boot parameter to force-enable it.
[ 0.017003] no PMU driver, software counters only.
Cf: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Mon, 10 Aug 2009 20:21:19 +0000 (13:21 -0700)]
pty: fix data loss when stopped (^S/^Q)
Commit
d945cb9cc ("pty: Rework the pty layer to use the normal buffering
logic") dropped the test for 'tty->stopped' in pty_write_room(), which
then causes the n_tty line discipline thing to not throttle the data
properly when the tty is stopped.
So instead of pausing the write due to the tty being stopped, the ldisc
layer would go ahead and push it down to the pty. The pty write()
routine would then refuse to take the data (because it _did_ check
'stopped'), and the data wouldn't actually be written.
This whole stopped test should eventually be moved into the tty ldisc
layer rather than have low-level tty drivers care about these things,
but right now the fix is to just re-instate the missing pty 'stopped'
handling.
Reported-and-tested-by: Artur Skawina <art.08.09@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 10 Aug 2009 18:48:51 +0000 (11:48 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
perf_counter: Zero dead bytes from ftrace raw samples size alignment
perf_counter: Subtract the buffer size field from the event record size
perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
perf_counter: Correct PERF_SAMPLE_RAW output
perf tools: callchain: Fix bad rounding of minimum rate
perf_counter tools: Fix libbfd detection for systems with libz dependency
perf: "Longum est iter per praecepta, breve et efficax per exempla"
perf_counter: Fix a race on perf_counter_ctx
perf_counter: Fix tracepoint sampling to be part of generic sampling
perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
perf tools: callchain: Fix 'perf report' display to be callchain by default
perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
perf record: Fix the -A UI for empty or non-existent perf.data
perf util: Fix do_read() to fail on EOF instead of busy-looping
perf list: Fix the output to not include tracepoints without an id
perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
perf report: Fix per task mult-counter stat reporting
...
Linus Torvalds [Mon, 10 Aug 2009 18:21:13 +0000 (11:21 -0700)]
Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/irq: Fix move_irq_desc() for nodes without ram
Linus Torvalds [Mon, 10 Aug 2009 18:11:40 +0000 (11:11 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix serialization in pit_expect_msb()
Linus Torvalds [Mon, 10 Aug 2009 18:00:37 +0000 (11:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI hotplug: SGI hotplug: do not use hotplug_slot_attr
PCI hotplug: SGI hotplug: fix build failure
Linus Torvalds [Fri, 31 Jul 2009 19:45:41 +0000 (12:45 -0700)]
x86: Fix serialization in pit_expect_msb()
Wei Chong Tan reported a fast-PIT-calibration corner-case:
| pit_expect_msb() is vulnerable to SMI disturbance corner case
| in some platforms which causes /proc/cpuinfo to show wrong
| CPU MHz value when quick_pit_calibrate() jumps to success
| section.
I think that the real issue isn't even an SMI - but the fact
that in the very last iteration of the loop, there's no
serializing instruction _after_ the last 'rdtsc'. So even in
the absense of SMI's, we do have a situation where the cycle
counter was read without proper serialization.
The last check should be done outside the outer loop, since
_inside_ the outer loop, we'll be testing that the PIT has
the right MSB value has the right value in the next iteration.
So only the _last_ iteration is special, because that's the one
that will not check the PIT MSB value any more, and because the
final 'get_cycles()' isn't serialized.
In other words:
- I'd like to move the PIT MSB check to after the last
iteration, rather than in every iteration
- I think we should comment on the fact that it's also a
serializing instruction and so 'fences in' the TSC read.
Here's a suggested replacement.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: "Tan, Wei Chong" <wei.chong.tan@intel.com>
Tested-by: "Tan, Wei Chong" <wei.chong.tan@intel.com>
LKML-Reference: <
B28277FD4E0F9247A3D55704C440A140D5D683F3@pgsmsx504.gar.corp.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Mon, 10 Aug 2009 16:00:47 +0000 (09:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
mm_for_maps: take ->cred_guard_mutex to fix the race with exec
mm_for_maps: shift down_read(mmap_sem) to the caller
mm_for_maps: simplify, use ptrace_may_access()
Linus Torvalds [Mon, 10 Aug 2009 15:59:56 +0000 (08:59 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM
Jaswinder Singh Rajput [Mon, 10 Aug 2009 15:45:42 +0000 (16:45 +0100)]
MN10300: includecheck fix: mn10300, pci.h
Fix the following 'make includecheck' warning:
arch/mn10300/include/asm/pci.h: linux/mm.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Figo.zhang [Sat, 8 Aug 2009 13:01:22 +0000 (21:01 +0800)]
mempool.c: clean up type-casting
clean up type-casting twice. "size_t" is typedef as "unsigned long" in
64-bit system, and "unsigned int" in 32-bit system, and the intermediate
cast to 'long' is pointless.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Sat, 8 Aug 2009 08:52:50 +0000 (17:52 +0900)]
documentation: register ioctl entry of nilfs2
This will register the ioctl range used by nilfs2 file system to the
table listed in Documentation/ioctl/ioctl-number.txt.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Mon, 10 Aug 2009 14:38:36 +0000 (16:38 +0200)]
perf_counter: Zero dead bytes from ftrace raw samples size alignment
After aligning the ftrace raw samples, there are dead bytes storing
random data from the stack. We don't want to leak these to userspace,
then zero these out.
Before:
0x2de88 [0x50]: event: 9
.
. ... raw event: size 80 bytes
. 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........
. 0010: 68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00 h...h...,......
. 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h..
. 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0....
. 0040: 68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f h...@.F........
^ ^ ^ ^
Leak
After:
0x2d318 [0x50]: event: 9
.
. ... raw event: size 80 bytes
. 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........
. 0010: 68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00 h...h...h......
. 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h..
. 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0....
. 0040: 68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00 h.....F........
^ ^ ^ ^
Fixed
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <
1249915116-5210-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Frederic Weisbecker [Mon, 10 Aug 2009 14:11:32 +0000 (16:11 +0200)]
perf_counter: Subtract the buffer size field from the event record size
We compute the perf raw sample size by aligning the raw ftrace
event size plus the buffer size field itself. We do that
instead of aligning only the perf raw sample size, so that we
might economize some in some cases.
But this buffer size field is not stored in the perf raw
sample, we must then substract its size from the buffer once we
computed the alignment unless we may get a useless u32 field in
the buffer.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <
20090810141129.GA5124@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Oleg Nesterov [Fri, 10 Jul 2009 01:27:40 +0000 (03:27 +0200)]
mm_for_maps: take ->cred_guard_mutex to fix the race with exec
The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.
Strictly speaking, this also fixes another very minor problem. Unless
security check fails or the task exits mm_for_maps() should never
return NULL, the caller should get either old or new ->mm.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Oleg Nesterov [Fri, 10 Jul 2009 01:27:38 +0000 (03:27 +0200)]
mm_for_maps: shift down_read(mmap_sem) to the caller
mm_for_maps() takes ->mmap_sem after security checks, this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Oleg Nesterov [Tue, 23 Jun 2009 19:25:32 +0000 (21:25 +0200)]
mm_for_maps: simplify, use ptrace_may_access()
It would be nice to kill __ptrace_may_access(). It requires task_lock(),
but this lock is only needed to read mm->flags in the middle.
Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
the code a little bit.
Also, we do not need to take ->mmap_sem in advance. In fact I think
mm_for_maps() should not play with ->mmap_sem at all, the caller should
take this lock.
With or without this patch, without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Peter Zijlstra [Mon, 10 Aug 2009 09:20:12 +0000 (11:20 +0200)]
perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
Raw tracepoint data contains various kernel internals and
data from other users, so restrict this to CAP_SYS_ADMIN.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <
1249896452.17467.75.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 10 Aug 2009 09:16:52 +0000 (11:16 +0200)]
perf_counter: Correct PERF_SAMPLE_RAW output
PERF_SAMPLE_* output switches should unconditionally output the
correct format, as they are the only way to unambiguously parse
the PERF_EVENT_SAMPLE data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <
1249896447.17467.74.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Benjamin Herrenschmidt [Mon, 10 Aug 2009 06:36:38 +0000 (16:36 +1000)]
powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM
On an iMac G5, the b43 driver is failing to initialise because trying to
set the dma mask to 30-bit fails. Even though there's only 512MiB of RAM
in the machine anyway:
https://bugzilla.redhat.com/show_bug.cgi?id=514787
We should probably let it succeed if the available RAM in the system
doesn't exceed the requested limit.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 9 Aug 2009 21:58:34 +0000 (14:58 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6
* 'for-linus' of git://git.infradead.org/ubi-2.6:
UBI: compatible fallback in absense of sequence numbers
UBI: fix double free on error path
Linus Torvalds [Sun, 9 Aug 2009 21:58:21 +0000 (14:58 -0700)]
Merge branch 'kvm-updates/2.6.31' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Avoid redelivery of edge interrupt before next edge
KVM: MMU: limit rmap chain length
KVM: ia64: fix build failures due to ia64/unsigned long mismatches
KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
KVM: fix ack not being delivered when msi present
KVM: s390: fix wait_queue handling
KVM: VMX: Fix locking imbalance on emulation failure
KVM: VMX: Fix locking order in handle_invalid_guest_state
KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages
KVM: SVM: force new asid on vcpu migration
KVM: x86: verify MTRR/PAT validity
KVM: PIT: fix kpit_elapsed division by zero
KVM: Fix KVM_GET_MSR_INDEX_LIST
Linus Torvalds [Sun, 9 Aug 2009 21:58:09 +0000 (14:58 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: silence vblank warnings
drm: silence pointless vblank warning.
drm: When adding probed modes, preserve duplicate mode types
Linus Torvalds [Sun, 9 Aug 2009 21:57:41 +0000 (14:57 -0700)]
Merge branch 'timers-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()
Linus Torvalds [Sun, 9 Aug 2009 21:57:26 +0000 (14:57 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix/complete ftrace event records sampling
perf_counter, ftrace: Fix perf_counter integration
tracing/filters: Always free pred on filter_add_subsystem_pred() failure
tracing/filters: Don't use pred on alloc failure
ring-buffer: Fix memleak in ring_buffer_free()
tracing: Fix recordmcount.pl to handle sections with only weak functions
ring-buffer: Fix advance of reader in rb_buffer_peek()
tracing: do not use functions starting with .L in recordmcount.pl
ring-buffer: do not disable ring buffer on oops_in_progress
ring-buffer: fix check of try_to_discard result
Linus Torvalds [Sun, 9 Aug 2009 21:57:09 +0000 (14:57 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix buffer overflow in efi_init()
x86: Add quirk to make Apple MacBookPro5,1 use reboot=pci
x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus
x86: Fix VMI && stack protector
Linus Torvalds [Sun, 9 Aug 2009 21:56:51 +0000 (14:56 -0700)]
Merge branch 'core-fixes-for-linus-2' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Fix typos in documentation
lockdep: Fix file mode of lock_stat
rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()
Frederic Weisbecker [Sun, 9 Aug 2009 02:19:15 +0000 (04:19 +0200)]
perf tools: callchain: Fix bad rounding of minimum rate
Sometimes we get callchain branches that have a rate under the
limit given by the user.
Say you launched:
perf record -f -g -a ./hackbench 10
perf report -g fractal,10.0
And you got:
2.33% hackbench [kernel] [k] _spin_lock_irqsave
|
|--78.57%-- remove_wait_queue
| poll_freewait
| do_sys_poll
| sys_poll
| sysenter_dispatch
| 0xf7ffa430
| 0x1ffadea3c
|
|--7.14%-- __up_read
| up_read
| do_page_fault
| page_fault
| 0xf7ffa430
| 0xa0df710000000a
...
It is abnormal to get a 7.14% branch whereas we passed a 10%
filter.
The problem is that we round down the minimum threshold. This
happens mostly when we have very low number of events. If the
total amount of your branch is 4 and you have a subranch of 3
events, filtering to 90% will be computed like follows:
limit = 4 * 0.9;
The result is about 3.6, but the cast to integer will round
down to 3. It means that our filter is actually of 75%
We must then explicitly round up the minimum threshold.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: acme@redhat.com
Cc: peterz@infradead.org
Cc: efault@gmx.de
LKML-Reference: <
20090809024235.GA10146@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Sat, 8 Aug 2009 12:14:15 +0000 (14:14 +0200)]
perf_counter tools: Fix libbfd detection for systems with libz dependency
Due to a libz dependency in some distro's binutils package,
C++ demangle support isn't compiled in despite the necessary
libraries being available.
Fix this by adding a -lz link test to the dependency detection
rules.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1249733655.6929.5.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Carlos R. Mafra [Wed, 5 Aug 2009 18:53:34 +0000 (20:53 +0200)]
perf: "Longum est iter per praecepta, breve et efficax per exempla"
A few examples of how 'perf' can be used, from an e-mail by
Ingo Molnar http://lkml.org/lkml/2009/8/4/346.
Signed-off-by: Carlos R. Mafra <crmafra2@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Valdis.Kletnieks@vt.edu
LKML-Reference: <
20090805185334.GA4535@Pilar.aei.mpg.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 7 Aug 2009 17:49:01 +0000 (19:49 +0200)]
perf_counter: Fix a race on perf_counter_ctx
While extending perfcounters with BTS hw-tracing, Markus
Metzger managed to trigger this warning:
[ 995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()
triggers because commit
9f498cc5be7e013d8d6e4c616980ed0ffc8680d2 (perf_counter: Full
task tracing) removed clearing of tsk->perf_counter_ctxp out
from under ctx->lock which introduced a race (against
perf_lock_task_context).
Move it back and deal with the exit notification by explicitly
passing along the former task context.
Reported-by: Markus T Metzger <markus.t.metzger@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <
1249667341.17467.5.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 02:26:37 +0000 (04:26 +0200)]
perf_counter: Fix tracepoint sampling to be part of generic sampling
Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:
- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record
We want the system in place that transport tracepoints raw
samples events into the perf ring buffer to be generalized and
usable by any type of counter.
Reported-by; Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <
1249698400-5441-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 02:26:35 +0000 (04:26 +0200)]
perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
Despite that the tracepoint record is always present when the
PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning,
thinking it might not be initialized:
kernel/perf_counter.c: In function ‘perf_counter_output’:
kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
Then, initialize it to NULL and always check if it's not NULL
before dereference it.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <
1249698400-5441-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 00:16:25 +0000 (02:16 +0200)]
perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
When we filter the callchains below a given percentage, we
ignore them and the end result only shows entries that have an
upper percentage than the filter threshold.
It seems to users then that we have an imbalance in the
percentage, as if the sum inside a profiled branch doesn't
reach 100%.
Since in the past there have been real perf report bugs that
showed the same sypmtom, it would be nice to assure the user
that the data is perfect and trustable and it all sums up to
100.00%.
So fix this by displaying the remaining hits that have been
filtered but without more detail than their amount in each
branches. Example while filtering below 50%:
7.73% [k] delay_tsc
|
|--98.22%-- __const_udelay
| |
| |--86.37%-- ath5k_hw_register_timeout
| | ath5k_hw_noise_floor_calibration
| | ath5k_hw_reset
| | ath5k_reset
| | ath5k_config
| | ieee80211_hw_config
| | |
| | |--88.53%-- ieee80211_scan_work
| | | worker_thread
| | | kthread
| | | child_rip
| | --11.47%-- [...]
| --13.63%-- [...]
--1.78%-- [...]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <
1249690585-9145-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 00:16:24 +0000 (02:16 +0200)]
perf tools: callchain: Fix 'perf report' display to be callchain by default
If we recorded with -g option to record the callchain, right now
we require a -g option to perf report as well - and people reported
this as unnecessary complication: the user already specified -g
once, no need to require it a second time.
So if the recording includes call-chains, display the callchain by
default from perf report.
( The user can override this default using "-g none" option from
perf report. )
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <
1249690585-9145-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 00:16:23 +0000 (02:16 +0200)]
perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
When the callchain tree comes to insert an empty backtrace, it
raises a spurious warning about the fact we are inserting an
empty. This is spurious because the radix tree assumes it did
something wrong to reach this state. But it didn't, we just met
an empty callchain that has to be ignored.
This happens occasionally with certain types of call-chain
recordings. If it happens it's a big nuisance as perf report
output starts with thousands of warning lines.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <
1249690585-9145-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pierre Habouzit [Fri, 7 Aug 2009 12:16:01 +0000 (14:16 +0200)]
perf record: Fix the -A UI for empty or non-existent perf.data
1. Ignore the -A argument if there is no perf.data file
2. Treat an empty file like a non existent file.
Else, perf will try to read the perf.data header, and fail with
an error.
Treating an empty file like a non-existent file makes sense,
since an interupted (as in SIGKILLed) perf could leave such
files around, and you don't want to annoy the user with errors
for files with no data in it.
Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pierre Habouzit [Fri, 7 Aug 2009 12:16:00 +0000 (14:16 +0200)]
perf util: Fix do_read() to fail on EOF instead of busy-looping
While toying with perf, I've noticed that perf record can
easily enter a busy loop when doing something as silly as:
$ perf record -A ls
Yeah, do_read here really wants to read a known size, not being
able to should die(), not busy-loop ;)
That was the cause for the bug.
Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 6 Aug 2009 14:48:54 +0000 (16:48 +0200)]
perf list: Fix the output to not include tracepoints without an id
Stop perf list from displaying tracepoints without an id file,
those are special tracepoints that are not interfaced to
perfcounters so listing them is erroneous and passing them as
events will produce no output.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul Mackerras [Fri, 7 Aug 2009 06:59:45 +0000 (16:59 +1000)]
perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
If we have the powerpc perf_counter backend compiled in, but
the cpu we are running on is one where we don't support the
PMU, we currently oops in hw_perf_group_sched_in if we try to
use any counters, because ppmu is NULL in that case, and we
unconditionally dereference ppmu.
This fixes the problem by adding a check if ppmu is NULL at the
beginning of hw_perf_group_sched_in, and also at the beginning
of the other functions that get called from the perf_counter
core, i.e. hw_perf_disable, hw_perf_enable, and
hw_perf_counter_setup.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Brice Goglin [Fri, 7 Aug 2009 08:18:39 +0000 (10:18 +0200)]
perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
We want to use a coherent flag for -S/--stat across all tools,
so free up -S in perf stat.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arnaldo Carvalho de Melo [Thu, 6 Aug 2009 17:43:17 +0000 (14:43 -0300)]
perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
Used with perf report --verbose:
[acme@doppio linux-2.6-tip]$ perf report -v | head -16
5.17% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x00000000005d8eee f [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
2.56% firefox /lib64/libpthread-2.10.1.so 0x0000000000008e02 d [.] __pthread_mutex_lock_internal
1.94% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x0000000000d0af8f f [.] SearchTable
1.75% firefox [kernel] 0xffffffffff60013b k [.] vread_hpet
1.63% firefox /lib64/libpthread-2.10.1.so 0x000000000000a404 d [.] __pthread_mutex_unlock
1.47% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000482ea f [.] js_Interpret
1.42% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x000000000003eda3 f [.] JS_CallTracer
1.24% firefox [kernel] 0xffffffff8102ca4a k [k] read_hpet
1.16% firefox [kernel] 0xffffffff810f3dd4 k [k] fget_light
1.11% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000567ff f [.] js_TraceObject
0.98% firefox /usr/lib64/firefox-3.5.2/firefox 0x000000000000dd23 b [.] arena_ralloc
[acme@doppio linux-2.6-tip]$
The new field is just after the symbol address. To help in
figuring out symbol resolution bugs.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 6 Aug 2009 17:40:28 +0000 (19:40 +0200)]
perf report: Fix per task mult-counter stat reporting
Brice Goglin reported:
> I can easily sort them by thread id, but I don't know how to match
> my 4 events with each group of 4 lines.
Also report the counter id and the time running/enabled
stats (in case the counter got time-shared).
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 6 Aug 2009 18:57:41 +0000 (20:57 +0200)]
perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header
Brice Goglin reported that only the first result from a
multi-counter perf record --stat run is accurate, the
rest looks bogus.
A silly mistake made us re-read the first attribute for
every recorded attribute.
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: paulus@samba.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Fri, 7 Aug 2009 05:11:05 +0000 (07:11 +0200)]
perf tools: Fix call-chain cumul hit based sub-total (fractal mode)
The callchain fractal mode builds each new total hits in a new
branch of profiling by using the parent's hits of the current
branch plus the hits of the children.
This is wrong, the total hits of a branch should be made of the
sum of every children hits, we must ignore the parent hits in
this scope.
This patch also fixes another mistake with the hit counting.
Now the rates are correct.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Tue, 4 Aug 2009 08:24:41 +0000 (10:24 +0200)]
perf top: Update man page
perf_counter tools: update perf top manual page to reflect
current implementation.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Tue, 4 Aug 2009 08:21:23 +0000 (10:21 +0200)]
perf top: Improve interactive key handling
Pressing any key which is not currently mapped to
functionality, based on startup command line options, displays
currently mapped keys, and prompts for input.
Pressing any unmapped key at the prompt returns the user to
display mode with variables unchanged. eg, pressing ? <SPACE>
<ESC> etc displays currently available keys, the value of the
variable associated with that key, and prompts.
Pressing same again aborts input.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Wed, 22 Jul 2009 07:29:32 +0000 (09:29 +0200)]
perf_counter: Fix software counters for fast moving event sources
Reimplement the software counters to deal with fast moving
event sources (such as tracepoints). This means being able
to generate multiple overflows from a single 'event' as well
as support throttling.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Fri, 24 Jul 2009 08:09:50 +0000 (10:09 +0200)]
perf_counter tools: Allow perf top top users to switch between weighted and individual counter display
Add [w]eighted hotkey. Pressing [w] toggles between displaying
weighted total of all counters, and the counter selected via
[E]vent select key.
------------------------------------------------------------------------------
PerfTop: 90395 irqs/sec kernel:16.1% [cache-misses/cache-references/instructions], (all, 4 CPUs)
------------------------------------------------------------------------------
weight samples pcnt RIP kernel function
______ _______ _____ ________________ _______________
1275408.6 10881 - 5.3% -
ffffffff81146f70 : copy_page_c
553683.4 43569 - 21.3% -
ffffffff81146f20 : clear_page_c
74075.0 6768 - 3.3% -
ffffffff81147190 : copy_user_generic_string
40602.9 7538 - 3.7% -
ffffffff81284ba2 : _spin_lock
26882.1 965 - 0.5% -
ffffffff8109d280 : file_ra_state_init
[w]
------------------------------------------------------------------------------
PerfTop: 91221 irqs/sec kernel:14.5% [10000Hz cache-misses], (all, 4 CPUs)
------------------------------------------------------------------------------
weight samples pcnt RIP kernel function
______ _______ _____ ________________ _______________
47320.00 - 22.3% -
ffffffff81146f20 : clear_page_c
14261.00 - 6.7% -
ffffffff810992f5 : __rmqueue
11046.00 - 5.2% -
ffffffff81146f70 : copy_page_c
7842.00 - 3.7% -
ffffffff81284ba2 : _spin_lock
7234.00 - 3.4% -
ffffffff810aa1d6 : unmap_vmas
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Wed, 22 Jul 2009 18:36:03 +0000 (20:36 +0200)]
perf_counter tools: Fix/resurrect perf top annotation in a simple interactive form
perf top used to have annotation support, but it has bitrotted and
removed.
This patch restores that: it allows the user to select any symbol
in kernel space for source level annotation on the fly, switch
between event counters and alter display variables. When symbol
details are being displayed, stopping annotation reverts to normal.
known keys:
[d] select display delay.
[e] select display entries (lines).
[E] select annotation event counter.
[f] select normal display count filter.
[F] select annotation display count filter (percentage).
[qQ] quit.
[s] select annotation symbol and start annotation.
[S] stop annotation, revert to normal display.
[z] toggle event count zeroing.
Sample:
------------------------------------------------------------------------------
PerfTop: 16719 irqs/sec kernel:78.7% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs)
------------------------------------------------------------------------------
Showing cache-misses for e1000_clean_rx_irq
Events Pcnt (>=3%)
0 0.0% /* adjust length to remove Ethernet CRC */
0 0.0% if (!(adapter->flags2 & FLAG2_CRC_STRIPPING))
0 0.0% length -= 4;
436 5.0% f039: 41 f6 84 24 5c 29 00 testb $0x1,0x295c(%r12)
0 0.0% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx
0 0.0% f08c: 48 83 ef 02 sub $0x2,%rdi
0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi
811 9.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
0 0.0%
0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) {
0 0.0% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15)
7226 82.6% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84>
Available events:
0 cache-misses
1 cache-references
2 instructions
3 cycles
Enter details event counter: 2
------------------------------------------------------------------------------
PerfTop: 15035 irqs/sec kernel:79.0% [cache-misses/cache-references/instructions/cycles], (all, 4 CPUs)
------------------------------------------------------------------------------
Showing instructions for e1000_clean_rx_irq
Events Pcnt (>=3%)
0 0.0% int *work_done, int work_to_do)
0 0.0% {
175 0.9% eebf: 55 push %rbp
1898 9.8% eec0: 48 89 e5 mov %rsp,%rbp
0 0.0%
0 0.0% i = rx_ring->next_to_clean;
140 0.7% ef0a: 0f b7 41 1a movzwl 0x1a(%rcx),%eax
670 3.4% ef0e: 89 45 ac mov %eax,-0x54(%rbp)
0 0.0% {
0 0.0% memcpy(skb->data + offset, from, len);
91 0.5% f07b: 49 8b b6 e8 00 00 00 mov 0xe8(%r14),%rsi
1153 5.9% f082: 48 8b b8 e8 00 00 00 mov 0xe8(%rax),%rdi
42 0.2% f089: 8b 4d 84 mov -0x7c(%rbp),%ecx
14 0.1% f08c: 48 83 ef 02 sub $0x2,%rdi
0 0.0% f090: 48 83 ee 02 sub $0x2,%rsi
1618 8.3% f094: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
0 0.0%
0 0.0% /* return some buffers to hardware, one at a time is too slow */
0 0.0% if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
867 4.5% f0e7: 83 7d b0 0f cmpl $0xf,-0x50(%rbp)
0 0.0%
0 0.0% while (rx_desc->status & E1000_RXD_STAT_DD) {
37 0.2% f114: 41 f6 47 0c 01 testb $0x1,0xc(%r15)
4047 20.8% f119: 0f 85 24 fe ff ff jne ef43 <e1000_clean_rx_irq+0x84>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Thu, 6 Aug 2009 23:25:54 +0000 (01:25 +0200)]
perf_counter: Fix/complete ftrace event records sampling
This patch implements the kernel side support for ftrace event
record sampling.
A new counter sampling attribute is added:
PERF_SAMPLE_TP_RECORD
which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.
Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:
perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D
0x21e18 [0x48]: event: 9
.
. ... raw event: size 72 bytes
. 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........
. 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!......
. 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve
. 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1...........
. 0040: e0 b1 31 81 ff ff ff ff .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
The raw ftrace binary record starts at offset 0020.
Translation:
struct trace_entry {
type = 0x2b = 43;
flags = 1;
preempt_count = 2;
pid = 0xa = 10;
tgid = 0xa = 10;
}
thread_comm = "events/1"
thread_pid = 0xa = 10;
func = 0xffffffff8131b1e0 = flush_to_ldisc()
What will come next?
- Userspace support ('perf trace'), 'flight data recorder' mode
for perf trace, etc.
- The unconditional copy from the profiling callback brings
some costs however if someone wants no such sampling to
occur, and needs to be fixed in the future. For that we need
to have an instant access to the perf counter attribute.
This is a matter of a flag to add in the struct ftrace_event.
- Take care of the events recursivity! Don't ever try to record
a lock event for example, it seems some locking is used in
the profiling fast path and lead to a tracing recursivity.
That will be fixed using raw spinlock or recursivity
protection.
- [...]
- Profit! :-)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 21 Jul 2009 15:34:57 +0000 (17:34 +0200)]
perf_counter, ftrace: Fix perf_counter integration
Adds possible second part to the assign argument of TP_EVENT().
TP_perf_assign(
__perf_count(foo);
__perf_addr(bar);
)
Which, when specified make the swcounter increment with @foo instead
of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
associated with the event) when this triggers a counter overflow.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Sun, 9 Aug 2009 10:46:45 +0000 (12:46 +0200)]
Merge branch 'linus' into tracing/urgent
Merge reason: Merge up to almost-rc6 to pick up latest perfcounters
(on which we'll queue up a dependent fix)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Gleb Natapov [Sun, 5 Jul 2009 15:48:11 +0000 (18:48 +0300)]
KVM: Avoid redelivery of edge interrupt before next edge
The check for an edge is broken in current ioapic code. ioapic->irr is
cleared on each edge interrupt by ioapic_service() and this makes
old_irr != ioapic->irr condition in kvm_ioapic_set_irq() to be always
true. The patch fixes the code to properly recognise edge.
Some HW emulation calls set_irq() without level change. If each such
call is propagated to an OS it may confuse a device driver. This is the
case with keyboard device emulation and Windows XP x64 installer on SMP VM.
Each keystroke produce two interrupts (down/up) one interrupt is
submitted to CPU0 and another to CPU1. This confuses Windows somehow
and it ignores keystrokes.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Roel Kluin [Thu, 6 Aug 2009 22:58:13 +0000 (15:58 -0700)]
x86: fix buffer overflow in efi_init()
If the vendor name (from c16) can be longer than 100 bytes (or missing a
terminating null), then the null is written past the end of vendor[].
Found with Parfait, http://research.sun.com/projects/parfait/
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Frans Pop [Sun, 9 Aug 2009 02:25:29 +0000 (12:25 +1000)]
drm/i915: silence vblank warnings
these errors are pretty pointless
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Paul Rolland [Sun, 9 Aug 2009 02:24:01 +0000 (12:24 +1000)]
drm: silence pointless vblank warning.
Some applications/hardware combinations are triggering the message "failed to
acquire vblank counter" to be issued up to 20 times a second, which makes it
both useless and dangerous, as this may hide other important messages.
This changes makes it only appear when people are debugging.
Signed-off-by: Paul Rolland <rol@as2917.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Lost-twice-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Keith Packard [Mon, 20 Jul 2009 21:49:17 +0000 (14:49 -0700)]
drm: When adding probed modes, preserve duplicate mode types
The code which takes probed modes and adds them to a connector eliminates
duplicate modes by comparing them using drm_mode_equal. That function
doesn't consider the type bits, which means that any modes which differ only
in the type field will be lost.
One of the bits in the mode->type field is the DRM_MODE_TYPE_PREFERRED bit.
If the mode with that bit is lost, then higher level code will not know
which mode to select, causing a random mode to be used instead.
This patch simply merges the two mode type bits together; that seems
reasonable to me, but perhaps only a subset of the bits should be used? None
of these can be user defined as they all come from looking at just the
hardware.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Stanislaw Gruszka [Thu, 6 Aug 2009 23:03:30 +0000 (16:03 -0700)]
posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()
When the process exits we don't have to run new cputimer nor
use running one (as it not accounts when tsk->exit_state != 0)
to get process CPU times. As there is only one thread we can
just use CPU times fields from task and signal structs.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sat, 8 Aug 2009 15:49:53 +0000 (10:49 -0500)]
tracing/filters: Always free pred on filter_add_subsystem_pred() failure
If filter_add_subsystem_pred() fails due to ENOSPC or ENOMEM,
the pred doesn't get freed, while as a side effect it does for
other errors. Make it so the caller always frees the pred for
any error.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <
1249746593.6453.32.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sat, 8 Aug 2009 15:49:09 +0000 (10:49 -0500)]
tracing/filters: Don't use pred on alloc failure
Dan Carpenter sent me a fix to prevent pred from being used if
it couldn't be allocated. I noticed the same problem also
existed for the create_pred() case and added a fix for that.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <
1249746549.6453.29.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ozan Çağlayan [Tue, 4 Aug 2009 16:39:31 +0000 (19:39 +0300)]
x86: Add quirk to make Apple MacBookPro5,1 use reboot=pci
MacBookPro5,1 is not able to reboot unless reboot=pci is set.
This patch forces it through a DMI quirk specific to this
device.
Signed-off-by: Ozan Çağlayan <ozan@pardus.org.tr>
LKML-Reference: <
1249403971-6543-1-git-send-email-ozan@pardus.org.tr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Tue, 4 Aug 2009 16:01:33 +0000 (09:01 -0700)]
x86/irq: Fix move_irq_desc() for nodes without ram
Don't move it if target node is -1.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
4A785B5D.
4070702@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Tue, 4 Aug 2009 15:59:59 +0000 (08:59 -0700)]
x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus
found a system where x2apic reports an MSI-X irq initialization
failure:
[ 302.859446] igbvf 0000:81:10.4: enabling device (0000 -> 0002)
[ 302.874369] igbvf 0000:81:10.4: using 64bit DMA mask
[ 302.879023] igbvf 0000:81:10.4: using 64bit consistent DMA mask
[ 302.894386] igbvf 0000:81:10.4: enabling bus mastering
[ 302.898171] igbvf 0000:81:10.4: setting latency timer to 64
[ 302.914050] reserve_memtype added 0xefb08000-0xefb0c000, track uncached-minus, req uncached-minus, ret uncached-minus
[ 302.933839] reserve_memtype added 0xefb28000-0xefb29000, track uncached-minus, req uncached-minus, ret uncached-minus
[ 302.940367] alloc irq_desc for 265 on node 4
[ 302.956874] alloc kstat_irqs on node 4
[ 302.959452] alloc irq_2_iommu on node 0
[ 302.974328] igbvf 0000:81:10.4: irq 265 for MSI/MSI-X
[ 302.977778] alloc irq_desc for 266 on node 4
[ 302.980347] alloc kstat_irqs on node 4
[ 302.995312] free_memtype request 0xefb28000-0xefb29000
[ 302.998816] igbvf 0000:81:10.4: Failed to initialize MSI-X interrupts.
... it turns out that when trying to enable MSI-X,
__assign_irq_vector(new, cfg_new, apic->target_cpus()) can not
get vector because for x2apic target-cpus returns cpumask_of(0)
Update that to online_mask like xapic.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <
4A785AFF.
3050902@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Sat, 8 Aug 2009 02:06:36 +0000 (19:06 -0700)]
Merge git://git./linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: fix oops on disconnect in cdc-acm
USB: storage: include Prolific Technology USB drive in unusual_devs list
USB: ftdi_sio: add product_id for Marvell OpenRD Base, Client
USB: ftdi_sio: add vendor and product id for Bayer glucose meter serial converter cable
USB: EHCI: fix counting of transaction error retries
USB: EHCI: fix two new bugs related to Clear-TT-Buffer
USB: usbfs: fix -ENOENT error code to be -ENODEV
USB: musb: fix the nop registration for OMAP3EVM
USB: devio: Properly do access_ok() checks
USB: pl2303: New vendor and product id
Linus Torvalds [Sat, 8 Aug 2009 02:06:13 +0000 (19:06 -0700)]
Merge git://git./linux/kernel/git/gregkh/staging-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6:
Staging: rspiusb: Fix buffer overflow
staging: add dependencies on PCI for drivers that require it
Staging: rtl8192su: fix build error
Staging: rt2870: Revert d44ca7 Removal of kernel_thread() API
Staging: rt2870: Add USB ID for Linksys, Planex Communications, Belkin
Linus Torvalds [Sat, 8 Aug 2009 02:03:59 +0000 (19:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (22 commits)
drm/i915: Fix read outside array bounds in restoring the SWF10 range.
drm/i915: Use our own workqueue to avoid wedging the system along with the GPU.
drm/i915: Add support for dual-channel LVDS on 8xx.
drm/i915: Return disconnected for SDVO DVI when there's no digital EDID.
drm/i915: Choose real sdvo output according to result from detection
drm/i915: Set preferred mode for integrated TV according to TV format
drm/i915: fix 845G FIFO size & burst length
drm/i915: fix VGA detect on IGDNG
drm/i915: Add eDP support on IGDNG mobile chip
drm/i915: enable DisplayPort support on IGDNG
drm/i915: Fix channel ending action for DP aux transaction
drm/i915: fix issue in display pipe setup on IGDNG
drm/i915: disable VGA plane reliably
drm/I915: Fix offset to DVO timings in LVDS data
drm/i915: hdmi detection according by reading edid
drm/i915: correct self-refresh calculation in "everything off" case
drm/i915: handle FIFO oversubsription correctly
drm/i915: FIFO watermark calculation fixes
drm/i915: ignore lvds on AOpen Mini PC MP-915
drm/i915: Allow frame buffers up to 4096x4096 on 915/945 class hardware
...
Linus Torvalds [Sat, 8 Aug 2009 02:03:09 +0000 (19:03 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY
Btrfs: correct error-handling zlib error handling
Btrfs: remove superfluous NULL pointer check in btrfs_rename()
Btrfs: make sure the async caching thread advances the key
Btrfs: fix btrfs_remove_from_free_space corner case
Linus Torvalds [Sat, 8 Aug 2009 01:53:44 +0000 (18:53 -0700)]
Merge git://git./linux/kernel/git/hch/xfs-icache-races
* git://git.kernel.org/pub/scm/linux/kernel/git/hch/xfs-icache-races:
xfs: fix freeing of inodes not yet added to the inode cache
vfs: add __destroy_inode
vfs: fix inode_init_always calling convention
Roel Kluin [Fri, 7 Aug 2009 19:00:10 +0000 (21:00 +0200)]
Staging: rspiusb: Fix buffer overflow
usb_buffer_map_sg() may return -1. This will result in a read from
pdx->PixelUrb[frameInfo][-1]
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Fri, 7 Aug 2009 23:12:03 +0000 (16:12 -0700)]
staging: add dependencies on PCI for drivers that require it
This patch adds PCI dependencies to staging drivers that require it.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Tue, 4 Aug 2009 23:53:36 +0000 (16:53 -0700)]
Staging: rtl8192su: fix build error
This fixes a build error when selecting the rtl8192su driver as a
module. This has been reported by me, and the opensuse kernel developer
team, and I finally tracked it down.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mike Galbraith [Fri, 31 Jul 2009 05:14:04 +0000 (07:14 +0200)]
Staging: rt2870: Revert d44ca7 Removal of kernel_thread() API
Staging: rt2870: Revert d44ca7 Removal of kernel_thread() API
The sanity check this patch introduced triggers on shutdown, apparently due to
threads having already exited by the time BUG_ON() is reached.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Teoh <htmldeveloper@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jakob Gruber [Thu, 30 Jul 2009 18:37:36 +0000 (20:37 +0200)]
Staging: rt2870: Add USB ID for Linksys, Planex Communications, Belkin
Linksys WUSB100, Belkin
F5D8053 N, Planex Communications unknown model.
Signed-off-by: Jakob Gruber <jakob.gruber@kabelnet.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Oliver Neukum [Tue, 4 Aug 2009 21:52:09 +0000 (23:52 +0200)]
USB: fix oops on disconnect in cdc-acm
This patch fixes an oops caused when during an unplug a device's table
of endpoints is zeroed before the driver is notified. A pointer to
the endpoint must be cached.
this fixes a regression caused by commit
5186ffee2320942c3dc9745f7930e0eb15329ca6
Therefore it should go into 2.6.31
Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Rogerio Brito [Thu, 6 Aug 2009 22:20:19 +0000 (15:20 -0700)]
USB: storage: include Prolific Technology USB drive in unusual_devs list
Add a quirk entry for the Leading Driver UD-11 usb flash drive.
As Alan Stern told me, the device doesn't deal correctly with the
locking media feature of the device, and this patch incorporates it.
Compiled, tested, working.
Signed-off-by: Rogerio Brito <rbrito@ime.usp.br>
Cc: Phil Dibowitz <phil@ipom.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dhaval Vasa [Fri, 7 Aug 2009 11:56:49 +0000 (17:26 +0530)]
USB: ftdi_sio: add product_id for Marvell OpenRD Base, Client
reference:
http://www.open-rd.org
Signed-off-by: Dhaval Vasa <dhaval.vasa@einfochips.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Marko Hänninen [Fri, 31 Jul 2009 19:32:39 +0000 (22:32 +0300)]
USB: ftdi_sio: add vendor and product id for Bayer glucose meter serial converter cable
Attached patch adds USB vendor and product IDs for Bayer's USB to serial
converter cable used by Bayer blood glucose meters. It seems to be a
FT232RL based device and works without any problem with ftdi_sio driver
when this patch is applied. See: http://winglucofacts.com/cables/
Signed-off-by: Marko Hänninen <bugitus@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Fri, 31 Jul 2009 14:41:40 +0000 (10:41 -0400)]
USB: EHCI: fix counting of transaction error retries
This patch (as1274) simplifies the counting of transaction-error
retries. Now we will count up from 0 to QH_XACTERR_MAX instead of
down from QH_XACTERR_MAX to 0.
The patch also fixes a small bug: qh->xacterr was not getting
initialized for interrupt endpoints.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Fri, 31 Jul 2009 14:40:22 +0000 (10:40 -0400)]
USB: EHCI: fix two new bugs related to Clear-TT-Buffer
This patch (as1273) fixes two(!) bugs introduced by the new
Clear-TT-Buffer implementation in ehci-hcd.
It is now possible for an idle QH to have some URBs on its
queue -- this will happen if a Clear-TT-Buffer is pending for
the QH's endpoint. Consequently we should not issue a warning
when someone tries to unlink an URB from an idle QH; instead
we should process the request immediately.
The refcounts for QHs could get messed up, because
submit_async() would increment the refcount when calling
qh_link_async() and qh_link_async() would then refuse to link
the QH into the schedule if a Clear-TT-Buffer was pending.
Instead we should increment the refcount only when the QH
actually is added to the schedule. The current code tries to
be clever by leaving the refcount alone if an unlink is
immediately followed by a relink; the patch changes this to an
unconditional decrement and increment (although they occur in
the opposite order).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: David Brownell <david-b@pacbell.net>
Tested-by: Manuel Lauss <manuel.lauss@gmail.com>
Tested-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Thu, 30 Jul 2009 19:28:14 +0000 (15:28 -0400)]
USB: usbfs: fix -ENOENT error code to be -ENODEV
This patch (as1272) changes the error code returned when an open call
for a USB device node fails to locate the corresponding device. The
appropriate error code is -ENODEV, not -ENOENT.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Gupta, Ajay Kumar [Wed, 29 Jul 2009 06:28:57 +0000 (11:58 +0530)]
USB: musb: fix the nop registration for OMAP3EVM
OMAP3EVM uses ISP1504 phy which doesn't require any programming and
thus has to use NOP otg transceiver.
Cleanups being done:
- Remove unwanted code in usb-musb.c file
- Register NOP in OMAP3EVM board file using
usb_nop_xceiv_register().
- Select NOP_USB_XCEIV for OMAP3EVM boards.
- Don't enable TWL4030_USB in omap3_evm_defconfig
Signed-off-by: Ajay Kumar Gupta <ajay.gupta@ti.com>
Signed-off-by: Eino-Ville Talvala <talvala@stanford.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michael Buesch [Wed, 29 Jul 2009 09:39:03 +0000 (11:39 +0200)]
USB: devio: Properly do access_ok() checks
access_ok() checks must be done on every part of the userspace structure
that is accessed. If access_ok() on one part of the struct succeeded, it
does not imply it will succeed on other parts of the struct. (Does
depend on the architecture implementation of access_ok()).
This changes the __get_user() users to first check access_ok() on the
data structure.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: stable <stable@kernel.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Khanh-Dang Nguyen Thu Lam [Tue, 28 Jul 2009 17:41:17 +0000 (19:41 +0200)]
USB: pl2303: New vendor and product id
I am submitting a patch for the pl2303 driver. This patch adds support
for the "Sony QN-3USB" cable (vendor=0x054c, product=0x0437). This USB
cable is a so-called data cable used to connect a Sony mobile phone to a
computer. Supported models are Sony CMD-J5, J6, J7, J16, J26, J70 and
Z7.
I have used this patch with my Sony CMD-J70 for several days and I
haven't encountered any kernel/hardware issue.
From: Khanh-Dang Nguyen Thu Lam <kdntl@yahoo.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Yan Zheng [Fri, 7 Aug 2009 17:51:33 +0000 (13:51 -0400)]
Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY
invalidate_inode_pages2_range may return -EBUSY occasionally
which results Oops. This patch fixes the issue by moving
invalidate_inode_pages2_range into a loop and keeping calling
it until the return value is not -EBUSY.
The EBUSY return is temporary, and can happen when the btrfs release page
function is unable to release a page because the EXTENT_LOCK
bit is set.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Julia Lawall [Fri, 7 Aug 2009 17:51:33 +0000 (13:51 -0400)]
Btrfs: correct error-handling zlib error handling
find_zlib_workspace returns an ERR_PTR value in an error case instead of NULL.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@
x = find_zlib_workspace(...)
... when != x = E
(
* if (x == NULL || ...) S1 else S2
|
* if (x == NULL && ...) S1 else S2
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Bartlomiej Zolnierkiewicz [Fri, 7 Aug 2009 17:47:08 +0000 (13:47 -0400)]
Btrfs: remove superfluous NULL pointer check in btrfs_rename()
This takes care of the following entry from Dan's list:
fs/btrfs/inode.c +4788 btrfs_rename(36) warning: variable derefenced before check 'old_inode'
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Julia Lawall <julia@diku.dk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Linus Torvalds [Fri, 7 Aug 2009 17:46:51 +0000 (10:46 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm: (30 commits)
ARM: 5639/1: arm: clkdev.c should include <linux/clk.h>
ARM: 5638/1: arch/arm/kernel/signal.c: use correct address space for CRUNCH
ARM: 5637/1: [KS8695] Don't reference CLOCK_TICK_RATE in drivers
ARM: S3C64XX: serial: Fix section mismatch warning
ARM: S3C24XX: serial: Fix section mismatch warnings
ARM: S3C: PWM fix for low duty cycle
ARM: 5597/1: [PCI] reset all internal hardware prior PCI initialization
ARM: 5627/1: Fix restoring of lr at the end of mcount
ARM: 5624/1: Document cache aliasing region
S3C64XX: Fix ARMCLK configuration
S3C64XX: Fix get_rate() for ARMCLK
S3C24XX: GPIO: Fix pin range check in s3c_gpiolib_getchip
mx3 defconfig update
mx27 defconfig update
ARM: 5623/1: Treo680: ir shutdown typo fix
ARM: includecheck fix: plat-stmp3xxx/pinmux.c
ARM: includecheck fix: plat-s3c64xx/pm.c
ARM: includecheck fix: mach-omap2/mcbsp.c
ARM: includecheck fix: mach-omap1/mcbsp.c
ARM: includecheck fix: board-sffsdr.c
...
Linus Torvalds [Fri, 7 Aug 2009 17:46:27 +0000 (10:46 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Fix spinlock use in alloc_netdev_mq()
Linus Torvalds [Fri, 7 Aug 2009 17:46:09 +0000 (10:46 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] KVM: Read buffer overflow
[S390] kernel: Storing machine flags early in lowcore
Linus Torvalds [Fri, 7 Aug 2009 17:44:11 +0000 (10:44 -0700)]
Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
* git://git.infradead.org/~dwmw2/iommu-2.6.31:
intel-iommu: Fix enabling snooping feature by mistake
intel-iommu: Mask physical address to correct page size in intel_map_single()
intel-iommu: Correct sglist size calculation.
Linus Torvalds [Fri, 7 Aug 2009 17:43:07 +0000 (10:43 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix double list iteration in per task precise stats
perf: Auto-detect libelf
perf symbol: Fix symbol parsing in certain cases: use the build-id as a symlink
perf_counter/powerpc: Check oprofile_cpu_type for NULL before using it
ftrace: Fix perf-tracepoint OOPS
perf report: Add missing command line options to man page
perf: Auto-detect libbfd
perf report: Make --sort comm,dso,symbol the default
Linus Torvalds [Fri, 7 Aug 2009 17:42:31 +0000 (10:42 -0700)]
Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
jffs2: Fix return value from jffs2_do_readpage_nolock()
mtd: mtdblock: introduce mtdblks_lock
mtd: remove 'SBC8240 Wind River' Device Driver Code
mtd: OneNAND: OMAP2/3: free GPMC CS on module removal
mtd: OneNAND: fix incorrect bufferram offset
mtd: blkdevs: do not forget to get MTD devices
mtd: fix the conversion from dev to mtd_info
mtd: let include/linux/mtd/partitions.h stand on its own
Linus Torvalds [Fri, 7 Aug 2009 17:42:14 +0000 (10:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: matrix_keypad - make matrix keymap size dynamic
Input: wistron_btns - support Prestigio Wifi RF kill button
Input: i8042 - add Asus G1S to noloop exception list
Linus Torvalds [Fri, 7 Aug 2009 17:41:36 +0000 (10:41 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: setup MC/VRAM the same way for suspend/resume
drm/radeon/kms: Fix caching mode selection for GTT object
Linus Torvalds [Thu, 6 Aug 2009 22:09:34 +0000 (15:09 -0700)]
flat: fix uninitialized ptr with shared libs
The new credentials code broke load_flat_shared_library() as it now uses
an uninitialized cred pointer.
Reported-by: Bernd Schmidt <bernds_cb1@t-online.de>
Tested-by: Bernd Schmidt <bernds_cb1@t-online.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Albin Tonnerre [Thu, 6 Aug 2009 22:09:32 +0000 (15:09 -0700)]
lib/decompress_*: only include <linux/slab.h> if STATIC is not defined
These includes were added by
079effb6933f34b9b1b67b08bd4fd7fb672d16ef
("kmemtrace, kbuild: fix slab.h dependency problem in
lib/decompress_inflate.c") to fix the build when using kmemtrace. However
this is not necessary when used to create a compressed kernel, and
actually creates issues (brings a lot of things unavailable in the
decompression environment), so don't include it if STATIC is defined.
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Phillip Lougher [Thu, 6 Aug 2009 22:09:31 +0000 (15:09 -0700)]
bzip2/lzma: remove nasty uncompressed size hack in pre-boot environment
decompress_bunzip2 and decompress_unlzma have a nasty hack that subtracts
4 from the input length if being called in the pre-boot environment.
This is a nasty hack because it relies on the fact that flush = NULL only
when called from the pre-boot environment (i.e.
arch/x86/boot/compressed/misc.c). initramfs.c/do_mounts_rd.c pass in a
flush buffer (flush != NULL).
This hack prevents the decompressors from being used with flush = NULL by
other callers unless knowledge of the hack is propagated to them.
This patch removes the hack by making decompress (called only from the
pre-boot environment) a wrapper function that subtracts 4 from the input
length before calling the decompressor.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Phillip Lougher [Thu, 6 Aug 2009 22:09:30 +0000 (15:09 -0700)]
bzip2/lzma/gzip: fix comments describing decompressor API
Fix and improve comments in decompress/generic.h that describe the
decompressor API. Also remove an unused definition, and rename INBUF_LEN
in lib/decompress_inflate.c to conform to bzip2/lzma naming.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Thu, 6 Aug 2009 22:09:28 +0000 (15:09 -0700)]
execve: must clear current->clear_child_tid
While looking at Jens Rosenboom bug report
(http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
a dying "ps" program, we found following problem.
clone() syscall has special support for TID of created threads. This
support includes two features.
One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
TID value.
One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
thread dies.
The integer location is a user provided pointer, provided at clone()
time.
kernel keeps this pointer value into current->clear_child_tid.
At execve() time, we should make sure kernel doesnt keep this user
provided pointer, as full user memory is replaced by a new one.
As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
memory in forked processes.
Following sequence could happen:
1) bash (or any program) starts a new process, by a fork() call that
glibc maps to a clone( ... CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
...) syscall
2) When new process starts, its current->clear_child_tid is set to a
location that has a meaning only in bash (or initial program) context
(&THREAD_SELF->tid)
3) This new process does the execve() syscall to start a new program.
current->clear_child_tid is left unchanged (a non NULL value)
4) If this new program creates some threads, and initial thread exits,
kernel will attempt to clear the integer pointed by
current->clear_child_tid from mm_release() :
if (tsk->clear_child_tid
&& !(tsk->flags & PF_SIGNALED)
&& atomic_read(&mm->mm_users) > 1) {
u32 __user * tidptr = tsk->clear_child_tid;
tsk->clear_child_tid = NULL;
/*
* We don't check the error code - if userspace has
* not set up a proper pointer then tough luck.
*/
<< here >> put_user(0, tidptr);
sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
}
5) OR : if new program is not multi-threaded, but spied by /proc/pid
users (ps command for example), mm_users > 1, and the exiting program
could corrupt 4 bytes in a persistent memory area (shm or memory mapped
file)
If current->clear_child_tid points to a writeable portion of memory of the
new program, kernel happily and silently corrupts 4 bytes of memory, with
unexpected effects.
Fix is straightforward and should not break any sane program.
Reported-by: Jens Rosenboom <jens@mcbone.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sonny Rao <sonnyrao@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>