perf/x86/intel/cqm: Return cached counter value from IRQ context
Peter reported the following potential crash which I was able to
reproduce with his test program,
[ 148.765788] ------------[ cut here ]------------
[ 148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260()
[ 148.765797] Modules linked in:
[ 148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4
[ 148.765803] ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007
[ 148.765805] 0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000
[ 148.765807] ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640
[ 148.765809] Call Trace:
[ 148.765810] <NMI> [<ffffffff818bdfd5>] dump_stack+0x45/0x57
[ 148.765818] [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0
[ 148.765822] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[ 148.765824] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[ 148.765825] [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20
[ 148.765827] [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260
[ 148.765829] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[ 148.765831] [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60
[ 148.765832] [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0
[ 148.765836] [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400
[ 148.765839] [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590
[ 148.765840] [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380
[ 148.765841] [<ffffffff811d3497>] perf_event_output+0x47/0x60
[ 148.765843] [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240
[ 148.765844] [<ffffffff811d4124>] perf_event_overflow+0x14/0x20
[ 148.765847] [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440
[ 148.765849] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[ 148.765853] [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0
[ 148.765854] [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20
[ 148.765859] [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0
[ 148.765863] [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30
[ 148.765865] [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20
[ 148.765869] [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40
[ 148.765872] [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80
[ 148.765875] [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40
[ 148.765877] [<ffffffff81063ed9>] nmi_handle+0x79/0x100
[ 148.765879] [<ffffffff81064422>] default_do_nmi+0x42/0x100
[ 148.765880] [<ffffffff81064563>] do_nmi+0x83/0xb0
[ 148.765884] [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e
[ 148.765886] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[ 148.765888] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[ 148.765890] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[ 148.765891] <<EOE>> [<ffffffff8110ab66>] finish_task_switch+0x156/0x210
[ 148.765898] [<ffffffff818c1671>] __schedule+0x341/0x920
[ 148.765899] [<ffffffff818c1c87>] schedule+0x37/0x80
[ 148.765903] [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80
[ 148.765905] [<ffffffff818c1f4a>] schedule_user+0x1a/0x50
[ 148.765907] [<ffffffff818c666c>] retint_careful+0x14/0x32
[ 148.765908] ---[ end trace e33ff2be78e14901 ]---
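The WARNING at kernel/smp.c:417 above is the sanity check at the top of
smp_call_function_many(); in kernels of this vintage it reads roughly as
follows (quoted from memory, so treat it as approximate):

  /*
   * Can deadlock when called with interrupts disabled.
   */
  WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
               && !oops_in_progress && !early_boot_irqs_disabled);

In other words, the function must not be called with interrupts disabled,
which is exactly the situation when it is reached from the NMI handler in
the trace above.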
The CQM task events are not safe to call from within interrupt context
because they require performing an IPI to read the counter value on all
sockets, and performing IPIs from within IRQ context is a "no-no".
Make do with the last read counter value that is already in
event->count when we're invoked from this context.
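Concretely, the idea amounts to an early exit in intel_cqm_event_count(),
along the lines of the sketch below. This is an illustration of the
approach rather than the literal diff: __intel_cqm_read_all_sockets() is a
hypothetical stand-in for the existing IPI-based read path, while
event->count, in_interrupt() and local64_read() are the real kernel
interfaces involved.

  /*
   * Sketch only, not the literal patch.  When the count is requested
   * from hard IRQ/NMI context we must not broadcast IPIs, so fall back
   * to the value cached in event->count by the last read.
   */
  static u64 intel_cqm_event_count(struct perf_event *event)
  {
          /*
           * A fresh value needs an IPI to every socket (via
           * on_each_cpu_mask()), which is forbidden in interrupt context.
           */
          if (in_interrupt())
                  return local64_read(&event->count);

          /*
           * Normal path: IPI all sockets, accumulate the per-socket
           * occupancy, update event->count and return it.
           * (__intel_cqm_read_all_sockets() is a hypothetical helper
           * standing in for the existing read logic.)
           */
          return __intel_cqm_read_all_sockets(event);
  }

The cost is that a read from IRQ/NMI context (for example while
perf_output_read() writes out a sample group containing a CQM event, as in
the trace above) may return a slightly stale value, which is preferable to
sending IPIs from that context.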
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Will Auld <will.auld@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>