irq_work: Force raised irq work to run on irq work interrupt
author    Frederic Weisbecker <fweisbec@gmail.com>
          Sat, 16 Aug 2014 16:37:19 +0000 (18:37 +0200)
committer Frederic Weisbecker <fweisbec@gmail.com>
          Sat, 13 Sep 2014 16:38:15 +0000 (18:38 +0200)
The nohz full kick, which restarts the tick when any resource depends
on it, can't be executed from just any context given the operations it
performs on timers. If it is called from the scheduler or timer code,
chances are that we run into a deadlock.

This is why we run the nohz full kick from an irq work: that way we make
sure that the kick runs in a virgin context.
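
For reference, the deferral looks roughly like this (simplified sketch of
the kernel/time/tick-sched.c pattern; the nohz full CPU checks and other
guards are trimmed):

static void nohz_full_kick_work_func(struct irq_work *work)
{
	/* Re-evaluate the tick from a clean interrupt context */
	__tick_nohz_full_check();
}

static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
	.func = nohz_full_kick_work_func,
};

void tick_nohz_full_kick(void)
{
	/* Defer the tick restart instead of touching timers here */
	irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
}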

However, while that holds when irq work runs from its own dedicated
self-IPI, things are different for the many archs that don't support
self-triggered IPIs. In order to support them, irq works are also handled
from the timer interrupt as a fallback.
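
Concretely, archs without the self-IPI keep the empty weak raise hook in
kernel/irq_work.c, so a queued work just sits on its list until the next
tick picks it up from update_process_times() (see the last hunk below):

/* kernel/irq_work.c: default when the arch provides no self-IPI */
void __weak arch_irq_work_raise(void)
{
	/*
	 * Lame architectures will get the timer tick callback
	 */
}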

Now when irq works run from the timer interrupt, the context isn't blank.
More precisely, they can run in the context of the hrtimer that drives the
tick. But the nohz kick cancels and restarts this hrtimer, and cancelling
an hrtimer from within its own callback isn't allowed. This is why we end
up in an endless loop:

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
Workqueue: btrfs-endio-write normal_work_helper [btrfs]
 ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
Call Trace:
 <NMI>  [<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
 [<ffffffff8a7ef928>] panic+0xd4/0x207
 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
 [<ffffffff8a008268>] do_nmi+0xb8/0x100
 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
 <<EOE>  <IRQ>  [<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80

To fix this, we force non-lazy irq works to run only from the irq work
self-IPI when the arch supports it. Whether the arch can trigger irq work
self-IPIs is reported by arch_irq_work_has_interrupt().
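
arch_irq_work_has_interrupt() comes from the companion patches of this
series; roughly, the asm-generic header reports no self-IPI support and
archs that have one override it, e.g. x86 (sketch, simplified):

/* include/asm-generic/irq_work.h: default for archs without a self-IPI */
static inline bool arch_irq_work_has_interrupt(void)
{
	return false;
}

/* arch/x86/include/asm/irq_work.h: x86 has one when a local APIC exists */
static inline bool arch_irq_work_has_interrupt(void)
{
	return cpu_has_apic;
}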

Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
include/linux/irq_work.h
kernel/irq_work.c
kernel/time/timer.c

index 6b47b2ede405c6ced957c5a716a11c90db06ac54..bf3fe719c7ce9d3c0efa3c2449cc0d2a7e5b43ad 100644 (file)
@@ -39,6 +39,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu);
 #endif
 
 void irq_work_run(void);
+void irq_work_tick(void);
 void irq_work_sync(struct irq_work *work);
 
 #ifdef CONFIG_IRQ_WORK
index e6bcbe756663abd64adf9c416eddee7a91c9a2c9..385b85aded199f5f2be0ee590278b559ac53b136 100644 (file)
@@ -115,8 +115,10 @@ bool irq_work_needs_cpu(void)
 
        raised = &__get_cpu_var(raised_list);
        lazy = &__get_cpu_var(lazy_list);
-       if (llist_empty(raised) && llist_empty(lazy))
-               return false;
+
+       if (llist_empty(raised) || arch_irq_work_has_interrupt())
+               if (llist_empty(lazy))
+                       return false;
 
        /* All work should have been flushed before going offline */
        WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -171,6 +173,15 @@ void irq_work_run(void)
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
+void irq_work_tick(void)
+{
+       struct llist_head *raised = &__get_cpu_var(raised_list);
+
+       if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
+               irq_work_run_list(raised);
+       irq_work_run_list(&__get_cpu_var(lazy_list));
+}
+
 /*
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
index aca5dfe2fa3de5df0738b6dc495b2e186483a593..9bbb8344ed3bf675c9605383c0614be37cc4ddb5 100644 (file)
@@ -1385,7 +1385,7 @@ void update_process_times(int user_tick)
        rcu_check_callbacks(cpu, user_tick);
 #ifdef CONFIG_IRQ_WORK
        if (in_irq())
-               irq_work_run();
+               irq_work_tick();
 #endif
        scheduler_tick();
        run_posix_cpu_timers(p);