Improve behaviour of spurious IRQ detect
author Alan Cox <alan@lxorguk.ukuu.org.uk>
Mon, 16 Jul 2007 06:40:55 +0000 (23:40 -0700)
committer Linus Torvalds <torvalds@woody.linux-foundation.org>
Mon, 16 Jul 2007 16:05:46 +0000 (09:05 -0700)
Currently we detect spurious IRQ activity by seeing a lot of invalid
interrupts, and we clear the state again on the basis of lots of valid
interrupts.

Unfortunately in some cases you get legitimate invalid interrupts caused by
timing asynchronicity between the PCI bus and the APIC bus when disabling
interrupts and pulling other tricks.  In this case, although the spurious
IRQs are not a problem, our unhandled counters never clear and they act as
a slow running timebomb.  (This is effectively how the serial port/tty
problem showed up, which was fixed by clearing the counters when
registering a handler.)

It's easy enough to add a second parameter: time.  If more than HZ/10 has
passed since the last unhandled interrupt, the unhandled count restarts at
1 instead of incrementing.  This means that if we see a regular stream of
harmless spurious interrupts we don't go off and do something stupid like
disable the IRQ after a month of running.  OTOH lockups and performance
killers show up a lot more often than 10/second.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
include/linux/irq.h
kernel/irq/spurious.c

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 1695054e8c63909a62b8e12d16fe5ca1c8cc625c..44657197fcb03de46c060e7634c73f3e126038d7 100644
@@ -161,6 +161,7 @@ struct irq_desc {
        unsigned int            wake_depth;     /* nested wake enables */
        unsigned int            irq_count;      /* For detecting broken IRQs */
        unsigned int            irqs_unhandled;
+       unsigned long           last_unhandled; /* Aging timer for unhandled count */
        spinlock_t              lock;
 #ifdef CONFIG_SMP
        cpumask_t               affinity;
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index bd9e272d55e9f007984b06e0999f6a37bb7208e1..32b161972fad2cda220f3f4698e9fb565017b6a1 100644
@@ -172,7 +172,17 @@ void note_interrupt(unsigned int irq, struct irq_desc *desc,
                    irqreturn_t action_ret)
 {
        if (unlikely(action_ret != IRQ_HANDLED)) {
-               desc->irqs_unhandled++;
+               /*
+                * If we are seeing only the odd spurious IRQ caused by
+                * bus asynchronicity then don't eventually trigger an error,
+               * otherwise the counter becomes a doomsday timer for otherwise
+                * working systems
+                */
+               if (jiffies - desc->last_unhandled > HZ/10)
+                       desc->irqs_unhandled = 1;
+               else
+                       desc->irqs_unhandled++;
+               desc->last_unhandled = jiffies;
                if (unlikely(action_ret != IRQ_NONE))
                        report_bad_irq(irq, desc, action_ret);
        }