[SCSI] aacraid: Fix for arrays are going offline in the system. System hangs
author Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>
Tue, 18 Jun 2013 11:32:07 +0000 (17:02 +0530)
committer James Bottomley <JBottomley@Parallels.com>
Thu, 27 Jun 2013 01:01:42 +0000 (18:01 -0700)
A customer reported that a set of RAID logical arrays became unavailable
(I/O offline) after many hours of an I/O stress test.  The OS was not
accessible afterwards and required a hard reset.

This driver patch fixes a race condition between the doorbell and the
circular buffer. The driver now does an extra read after clearing the
doorbell, in case a completion was posted during the small timing
window.
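
The race can be illustrated with a small user-space model. This is only a
sketch under stated assumptions, not the driver code: struct fake_dev,
device_post(), drain_ring(), pending() and handle_irq() are hypothetical
names, the "device" is simulated in a single thread, and the inject_late
flag stands in for a completion that lands in the timing window just before
the doorbell is cleared.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE         8     /* hypothetical ring depth */
#define DOORBELL_RESPONSE 0x1u  /* hypothetical "response posted" bit */

/* Hypothetical stand-in for the adapter state; not the aacraid structures. */
struct fake_dev {
    uint32_t doorbell;         /* models the outbound doorbell register */
    uint32_t ring[RING_SIZE];  /* models the circular buffer; 0 means empty */
    unsigned int prod;         /* next slot the device will fill */
    unsigned int cons;         /* next slot the host will consume */
};

/* Device side: post one completion, then ring the doorbell. */
static void device_post(struct fake_dev *d, uint32_t handle)
{
    assert(handle != 0 && d->ring[d->prod] == 0);
    d->ring[d->prod] = handle;
    d->prod = (d->prod + 1) % RING_SIZE;
    d->doorbell |= DOORBELL_RESPONSE;
}

/* Host side: consume every posted entry; returns how many were consumed. */
static int drain_ring(struct fake_dev *d)
{
    int n = 0;

    while (d->ring[d->cons] != 0) {
        d->ring[d->cons] = 0;
        d->cons = (d->cons + 1) % RING_SIZE;
        n++;
    }
    return n;
}

/* Returns 1 if a completion is still sitting unconsumed in the ring. */
static int pending(const struct fake_dev *d)
{
    return d->ring[d->cons] != 0;
}

/*
 * Simplified interrupt handler. With inject_late set, a completion arrives
 * after the ring was drained but before the doorbell is cleared. Clearing
 * the doorbell then discards the only notification for that completion,
 * so it is lost unless the ring is read once more (extra_read).
 */
static int handle_irq(struct fake_dev *d, int extra_read, int inject_late)
{
    int handled = drain_ring(d);

    if (inject_late)
        device_post(d, 0x2002u);         /* lands in the timing window */

    d->doorbell &= ~DOORBELL_RESPONSE;   /* clear the doorbell */

    if (extra_read)
        handled += drain_ring(d);        /* extra read after the clear */

    return handled;
}
```

In this model, calling handle_irq() without the extra read leaves one entry
in the ring with the doorbell already cleared, so nothing would ever trigger
its processing. With the extra read, the late completion is picked up in the
same handler invocation.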

With this fix, we ran IO stress for ~13 days. There were no IO failures.

Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
drivers/scsi/aacraid/src.c

index 0f56d8d7524ff783709fef8797508ea8ce0bcbcb..7e17107643d48a2066b1b00dbb954d35c75db0ef 100644
@@ -93,6 +93,9 @@ static irqreturn_t aac_src_intr_message(int irq, void *dev_id)
                        int send_it = 0;
                        extern int aac_sync_mode;
 
+                       src_writel(dev, MUnit.ODR_C, bellbits);
+                       src_readl(dev, MUnit.ODR_C);
+
                        if (!aac_sync_mode) {
                                src_writel(dev, MUnit.ODR_C, bellbits);
                                src_readl(dev, MUnit.ODR_C);