4.2 Using MSI
-Most of the hard work is done for the driver in the PCI layer. It simply
-has to request that the PCI layer set up the MSI capability for this
+Most of the hard work is done for the driver in the PCI layer. The driver
+simply has to request that the PCI layer set up the MSI capability for this
device.
-4.2.1 pci_enable_msi
+To automatically use MSI or MSI-X interrupt vectors, use the following
+function:
-int pci_enable_msi(struct pci_dev *dev)
+ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+ unsigned int max_vecs, unsigned int flags);
-A successful call allocates ONE interrupt to the device, regardless
-of how many MSIs the device supports. The device is switched from
-pin-based interrupt mode to MSI mode. The dev->irq number is changed
-to a new number which represents the message signaled interrupt;
-consequently, this function should be called before the driver calls
-request_irq(), because an MSI is delivered via a vector that is
-different from the vector of a pin-based interrupt.
+which allocates up to max_vecs interrupt vectors for a PCI device. It
+returns the number of vectors allocated or a negative error. If the device
+has a requirements for a minimum number of vectors the driver can pass a
+min_vecs argument set to this limit, and the PCI core will return -ENOSPC
+if it can't meet the minimum number of vectors.
-4.2.2 pci_enable_msi_range
+The flags argument should normally be set to 0, but can be used to pass the
+PCI_IRQ_NOMSI and PCI_IRQ_NOMSIX flag in case a device claims to support
+MSI or MSI-X, but the support is broken, or to pass PCI_IRQ_NOLEGACY in
+case the device does not support legacy interrupt lines.
-int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
+To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
+vectors, use the following function:
-This function allows a device driver to request any number of MSI
-interrupts within specified range from 'minvec' to 'maxvec'.
+ int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
-If this function returns a positive number it indicates the number of
-MSI interrupts that have been successfully allocated. In this case
-the device is switched from pin-based interrupt mode to MSI mode and
-updates dev->irq to be the lowest of the new interrupts assigned to it.
-The other interrupts assigned to the device are in the range dev->irq
-to dev->irq + returned value - 1. Device driver can use the returned
-number of successfully allocated MSI interrupts to further allocate
-and initialize device resources.
+Any allocated resources should be freed before removing the device using
+the following function:
-If this function returns a negative number, it indicates an error and
-the driver should not attempt to request any more MSI interrupts for
-this device.
+ void pci_free_irq_vectors(struct pci_dev *dev);
-This function should be called before the driver calls request_irq(),
-because MSI interrupts are delivered via vectors that are different
-from the vector of a pin-based interrupt.
+If a device supports both MSI-X and MSI capabilities, this API will use the
+MSI-X facilities in preference to the MSI facilities. MSI-X supports any
+number of interrupts between 1 and 2048. In contrast, MSI is restricted to
+a maximum of 32 interrupts (and must be a power of two). In addition, the
+MSI interrupt vectors must be allocated consecutively, so the system might
+not be able to allocate as many vectors for MSI as it could for MSI-X. On
+some platforms, MSI interrupts must all be targeted at the same set of CPUs
+whereas MSI-X interrupts can all be targeted at different CPUs.
-It is ideal if drivers can cope with a variable number of MSI interrupts;
-there are many reasons why the platform may not be able to provide the
-exact number that a driver asks for.
+If a device supports neither MSI-X or MSI it will fall back to a single
+legacy IRQ vector.
-There could be devices that can not operate with just any number of MSI
-interrupts within a range. See chapter 4.3.1.3 to get the idea how to
-handle such devices for MSI-X - the same logic applies to MSI.
+The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
+as possible, likely up to the limit supported by the device. If nvec is
+larger than the number supported by the device it will automatically be
+capped to the supported limit, so there is no need to query the number of
+vectors supported beforehand:
-4.2.1.1 Maximum possible number of MSI interrupts
-
-The typical usage of MSI interrupts is to allocate as many vectors as
-possible, likely up to the limit returned by pci_msi_vec_count() function:
-
-static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec)
-{
- return pci_enable_msi_range(pdev, 1, nvec);
-}
-
-Note the value of 'minvec' parameter is 1. As 'minvec' is inclusive,
-the value of 0 would be meaningless and could result in error.
-
-Some devices have a minimal limit on number of MSI interrupts.
-In this case the function could look like this:
-
-static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec)
-{
- return pci_enable_msi_range(pdev, FOO_DRIVER_MINIMUM_NVEC, nvec);
-}
-
-4.2.1.2 Exact number of MSI interrupts
+ nvec = pci_alloc_irq_vectors(pdev, 1, nvec, 0);
+ if (nvec < 0)
+ goto out_err;
If a driver is unable or unwilling to deal with a variable number of MSI
-interrupts it could request a particular number of interrupts by passing
-that number to pci_enable_msi_range() function as both 'minvec' and 'maxvec'
-parameters:
-
-static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec)
-{
- return pci_enable_msi_range(pdev, nvec, nvec);
-}
-
-Note, unlike pci_enable_msi_exact() function, which could be also used to
-enable a particular number of MSI-X interrupts, pci_enable_msi_range()
-returns either a negative errno or 'nvec' (not negative errno or 0 - as
-pci_enable_msi_exact() does).
-
-4.2.1.3 Single MSI mode
-
-The most notorious example of the request type described above is
-enabling the single MSI mode for a device. It could be done by passing
-two 1s as 'minvec' and 'maxvec':
-
-static int foo_driver_enable_single_msi(struct pci_dev *pdev)
-{
- return pci_enable_msi_range(pdev, 1, 1);
-}
-
-Note, unlike pci_enable_msi() function, which could be also used to
-enable the single MSI mode, pci_enable_msi_range() returns either a
-negative errno or 1 (not negative errno or 0 - as pci_enable_msi()
-does).
-
-4.2.3 pci_enable_msi_exact
-
-int pci_enable_msi_exact(struct pci_dev *dev, int nvec)
-
-This variation on pci_enable_msi_range() call allows a device driver to
-request exactly 'nvec' MSIs.
-
-If this function returns a negative number, it indicates an error and
-the driver should not attempt to request any more MSI interrupts for
-this device.
-
-By contrast with pci_enable_msi_range() function, pci_enable_msi_exact()
-returns zero in case of success, which indicates MSI interrupts have been
-successfully allocated.
-
-4.2.4 pci_disable_msi
-
-void pci_disable_msi(struct pci_dev *dev)
-
-This function should be used to undo the effect of pci_enable_msi_range().
-Calling it restores dev->irq to the pin-based interrupt number and frees
-the previously allocated MSIs. The interrupts may subsequently be assigned
-to another device, so drivers should not cache the value of dev->irq.
-
-Before calling this function, a device driver must always call free_irq()
-on any interrupt for which it previously called request_irq().
-Failure to do so results in a BUG_ON(), leaving the device with
-MSI enabled and thus leaking its vector.
-
-4.2.4 pci_msi_vec_count
-
-int pci_msi_vec_count(struct pci_dev *dev)
-
-This function could be used to retrieve the number of MSI vectors the
-device requested (via the Multiple Message Capable register). The MSI
-specification only allows the returned value to be a power of two,
-up to a maximum of 2^5 (32).
-
-If this function returns a negative number, it indicates the device is
-not capable of sending MSIs.
-
-If this function returns a positive number, it indicates the maximum
-number of MSI interrupt vectors that could be allocated.
-
-4.3 Using MSI-X
-
-The MSI-X capability is much more flexible than the MSI capability.
-It supports up to 2048 interrupts, each of which can be controlled
-independently. To support this flexibility, drivers must use an array of
-`struct msix_entry':
-
-struct msix_entry {
- u16 vector; /* kernel uses to write alloc vector */
- u16 entry; /* driver uses to specify entry */
-};
-
-This allows for the device to use these interrupts in a sparse fashion;
-for example, it could use interrupts 3 and 1027 and yet allocate only a
-two-element array. The driver is expected to fill in the 'entry' value
-in each element of the array to indicate for which entries the kernel
-should assign interrupts; it is invalid to fill in two entries with the
-same number.
-
-4.3.1 pci_enable_msix_range
-
-int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
- int minvec, int maxvec)
-
-Calling this function asks the PCI subsystem to allocate any number of
-MSI-X interrupts within specified range from 'minvec' to 'maxvec'.
-The 'entries' argument is a pointer to an array of msix_entry structs
-which should be at least 'maxvec' entries in size.
-
-On success, the device is switched into MSI-X mode and the function
-returns the number of MSI-X interrupts that have been successfully
-allocated. In this case the 'vector' member in entries numbered from
-0 to the returned value - 1 is populated with the interrupt number;
-the driver should then call request_irq() for each 'vector' that it
-decides to use. The device driver is responsible for keeping track of the
-interrupts assigned to the MSI-X vectors so it can free them again later.
-Device driver can use the returned number of successfully allocated MSI-X
-interrupts to further allocate and initialize device resources.
-
-If this function returns a negative number, it indicates an error and
-the driver should not attempt to allocate any more MSI-X interrupts for
-this device.
-
-This function, in contrast with pci_enable_msi_range(), does not adjust
-dev->irq. The device will not generate interrupts for this interrupt
-number once MSI-X is enabled.
-
-Device drivers should normally call this function once per device
-during the initialization phase.
-
-It is ideal if drivers can cope with a variable number of MSI-X interrupts;
-there are many reasons why the platform may not be able to provide the
-exact number that a driver asks for.
-
-There could be devices that can not operate with just any number of MSI-X
-interrupts within a range. E.g., an network adapter might need let's say
-four vectors per each queue it provides. Therefore, a number of MSI-X
-interrupts allocated should be a multiple of four. In this case interface
-pci_enable_msix_range() can not be used alone to request MSI-X interrupts
-(since it can allocate any number within the range, without any notion of
-the multiple of four) and the device driver should master a custom logic
-to request the required number of MSI-X interrupts.
-
-4.3.1.1 Maximum possible number of MSI-X interrupts
-
-The typical usage of MSI-X interrupts is to allocate as many vectors as
-possible, likely up to the limit returned by pci_msix_vec_count() function:
-
-static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
-{
- return pci_enable_msix_range(adapter->pdev, adapter->msix_entries,
- 1, nvec);
-}
-
-Note the value of 'minvec' parameter is 1. As 'minvec' is inclusive,
-the value of 0 would be meaningless and could result in error.
-
-Some devices have a minimal limit on number of MSI-X interrupts.
-In this case the function could look like this:
-
-static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
-{
- return pci_enable_msix_range(adapter->pdev, adapter->msix_entries,
- FOO_DRIVER_MINIMUM_NVEC, nvec);
-}
-
-4.3.1.2 Exact number of MSI-X interrupts
-
-If a driver is unable or unwilling to deal with a variable number of MSI-X
-interrupts it could request a particular number of interrupts by passing
-that number to pci_enable_msix_range() function as both 'minvec' and 'maxvec'
-parameters:
-
-static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
-{
- return pci_enable_msix_range(adapter->pdev, adapter->msix_entries,
- nvec, nvec);
-}
-
-Note, unlike pci_enable_msix_exact() function, which could be also used to
-enable a particular number of MSI-X interrupts, pci_enable_msix_range()
-returns either a negative errno or 'nvec' (not negative errno or 0 - as
-pci_enable_msix_exact() does).
-
-4.3.1.3 Specific requirements to the number of MSI-X interrupts
-
-As noted above, there could be devices that can not operate with just any
-number of MSI-X interrupts within a range. E.g., let's assume a device that
-is only capable sending the number of MSI-X interrupts which is a power of
-two. A routine that enables MSI-X mode for such device might look like this:
-
-/*
- * Assume 'minvec' and 'maxvec' are non-zero
- */
-static int foo_driver_enable_msix(struct foo_adapter *adapter,
- int minvec, int maxvec)
-{
- int rc;
-
- minvec = roundup_pow_of_two(minvec);
- maxvec = rounddown_pow_of_two(maxvec);
-
- if (minvec > maxvec)
- return -ERANGE;
-
-retry:
- rc = pci_enable_msix_range(adapter->pdev, adapter->msix_entries,
- maxvec, maxvec);
- /*
- * -ENOSPC is the only error code allowed to be analyzed
- */
- if (rc == -ENOSPC) {
- if (maxvec == 1)
- return -ENOSPC;
-
- maxvec /= 2;
-
- if (minvec > maxvec)
- return -ENOSPC;
-
- goto retry;
- }
-
- return rc;
-}
-
-Note how pci_enable_msix_range() return value is analyzed for a fallback -
-any error code other than -ENOSPC indicates a fatal error and should not
-be retried.
-
-4.3.2 pci_enable_msix_exact
-
-int pci_enable_msix_exact(struct pci_dev *dev,
- struct msix_entry *entries, int nvec)
-
-This variation on pci_enable_msix_range() call allows a device driver to
-request exactly 'nvec' MSI-Xs.
-
-If this function returns a negative number, it indicates an error and
-the driver should not attempt to allocate any more MSI-X interrupts for
-this device.
-
-By contrast with pci_enable_msix_range() function, pci_enable_msix_exact()
-returns zero in case of success, which indicates MSI-X interrupts have been
-successfully allocated.
-
-Another version of a routine that enables MSI-X mode for a device with
-specific requirements described in chapter 4.3.1.3 might look like this:
-
-/*
- * Assume 'minvec' and 'maxvec' are non-zero
- */
-static int foo_driver_enable_msix(struct foo_adapter *adapter,
- int minvec, int maxvec)
-{
- int rc;
-
- minvec = roundup_pow_of_two(minvec);
- maxvec = rounddown_pow_of_two(maxvec);
-
- if (minvec > maxvec)
- return -ERANGE;
-
-retry:
- rc = pci_enable_msix_exact(adapter->pdev,
- adapter->msix_entries, maxvec);
-
- /*
- * -ENOSPC is the only error code allowed to be analyzed
- */
- if (rc == -ENOSPC) {
- if (maxvec == 1)
- return -ENOSPC;
-
- maxvec /= 2;
-
- if (minvec > maxvec)
- return -ENOSPC;
-
- goto retry;
- } else if (rc < 0) {
- return rc;
- }
-
- return maxvec;
-}
-
-4.3.3 pci_disable_msix
-
-void pci_disable_msix(struct pci_dev *dev)
-
-This function should be used to undo the effect of pci_enable_msix_range().
-It frees the previously allocated MSI-X interrupts. The interrupts may
-subsequently be assigned to another device, so drivers should not cache
-the value of the 'vector' elements over a call to pci_disable_msix().
-
-Before calling this function, a device driver must always call free_irq()
-on any interrupt for which it previously called request_irq().
-Failure to do so results in a BUG_ON(), leaving the device with
-MSI-X enabled and thus leaking its vector.
-
-4.3.3 The MSI-X Table
-
-The MSI-X capability specifies a BAR and offset within that BAR for the
-MSI-X Table. This address is mapped by the PCI subsystem, and should not
-be accessed directly by the device driver. If the driver wishes to
-mask or unmask an interrupt, it should call disable_irq() / enable_irq().
+interrupts it can request a particular number of interrupts by passing that
+number to pci_alloc_irq_vectors() function as both 'min_vecs' and
+'max_vecs' parameters:
-4.3.4 pci_msix_vec_count
+ ret = pci_alloc_irq_vectors(pdev, nvec, nvec, 0);
+ if (ret < 0)
+ goto out_err;
-int pci_msix_vec_count(struct pci_dev *dev)
+The most notorious example of the request type described above is enabling
+the single MSI mode for a device. It could be done by passing two 1s as
+'min_vecs' and 'max_vecs':
-This function could be used to retrieve number of entries in the device
-MSI-X table.
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, 0);
+ if (ret < 0)
+ goto out_err;
-If this function returns a negative number, it indicates the device is
-not capable of sending MSI-Xs.
+Some devices might not support using legacy line interrupts, in which case
+the PCI_IRQ_NOLEGACY flag can be used to fail the request if the platform
+can't provide MSI or MSI-X interrupts:
-If this function returns a positive number, it indicates the maximum
-number of MSI-X interrupt vectors that could be allocated.
+ nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_NOLEGACY);
+ if (nvec < 0)
+ goto out_err;
-4.4 Handling devices implementing both MSI and MSI-X capabilities
+4.3 Legacy APIs
-If a device implements both MSI and MSI-X capabilities, it can
-run in either MSI mode or MSI-X mode, but not both simultaneously.
-This is a requirement of the PCI spec, and it is enforced by the
-PCI layer. Calling pci_enable_msi_range() when MSI-X is already
-enabled or pci_enable_msix_range() when MSI is already enabled
-results in an error. If a device driver wishes to switch between MSI
-and MSI-X at runtime, it must first quiesce the device, then switch
-it back to pin-interrupt mode, before calling pci_enable_msi_range()
-or pci_enable_msix_range() and resuming operation. This is not expected
-to be a common operation but may be useful for debugging or testing
-during development.
+The following old APIs to enable and disable MSI or MSI-X interrupts should
+not be used in new code:
-4.5 Considerations when using MSIs
+ pci_enable_msi() /* deprecated */
+ pci_enable_msi_range() /* deprecated */
+ pci_enable_msi_exact() /* deprecated */
+ pci_disable_msi() /* deprecated */
+ pci_enable_msix_range() /* deprecated */
+ pci_enable_msix_exact() /* deprecated */
+ pci_disable_msix() /* deprecated */
-4.5.1 Choosing between MSI-X and MSI
+Additionally there are APIs to provide the number of supported MSI or MSI-X
+vectors: pci_msi_vec_count() and pci_msix_vec_count(). In general these
+should be avoided in favor of letting pci_alloc_irq_vectors() cap the
+number of vectors. If you have a legitimate special use case for the count
+of vectors we might have to revisit that decision and add a
+pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
-If your device supports both MSI-X and MSI capabilities, you should use
-the MSI-X facilities in preference to the MSI facilities. As mentioned
-above, MSI-X supports any number of interrupts between 1 and 2048.
-In contrast, MSI is restricted to a maximum of 32 interrupts (and
-must be a power of two). In addition, the MSI interrupt vectors must
-be allocated consecutively, so the system might not be able to allocate
-as many vectors for MSI as it could for MSI-X. On some platforms, MSI
-interrupts must all be targeted at the same set of CPUs whereas MSI-X
-interrupts can all be targeted at different CPUs.
+4.4 Considerations when using MSIs
-4.5.2 Spinlocks
+4.4.1 Spinlocks
Most device drivers have a per-device spinlock which is taken in the
interrupt handler. With pin-based interrupts or a single MSI, it is not
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
and acquire the lock (see Documentation/DocBook/kernel-locking).
-4.6 How to tell whether MSI/MSI-X is enabled on a device
+4.5 How to tell whether MSI/MSI-X is enabled on a device
Using 'lspci -v' (as root) may show some devices with "MSI", "Message
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities