Adrian Hunter [Tue, 12 Apr 2016 11:25:09 +0000 (14:25 +0300)]
mmc: sdhci: Tidy together LED code
ifdef's make the code more complicated and harder to read.
Move all the LED code together to reduce the ifdef's to
one place.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Adrian Hunter [Tue, 12 Apr 2016 11:25:08 +0000 (14:25 +0300)]
mmc: sdhci: Fix error paths in sdhci_add_host()
Some error paths in sdhci_add_host() simply returned without
cleaning up. Also the return value from mmc_add_host()
was not being checked.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Adrian Hunter [Tue, 12 Apr 2016 11:25:07 +0000 (14:25 +0300)]
mmc: sdhci: Remove redundant condition
The logic '!mmc.f_max || (mmc.f_max && mmc.f_max > max_clk)'
is equivalent to '!mmc.f_max || mmc.f_max > max_clk'.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Adrian Hunter [Tue, 12 Apr 2016 11:25:06 +0000 (14:25 +0300)]
mmc: sdhci-acpi: Set MMC_CAP_AGGRESSIVE_PM for Broxton controllers
Set MMC_CAP_AGGRESSIVE_PM for Broxton host controllers.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Adrian Hunter [Tue, 12 Apr 2016 11:25:05 +0000 (14:25 +0300)]
mmc: sdhci-pci: Set MMC_CAP_AGGRESSIVE_PM for Broxton controllers
Set MMC_CAP_AGGRESSIVE_PM for Broxton host controllers.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ludovic Desroches [Thu, 7 Apr 2016 09:13:10 +0000 (11:13 +0200)]
mmc: sdhci: Remove SDHCI_QUIRK2_NEED_DELAY_AFTER_INT_CLK_RST
SDHCI_QUIRK2_NEED_DELAY_AFTER_INT_CLK_RST quirk is not used anymore so
remove it.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ludovic Desroches [Thu, 7 Apr 2016 09:13:09 +0000 (11:13 +0200)]
mmc: sdhci-of-at91: Implement specific ->set_clock() function
Disabling the internal clock while configuring the SD card clock can
lead to internal clock stabilization issue and/or unexpected switch to
the base clock when using presets.
A quirk SDHCI_QUIRK2_NEED_DELAY_AFTER_INT_CLK_RST was introduced to fix
these bugs. The cause was assumed to be a too long internal
re-synchronisation but it seems in some cases the delay (even if longer)
doesn't fix this bug. The safest workaround is to not disable/enable the
internal clock during the SD card clock configuration.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ludovic Desroches [Thu, 7 Apr 2016 09:13:08 +0000 (11:13 +0200)]
mmc: sdhci: Introduce sdhci_calc_clk()
In order to remove the SDHCI_QUIRK2_NEED_DELAY_AFTER_INT_CLK_RST and to
reduce code duplication, put the code relative to the SD clock
configuration in a function which can be used by hosts for the
implementation of the ->set_clock() callback.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 11 Apr 2016 13:32:41 +0000 (15:32 +0200)]
mmc: sdhci: Move sdhci_runtime_pm_bus_off|on() to avoid pre-definition
There are no need to have two versions of sdhci_runtime_pm_bus_off|on(),
depending on whether CONFIG_PM is set or unset. Thus it's easy to move the
implementation of these functions a bit earlier to avoid the unnecessary
pre-definition of them, so let's do that.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Masahiro Yamada [Fri, 8 Apr 2016 05:22:37 +0000 (14:22 +0900)]
mmc: sdhci-pic32: remove owner assignment
A platform_driver does not need to set an owner, it will be populated
by the driver core.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Thu, 7 Apr 2016 08:56:39 +0000 (10:56 +0200)]
mmc: sdhci: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Moreover as SDHCI have its own wrapper functions for runtime PM these
becomes superfluous, so let's remove them as well.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Gwendal Grignou [Fri, 1 Apr 2016 23:04:22 +0000 (16:04 -0700)]
mmc: core: Do regular power cycle when lacking eMMC HW reset support
The eMMC HW reset may be implemented either via the host ops ->hw_reset()
callback or through DT and the eMMC pwrseq. Additionally some eMMC cards
don't support HW reset.
To allow a reset to be done for the different combinations of mmc hosts
and eMMC/MMC cards, let's implement a fallback via trying a regular power
cycle. This improves the mmc block layer retry mechanism of failing I/O
requests.
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
[Ulf: Rewrote changelog]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 21 Mar 2016 14:43:41 +0000 (15:43 +0100)]
mmc: tmio: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Cc: Ian Molton <ian@mnementh.co.uk>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 21 Mar 2016 13:40:07 +0000 (14:40 +0100)]
mmc: sdhci-pci: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 21 Mar 2016 13:33:35 +0000 (14:33 +0100)]
mmc: sdhci-acpi: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 21 Mar 2016 13:28:36 +0000 (14:28 +0100)]
mmc: omap_hsmmc: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 21 Mar 2016 13:21:25 +0000 (14:21 +0100)]
mmc: mediatek: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Cc: Chaotian Jing <chaotian.jing@mediatek.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 21 Mar 2016 13:17:00 +0000 (14:17 +0100)]
mmc: mmci: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 21 Mar 2016 13:09:07 +0000 (14:09 +0100)]
mmc: atmel-mci: Remove redundant runtime PM calls
Commit
9250aea76bfc ("mmc: core: Enable runtime PM management of host
devices"), made some calls to the runtime PM API from the driver
redundant. Especially those which deals with runtime PM reference
counting, so let's remove them.
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Tested-by: Ludovic Desroches <ludovic.desroches@atmel.com>
David Lechner [Tue, 5 Apr 2016 17:31:51 +0000 (12:31 -0500)]
ARM: davinci: remove mmc dma resources
The davinci_mmc driver no longer uses platform resources for getting dma
channels. Instead lookup is now done using dma_slave_map.
Signed-off-by: David Lechner <david@lechnology.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
David Lechner [Tue, 5 Apr 2016 17:31:50 +0000 (12:31 -0500)]
mmc: davinci: prepare clock
When trying to use this driver with the common clock framework, enabling
the clock fails because it was not prepared. This fixes the problem by
calling clk_prepare and clk_enable in a single function. Ditto for
clk_disable_unprepare.
Signed-off-by: David Lechner <david@lechnology.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
David Lechner [Tue, 5 Apr 2016 17:31:49 +0000 (12:31 -0500)]
mmc: davinci: fix unwinding in probe
Unwiding from an error in davinci_mmcsd_probe was a mess. Some errors were
not handled and not all paths unwound correctly. Also using devm_ where
possible to simplify things.
Signed-off-by: David Lechner <david@lechnology.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Thu, 31 Mar 2016 07:34:10 +0000 (15:34 +0800)]
mmc: dw_mmc: remove setup_clock callback
Now, no dw_mmc variant drivers use this callback, let's
remove it.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Thu, 31 Mar 2016 07:34:02 +0000 (15:34 +0800)]
mmc: dw_mmc-exynos: remove dw_mci_exynos_setup_clock
We combine what dw_mci_exynos_setup_clock does with init
hook to simplify the code
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Thu, 31 Mar 2016 07:33:53 +0000 (15:33 +0800)]
mmc: dw_mmc-rockchip: remove setup_clock for rockchip
We remove setup_clock hook and combine it into
init hook to simplify the code
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Jaehoon Chung [Thu, 31 Mar 2016 05:53:18 +0000 (14:53 +0900)]
mmc: dw_mmc: exynos: add the function for controlling SMU
Some of Exynos has the Security management Unit(SMU).
This patch adds the function for controlling SMU.
In future, if exynos needs to control SMU, it can be implemented
in "config_smu" function, not "init" function.
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Wed, 3 Feb 2016 03:26:04 +0000 (11:26 +0800)]
mmc: dw_mmc: remove unused EVENT_XFER_ERROR
EVENT_XFER_ERROR isn't been used now, so it can be removed.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Wed, 9 Mar 2016 02:34:46 +0000 (10:34 +0800)]
mmc: dw_mmc: avoid using dmaengine_terminate_all
dmaengine_terminate_all is deprecated and should be
replaced by more explicit synchronous and asynchronous
terminate functions. This change is based on the
commit
b36f09c3c441 ("dmaengine: Add transfer termination
synchronization support"). Currently dw_mci_stop_dma
may be called under the spinlock, let's migrate
dmaengine_terminate_all to async terminate. This could
avoid the race condition of use-after-free resouce of
dmaengine once slave-dma driver implement the synchronize
method.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Wed, 9 Mar 2016 02:33:55 +0000 (10:33 +0800)]
mmc: dw_mmc: fix warning reported by kernel-doc
Try to fix the warning reported by:
scripts/kernel-doc -man -v include/linux/mmc/dw_mmc.h > /dev/null
warning: No description found for parameter 'irq_lock'
warning: No description found for parameter 'stop_abort'
warning: No description found for parameter 'prev_blksz'
warning: No description found for parameter 'timing'
warning: No description found for parameter 'ring_size'
warning: No description found for parameter 'dms'
warning: No description found for parameter 'phy_regs'
warning: No description found for parameter 'fifoth_val'
warning: No description found for parameter 'vqmmc_enabled'
warning: No description found for parameter 'cmd11_timer'
warning: Excess struct/union/enum/typedef member 'card_tasklet'
description in 'dw_mci'
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Tue, 1 Mar 2016 07:12:53 +0000 (15:12 +0800)]
mmc: dw_mmc-rockchip: fix failing to mount partition with "discard"
Without MMC_CAP_ERASE support, we fail to mount partition
with "discard" option since mmc_queue_setup_discard is limited
for checking mmc_can_erase. Without doing mmc_queue_setup_discard,
blk_queue_discard fails to test QUEUE_FLAG_DISCARD flag, so we get
the following log from f2fs(actually similar to other file system):
mounting with "discard" option, but the device does not support discard
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Shawn Lin [Wed, 3 Feb 2016 03:26:44 +0000 (11:26 +0800)]
mmc: dw_mmc-rockchip: remove dw_mci_rockchip_pmops
dw_mci_rockchip_pmops just copy-paste what dw_mci_pltfm_pmops
have done. Let's remove it.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Wolfram Sang [Fri, 1 Apr 2016 15:44:37 +0000 (17:44 +0200)]
mmc: sh_mobile_sdhi: Add UHS-I mode support
Implement voltage switch, supporting modes up to SDR-50.
Based on work by Shinobu Uehara, Rob Taylor, William Towle and Ian Molton.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Wolfram Sang [Fri, 1 Apr 2016 15:44:36 +0000 (17:44 +0200)]
mmc: host: add note that set_ios needs to handle 0Hz properly
While here, refactor the comments so that they are before the
declaration they are referring to.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Wolfram Sang [Fri, 1 Apr 2016 15:44:35 +0000 (17:44 +0200)]
mmc: tmio: stop clock when 0Hz is requested
Setting frequency to 0 is not enough, the clock explicitly has to be
disabled. Otherwise voltage switching (which needs SDCLK to be quiet)
fails for various cards.
Because we now do the 'new_clock == 0' check right at the beginning,
the indentation level of the rest of the code can be decreased a little.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Wolfram Sang [Fri, 1 Apr 2016 15:44:34 +0000 (17:44 +0200)]
mmc: tmio: always start clock after frequency calculation
Starting the clock is always done after frequency change anyhow, so we can
do it directly after the clock calculation and remove the specific calls.
This is the first part of doing proper clock de-/activation at calculation
time.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Wolfram Sang [Fri, 1 Apr 2016 15:44:33 +0000 (17:44 +0200)]
mmc: tmio: Add UHS-I mode support
Based on work by Shinobu Uehara and Ben Dooks. This adds the voltage
switch operation needed for all UHS-I modes, but not the tuning needed
for SDR-104 which will come later.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ben Hutchings [Fri, 1 Apr 2016 15:44:32 +0000 (17:44 +0200)]
mmc: tmio, sh_mobile_sdhi: Add support for variable input clock frequency
Currently tmio_mmc assumes that the input clock frequency is fixed and
only its own clock divider can be changed. This is not true in the
case of sh_mobile_sdhi; we can use the clock API to change it.
In tmio_mmc:
- Delegate setting of f_min from tmio to the clk_enable operation (if
implemented), as it can be smaller than f_max / 512
- Add an optional clk_update operation called from tmio_mmc_set_clock()
that updates the input clock frequency
- Rename tmio_mmc_clk_update() to tmio_mmc_clk_enable(), to avoid
confusion with the clk_update operation
In sh_mobile_sdhi:
- Make the setting of f_max conditional; it should be set through the
max-frequency property in the device tree in future
- Set f_min based on the input clock's minimum frequency
- Implement the clk_update operation, selecting the best input clock
frequency for the bus frequency that's wanted
sh_mobile_sdhi_clk_update() is loosely based on Kuninori Morimoto's work
in sh_mmcif.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ben Hutchings [Fri, 1 Apr 2016 15:44:31 +0000 (17:44 +0200)]
mmc: tmio, sh_mobile_sdhi: Pass tmio_mmc_host ptr to clk_{enable, disable} ops
Change the clk_enable operation to take a pointer to the struct
tmio_mmc_host and have it set f_max. For consistency, also change the
clk_disable operation to take a pointer to struct tmio_mmc_host.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Baolin Wang [Thu, 31 Mar 2016 03:16:27 +0000 (11:16 +0800)]
mmc: core: Provide tracepoints for request processing
This patch provides some tracepoints for the lifecycle of a mmc request
from starting to completion to help with performance analysis of MMC
subsystem.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Andreas Fenkart [Sun, 20 Mar 2016 23:58:08 +0000 (00:58 +0100)]
mmc: omap_hsmmc: pass omap_hsmmc_host pointer directly
unnecessary indirection via 'struct device' back to omap_hsmmc_host
Signed-off-by: Andreas Fenkart <afenkart@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
David Lechner [Thu, 17 Mar 2016 03:45:32 +0000 (22:45 -0500)]
mmc: davinci: remove matching string
The string "MMCSDCLK" is not actually used for clock lookup, so can be
removed.
Signed-off-by: David Lechner <david@lechnology.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Peter Ujfalusi [Thu, 17 Mar 2016 03:45:31 +0000 (22:45 -0500)]
mmc: davinci_mmc: Use dma_request_chan() to requesting DMA channel
With the new dma_request_chan() the client driver does not need to look for
the DMA resource and it does not need to pass filter_fn anymore.
By switching to the new API the driver can now support deferred probing
against DMA.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Thu, 11 Feb 2016 12:59:55 +0000 (13:59 +0100)]
mmc: sh_mmci: Get rid of wrapper function for regulators
As there are two callers of sh_mmcif_set_power() and because its only
additional action is to check for a valid regulator, let's just remove it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Thu, 11 Feb 2016 12:59:54 +0000 (13:59 +0100)]
mmc: sh_mmcif: Restructure ->set_ios()
Both from a runtime PM and clock management point of view, the ->set_ios()
code is unnecessary complex.
A suboptimal path is also executed when the mmc core requests a clock rate
of zero. As that happens during the card initialization phase, trying to
save power by decreasing the runtime PM usage count and gating the clock
via clk_disable_unprepare() is just superfluous.
Moreover, from a runtime PM point of view the core will anyway keep the
device active during the entire card initialization phase.
Restructure the code to rely on the ios->power_mode to understand when the
runtime PM usage count needs to be increased. Let's also deal with clock
rate changes by simply applying the rate.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Thu, 11 Feb 2016 12:59:53 +0000 (13:59 +0100)]
mmc: sh_mmcif: Make sure the device stays active when needed in ->probe()
While accessing the device, make sure it stays active by increasing the
runtime PM usage count for it.
Let's also defer to enable runtime PM until we really need access to the
device. This also enables the error path in ->probe() to become simpler.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Linus Torvalds [Sun, 1 May 2016 22:52:31 +0000 (15:52 -0700)]
Linux 4.6-rc6
Linus Torvalds [Sun, 1 May 2016 01:57:42 +0000 (18:57 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
"A couple of minor fixes for the thermal subsystem.
Specifics in this pull request:
- Fixes in hisilicon thermal driver
- More fixes of unsigned to int type change in thermal_core.c"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: use %d to print S32 parameters
thermal: hisilicon: increase temperature resolution
Linus Torvalds [Sat, 30 Apr 2016 01:50:08 +0000 (18:50 -0700)]
Merge tag 'powerpc-4.6-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A few more powerpc fixes for 4.6:
- cxl: Keep IRQ mappings on context teardown from Michael Neuling
- cxl: Poll for outstanding IRQs when detaching a context from
Michael Neuling
- Wire up preadv2 and pwritev2 syscalls from Rui Salvaterra"
* tag 'powerpc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: wire up preadv2 and pwritev2 syscalls
cxl: Poll for outstanding IRQs when detaching a context
cxl: Keep IRQ mappings on context teardown
Linus Torvalds [Sat, 30 Apr 2016 00:59:26 +0000 (17:59 -0700)]
Merge tag 'edac_fix_for_4.6' of git://git./linux/kernel/git/bp/bp
Pull EDAC fix from Borislav Petkov:
"Make sure sb_edac and i7core_edac do not terminate MCE processing on
the decoding callchain prematurely"
* tag 'edac_fix_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
Linus Torvalds [Sat, 30 Apr 2016 00:39:51 +0000 (17:39 -0700)]
Merge tag 'pm+acpi-4.6-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"One revert of a recent cpufreq commit that introduced a regression and
a fix for intel_pstate's Turbo Activation Ratio handling code.
Specifics:
- Revert cpufreq commit that attempted to fix a problem in the
ondemand/conservative governor code, but did that incorrectly and
introduced another problem instead (Rafael Wysocki).
- Fix incorrect decoding of MSR contents related to the Turbo
Activation Ratio (TAR) handling in the intel_pstate driver
(Srinivas Pandruvada)"
* tag 'pm+acpi-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Linus Torvalds [Sat, 30 Apr 2016 00:32:19 +0000 (17:32 -0700)]
Merge tag 'mmc-v4.6-rc4' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a two MMC host fixes:
- sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs
- sunxi: Disable eMMC HS-DDR for Allwinner A80"
* tag 'mmc-v4.6-rc4' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sunxi: Disable eMMC HS-DDR (MMC_CAP_1_8V_DDR) for Allwinner A80
mmc: sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs
Linus Torvalds [Sat, 30 Apr 2016 00:18:55 +0000 (17:18 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A few fixes all over the place:
radeon is probably the biggest standout, it's a fix for screen
corruption or hung black outputs so I thought it was worth pulling in.
Otherwise some amdgpu power control fixes, some misc vmwgfx fixes, one
etnaviv fix, one virtio-gpu fix, two DP MST fixes, and a single TTM
fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/vmwgfx: Fix order of operation
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
drm/amdgpu: disable vm interrupts with vm_fault_stop=2
drm/amdgpu: print a message if ATPX dGPU power control is missing
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
drm/radeon: fix vertical bars appear on monitor (v2)
drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
drm/virtio: send vblank event after crtc updates
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/etnaviv: don't move linear memory window on 3D cores without MC2.0
Linus Torvalds [Sat, 30 Apr 2016 00:07:54 +0000 (17:07 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Final set of -rc fixes for 4.6.
I've collected up a number of patches that are all pretty small with
the exception of only a couple. The hfi1 driver has a number of
important patches, and it is what really drives the line count of this
pull request up. These are all small and I've got this kernel built
and running in the test lab (I have most of the hardware, I think nes
is the only thing in this patch set that I can't say I've personally
tested and have up and running).
Summary:
- A number of collected fixes for oopses, memory corruptions,
deadlocks, etc. All of these fixes are small (many only 5-10
lines), obvious, and tested.
- Fix for the security issue related to the use of write for
bi-directional communications"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA/nes: don't leak skb if carrier down
IB/security: Restrict use of the write() interface
IB/hfi1: Use kernel default llseek for ui device
IB/hfi1: Don't attempt to free resources if initialization failed
IB/hfi1: Fix missing lock/unlock in verbs drain callback
IB/rdmavt: Fix send scheduling
IB/hfi1: Prevent unpinning of wrong pages
IB/hfi1: Fix deadlock caused by locking with wrong scope
IB/hfi1: Prevent NULL pointer deferences in caching code
MAINTAINERS: Update iser/isert maintainer contact info
IB/mlx5: Expose correct max_sge_rd limit
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
iw_cxgb4: handle draining an idle qp
iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping
iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping
IB/core: Don't drain non-existent rq queue-pair
IB/core: Fix oops in ib_cache_gid_set_default_gid
Linus Torvalds [Fri, 29 Apr 2016 18:21:22 +0000 (11:21 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
Documentation/sysctl/vm.txt: update numa_zonelist_order description
lib/stackdepot.c: allow the stack trace hash to be zero
rapidio: fix potential NULL pointer dereference
mm/memory-failure: fix race with compound page split/merge
ocfs2/dlm: return zero if deref_done message is successfully handled
Ananth has moved
kcov: don't profile branches in kcov
kcov: don't trace the code coverage code
mm: wake kcompactd before kswapd's short sleep
.mailmap: add Frank Rowand
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: call swap_slot_free_notify() with page lock held
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
mailmap: fix Krzysztof Kozlowski's misspelled name
thp: keep huge zero page pinned until tlb flush
mm: exclude HugeTLB pages from THP page_mapped() logic
kexec: export OFFSET(page.compound_head) to find out compound tail page
kexec: update VMCOREINFO for compound_order/dtor
Tony Luck [Fri, 29 Apr 2016 13:42:25 +0000 (15:42 +0200)]
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Rafael J. Wysocki [Fri, 29 Apr 2016 12:22:25 +0000 (14:22 +0200)]
Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Dave Airlie [Fri, 29 Apr 2016 04:31:44 +0000 (14:31 +1000)]
Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few fixes for 4.6.
- revert amdgpu PX commit that was previously reverted on the radeon side
- cleaned up version of the NI+ MC update display fix for radeon
- TTM kref fix
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: disable vm interrupts with vm_fault_stop=2
drm/amdgpu: print a message if ATPX dGPU power control is missing
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
drm/radeon: fix vertical bars appear on monitor (v2)
drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
Dave Airlie [Fri, 29 Apr 2016 04:27:50 +0000 (14:27 +1000)]
Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes
three misc vmwgfx fixes
* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Fix order of operation
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
Linus Torvalds [Fri, 29 Apr 2016 03:24:27 +0000 (20:24 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two boot crash fixes and an IRQ handling crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Handle zero vector gracefully in clear_vector_irq()
Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"
xen/qspinlock: Don't kick CPU if IRQ is not initialized
Linus Torvalds [Fri, 29 Apr 2016 03:19:04 +0000 (20:19 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"x86 PMU driver fixes plus a core code race fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect lbr_sel_mask value
perf/x86/intel/pt: Don't die on VMXON
perf/core: Fix perf_event_open() vs. execve() race
perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
perf/x86/intel/rapl: Add missing Haswell model
perf/x86/intel: Add model number for Skylake Server to perf
Linus Torvalds [Fri, 29 Apr 2016 02:59:17 +0000 (19:59 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two lockdep fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Fix lock_chain::base size
locking/lockdep: Fix ->irq_context calculation
Linus Torvalds [Fri, 29 Apr 2016 02:54:50 +0000 (19:54 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fix from Ingo Molnar:
"This fixes a bug in the efivars code"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Fix out-of-bounds read in variable_matches()
Linus Torvalds [Fri, 29 Apr 2016 02:44:47 +0000 (19:44 -0700)]
Merge tag 'media/v4.6-4' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Some regression fixes:
- videobuf2 core: avoid the risk of going past buffer on multi-planes
and fix rw mode
- fix support for 4K formats at V4L2 core
- fix a trouble at davinci_fpe, caused by a bad patch
- usbvision: revert a patch with a partial fixup. The fixup patch
was merged already, and this one has some issues"
* tag 'media/v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] vb2-memops: Fix over allocation of frame vectors
[media] media: vb2: Fix regression on poll() for RW mode
[media] v4l2-dv-timings.h: fix polarity for 4k formats
[media] davinci_vpfe: Revert "staging: media: davinci_vpfe: remove,unnecessary ret variable"
[media] usbvision: revert commit
588afcc1
[media] videobuf2-v4l2: Verify planes array in buffer dequeueing
[media] videobuf2-core: Check user space planes array in dqbuf
Linus Torvalds [Fri, 29 Apr 2016 02:38:45 +0000 (19:38 -0700)]
Merge tag 'sound-4.6-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Usually we get a big collection of fixes for ASoC once during rc. And
this is it.
At this time, most of fixes are about Intel Skylake ASoC driver, which
is a new and still on-going development. Along with it, a slight
large LOC is seen in legacy HD-audio driver, but it's merely a code
move to the upper layer.
Other than that, the rest are small or trivial fixes to various
drivers, in addition to an ASoC dapm debugfs code fix"
* tag 'sound-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda - Update BCLK also at hotplug for i915 HSW/BDW
ALSA: hda - Add dock support for ThinkPad X260
ASoC: wm5102: Free compressed IRQ in CODEC remove
ASoC: arizona: Free speaker thermal IRQs in CODEC remove
ASoC: Intel: Skylake: Fix ibs/obs calc for non-integral sampling rates
ASoC: Intel: sst: fix a loop timeout in sst_hsw_stream_reset()
ASoC: Intel: Skylake: Fix to turn OFF codec power when entering S3
ASoC: hdac_hdmi: Fix codec power state in S3 during playback
ASoC: hdac_hdmi: Fix to use dev_pm ops instead soc pm
ASoC: wm8962: Correct typo when setting DSPCLK rate
ASoC: nau8825: Fix jack detection across suspend
ASoC: Intel: Skylake: Fix DSP resource de-allocation
ASoC: Intel: Skylake: Fix for unloading module only when it is loaded
ASoC: Intel: Skylake: Fix kbuild dependency
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: rt5640: Correct the digital interface data select
ASoC: Intel: Skylake: remove call to pci_dev_put
ASoC: Intel: Skylake: Call i915 exit last
ASoC: Intel: Skylake: Unmap the address last
ASoC: Intel: Skylake: Freeup properly on skl_dsp_free
...
Xishi Qiu [Thu, 28 Apr 2016 23:19:11 +0000 (16:19 -0700)]
Documentation/sysctl/vm.txt: update numa_zonelist_order description
Commit
3193913ce62c ("mm: page_alloc: default node-ordering on 64-bit
NUMA, zone-ordering on 32-bit") changes the default value of
numa_zonelist_order. Update the document.
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Potapenko [Thu, 28 Apr 2016 23:19:09 +0000 (16:19 -0700)]
lib/stackdepot.c: allow the stack trace hash to be zero
Do not bail out from depot_save_stack() if the stack trace has zero hash.
Initially depot_save_stack() silently dropped stack traces with zero
hashes, however there's actually no point in reserving this zero value.
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Zapolskiy [Thu, 28 Apr 2016 23:19:06 +0000 (16:19 -0700)]
rapidio: fix potential NULL pointer dereference
The change fixes improper check for a returned error value by
class_create() function, which on error returns ERR_PTR() value, thus the
original check always results in a dead code on error path.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Thu, 28 Apr 2016 23:19:03 +0000 (16:19 -0700)]
mm/memory-failure: fix race with compound page split/merge
get_hwpoison_page() must recheck relation between head and tail pages.
n-horiguchi said: without this recheck, the race causes kernel to pin an
irrelevant page, and finally makes kernel crash for refcount mismatch.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xuejiufei [Thu, 28 Apr 2016 23:19:01 +0000 (16:19 -0700)]
ocfs2/dlm: return zero if deref_done message is successfully handled
dlm_deref_lockres_done_handler() should return zero if the message is
successfully handled.
Fixes:
60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message").
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ananth N Mavinakayanahalli [Thu, 28 Apr 2016 23:18:58 +0000 (16:18 -0700)]
Ananth has moved
The current ID is going away soon... update email address
Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Thu, 28 Apr 2016 23:18:55 +0000 (16:18 -0700)]
kcov: don't profile branches in kcov
Profiling 'if' statements in __sanitizer_cov_trace_pc() leads to
unbound recursion and crash:
__sanitizer_cov_trace_pc() ->
ftrace_likely_update ->
__sanitizer_cov_trace_pc() ...
Define DISABLE_BRANCH_PROFILING to disable this tracer.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Morse [Thu, 28 Apr 2016 23:18:52 +0000 (16:18 -0700)]
kcov: don't trace the code coverage code
Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in
every basic block. Ftrace patches in a call to _mcount() to each
function it has annotated.
Letting these mechanisms annotate each other is a bad thing. Break the
loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace
won't try to patch this code.
This patch lets arm64 with KCOV and STACK_TRACER boot.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Thu, 28 Apr 2016 23:18:49 +0000 (16:18 -0700)]
mm: wake kcompactd before kswapd's short sleep
When kswapd goes to sleep it checks if the node is balanced and at first
it sleeps only for HZ/10 time, then rechecks if the node is still
balanced and nobody has woken it during the initial sleep. Only then it
goes fully sleep until an allocation slowpath wakes it up again.
For higher-order allocations, waking up kcompactd is done only before
the full sleep. This turns out to be an issue in case another
high-order allocation fails during the initial sleep. It will wake
kswapd up, however kswapd considers the zone balanced from the order-0
perspective, and will just quickly try to sleep again. So if there's a
longer stream of high-order allocations hitting the slowpath and waking
up kswapd, it might never actually wake up kcompactd, which may be
considered a regression from kswapd-based compaction. In the worst
case, it might be that a single allocation that cannot direct
reclaim/compact itself is waking kswapd in the retry loop and preventing
kcompactd from being woken up and unblocking it.
This patch makes sure kcompactd is woken up in such situations by simply
moving the wakeup before the short initial sleep. More efficient
solution would be to wake kcompactd immediately instead of kswapd if the
node is already order-0 balanced, but in that case we should also move
reset_isolation_suitable() call to kcompactd so it's not adding to the
allocator's latency. Since it's late in the 4.6 cycle, let's go with
the simpler change for now.
Fixes:
accf62422b3a ("mm, kswapd: replace kswapd compaction with waking up kcompactd")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frank Rowand [Thu, 28 Apr 2016 23:18:47 +0000 (16:18 -0700)]
.mailmap: add Frank Rowand
Set current email address to replace obsolete email addresses.
Signed-off-by: Frank Rowand <frank.rowand@sonymobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 28 Apr 2016 23:18:44 +0000 (16:18 -0700)]
mm/hwpoison: fix wrong num_poisoned_pages accounting
Currently, migration code increses num_poisoned_pages on *failed*
migration page as well as successfully migrated one at the trial of
memory-failure. It will make the stat wrong. As well, it marks the
page as PG_HWPoison even if the migration trial failed. It would mean
we cannot recover the corrupted page using memory-failure facility.
This patches fixes it.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 28 Apr 2016 23:18:41 +0000 (16:18 -0700)]
mm: call swap_slot_free_notify() with page lock held
Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
page_swap_info. The reason is that page_endio in rw_page unlocks the
page if read I/O is completed so we need to hold a PG_lock again to
check PageSwapCache. Otherwise, the page can be removed from swapcache.
Kernel BUG at
c00f9040 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W
3.10.84-g9f14aec-dirty #73
task:
c3b73200 ti:
dd192000 task.ti:
dd192000
PC is at page_swap_info+0x10/0x2c
LR is at swap_slot_free_notify+0x18/0x6c
pc : [<
c00f9040>] lr : [<
c00f5560>] psr:
400f0113
sp :
dd193d78 ip :
c2deb1e4 fp :
da015180
r10:
00000000 r9 :
000200da r8 :
c120fe08
r7 :
00000000 r6 :
00000000 r5 :
c249a6c0 r4 : =
c249a6c0
r3 :
00000000 r2 :
40080009 r1 :
200f0113 r0 : =
c249a6c0
..<snip> ..
Call Trace:
page_swap_info+0x10/0x2c
swap_slot_free_notify+0x18/0x6c
swap_readpage+0x90/0x11c
read_swap_cache_async+0x134/0x1ac
swapin_readahead+0x70/0xb0
handle_pte_fault+0x320/0x6fc
handle_mm_fault+0xc0/0xf0
do_page_fault+0x11c/0x36c
do_DataAbort+0x34/0x118
Fixes:
3f2b1a04f44933f2 ("zram: revive swap_slot_free_notify")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 28 Apr 2016 23:18:38 +0000 (16:18 -0700)]
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
We have been reclaimed highmem zone if buffer_heads is over limit but
commit
6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()") changed the behavior so it doesn't reclaim highmem zone
although buffer_heads is over the limit. This patch restores the logic.
Fixes:
6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerald Schaefer [Thu, 28 Apr 2016 23:18:35 +0000 (16:18 -0700)]
numa: fix /proc/<pid>/numa_maps for THP
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Thu, 28 Apr 2016 23:18:32 +0000 (16:18 -0700)]
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.
This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.
Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
Fixes:
78f11a255749 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Kozlowski [Thu, 28 Apr 2016 23:18:30 +0000 (16:18 -0700)]
mailmap: fix Krzysztof Kozlowski's misspelled name
Patchwork introduced a garbled Polish character in commit
1e3012d0fdc5
("crypto: s5p-sss - Use memcpy_toio for iomem annotated memory") so fix
the mail mapping. Additionally prefer to use kernel.org account for
personal work, instead of my gmail address.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 28 Apr 2016 23:18:27 +0000 (16:18 -0700)]
thp: keep huge zero page pinned until tlb flush
Andrea has found[1] a race condition on MMU-gather based TLB flush vs
split_huge_page() or shrinker which frees huge zero under us (patch 1/2
and 2/2 respectively).
With new THP refcounting, we don't need patch 1/2: mmu_gather keeps the
page pinned until flush is complete and the pin prevents the page from
being split under us.
We still need patch 2/2. This is simplified version of Andrea's patch.
We don't need fancy encoding.
[1] http://lkml.kernel.org/r/
1447938052-22165-1-git-send-email-aarcange@redhat.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve Capper [Thu, 28 Apr 2016 23:18:24 +0000 (16:18 -0700)]
mm: exclude HugeTLB pages from THP page_mapped() logic
HugeTLB pages cannot be split, so we use the compound_mapcount to track
rmaps.
Currently page_mapped() will check the compound_mapcount, but will also
go through the constituent pages of a THP compound page and query the
individual _mapcount's too.
Unfortunately, page_mapped() does not distinguish between HugeTLB and
THP compound pages and assumes that a compound page always needs to have
HPAGE_PMD_NR pages querying.
For most cases when dealing with HugeTLB this is just inefficient, but
for scenarios where the HugeTLB page size is less than the pmd block
size (e.g. when using contiguous bit on ARM) this can lead to crashes.
This patch adjusts the page_mapped function such that we skip the
unnecessary THP reference checks for HugeTLB pages.
Fixes:
e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Kumagai [Thu, 28 Apr 2016 23:18:21 +0000 (16:18 -0700)]
kexec: export OFFSET(page.compound_head) to find out compound tail page
PageAnon() always look at head page to check PAGE_MAPPING_ANON and tail
page's page->mapping has just a poisoned data since commit
1c290f642101
("mm: sanitize page->mapping for tail pages").
If makedumpfile checks page->mapping of a compound tail page to
distinguish anonymous page as usual, it must fail in newer kernel. So
it's necessary to export OFFSET(page.compound_head) to avoid checking
compound tail pages.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.5.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Kumagai [Thu, 28 Apr 2016 23:18:18 +0000 (16:18 -0700)]
kexec: update VMCOREINFO for compound_order/dtor
makedumpfile refers page.lru.next to get the order of compound pages for
page filtering.
However, now the order is stored in page.compound_order, hence
VMCOREINFO should be updated to export the offset of
page.compound_order.
The fact is, page.compound_order was introduced already in kernel 4.0,
but the offset of it was the same as page.lru.next until kernel 4.3, so
this was not actual problem.
The above can be said also for page.lru.prev and page.compound_dtor,
it's necessary to detect hugetlbfs pages. Further, the content was
changed from direct address to the ID which means dtor.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.4.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 29 Apr 2016 01:59:24 +0000 (18:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There is a lifecycle fix in the auth code, a fix for a narrow race
condition on map, and a helpful message in the log when there is a
feature mismatch (which happens frequently now that the default
server-side options have changed)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: report unsupported features to syslog
rbd: fix rbd map vs notify races
libceph: make authorizer destruction independent of ceph_auth_client
Linus Torvalds [Fri, 29 Apr 2016 01:52:11 +0000 (18:52 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Three more bug fixes for 4.6
- Due to a race in the dynamic page table code a multi-threaded
program can cause a translation specification exception. With
panic_on_oops a user space program can crash the system.
- An information leak with the /dev/sclp device.
- A use after free in the s390 PCI code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/sclp_ctl: fix potential information leak with /dev/sclp
s390/mm: fix asce_bits handling with dynamic pagetable levels
s390/pci: fix use after free in dma_init
Florian Westphal [Sun, 24 Apr 2016 20:18:59 +0000 (22:18 +0200)]
RDMA/nes: don't leak skb if carrier down
Alternatively one could free the skb, OTOH I don't think this test is
useful so just remove it.
Cc: <linux-rdma@vger.kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sinclair Yeh [Thu, 21 Apr 2016 18:29:31 +0000 (11:29 -0700)]
drm/vmwgfx: Fix order of operation
mode->hdisplay * (var->bits_per_pixel + 7) gets evaluated before
the division, potentially making the pitch larger than it should
be.
Since the original intention is to do a div-round-up, just use
the macro instead.
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Charmaine Lee [Tue, 12 Apr 2016 15:19:08 +0000 (08:19 -0700)]
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
Instead of calling vmw_cmd_ok, call vmw_cmd_dx_cid_check to
validate the context id for query commands.
Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Charmaine Lee [Tue, 12 Apr 2016 15:14:23 +0000 (08:14 -0700)]
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
Fixes piglit tests nv_conditional_render-* crashes.
Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Jason Gunthorpe [Mon, 11 Apr 2016 01:13:13 +0000 (19:13 -0600)]
IB/security: Restrict use of the write() interface
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Fri, 22 Apr 2016 18:17:03 +0000 (11:17 -0700)]
IB/hfi1: Use kernel default llseek for ui device
The ui device llseek had a mistake with SEEK_END and did
not fully follow seek semantics. Correct all this by
using a kernel supplied function for fixed size devices.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 20 Apr 2016 13:05:36 +0000 (06:05 -0700)]
IB/hfi1: Don't attempt to free resources if initialization failed
Attempting to free resources which have not been allocated and
initialized properly led to the following kernel backtrace:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
PGD
852a43067 PUD
85d4a6067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 2831 Comm: osu_bw Tainted: G IO 3.12.18-wfr+ #1
task:
ffff88085b15b540 ti:
ffff8808588fe000 task.ti:
ffff8808588fe000
RIP: 0010:[<
ffffffffa09658fe>] [<
ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
RSP: 0018:
ffff8808588ffde0 EFLAGS:
00010282
RAX:
0000000000000000 RBX:
ffff880858a31800 RCX:
0000000000000000
RDX:
ffff88085d971bc0 RSI:
ffff880858a318f8 RDI:
ffff880858a318c0
RBP:
ffff8808588ffe20 R08:
0000000000000000 R09:
0000000000000000
R10:
ffff88087ffd6f40 R11:
0000000001100348 R12:
ffff880852900000
R13:
ffff880858a318c0 R14:
0000000000000000 R15:
ffff88085d971be8
FS:
00007f4674e83740(0000) GS:
ffff88087f400000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000000 CR3:
000000085c377000 CR4:
00000000001407f0
Stack:
ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800
ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0
ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800
Call Trace:
[<
ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1]
[<
ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1]
[<
ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1]
[<
ffffffff8116c189>] __fput+0xe9/0x270
[<
ffffffff8116c35e>] ____fput+0xe/0x10
[<
ffffffff81065707>] task_work_run+0xa7/0xe0
[<
ffffffff81002969>] do_notify_resume+0x59/0x80
[<
ffffffff814ffc1a>] int_signal+0x12/0x17
This commit re-arranges the context initialization code in a way that
would allow for context event flags to be used to determine whether
the context has been successfully initialized.
In turn, this can be used to skip the resource de-allocation if they
were never allocated in the first place.
Fixes:
3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Wed, 20 Apr 2016 13:05:30 +0000 (06:05 -0700)]
IB/hfi1: Fix missing lock/unlock in verbs drain callback
The iowait_sdma_drained() callback lacked locking to
protect the qp s_flags field.
This causes the s_flags to be out of sync
on multiple CPUs, potentially corrupting the s_flags.
Fixes:
a545f5308b6c ("staging/rdma/hfi: fix CQ completion order issue")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Tue, 12 Apr 2016 17:47:00 +0000 (10:47 -0700)]
IB/rdmavt: Fix send scheduling
call_send is used to determine whether to send immediately or schedule
a send for later. The current logic in rdmavt is inverted and has a
negative impact on the latency of the hfi1 and qib drivers. Fix this
regression by correctly calling send immediately when call_send is set.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 12 Apr 2016 17:46:16 +0000 (10:46 -0700)]
IB/hfi1: Prevent unpinning of wrong pages
The routine used by the SDMA cache to handle already
cached nodes can extend an already existing node.
In its error handling code, the routine will unpin pages
when not all pages of the buffer extension were pinned.
There was a bug in that part of the routine, which would
mistakenly unpin pages from the original set rather than
the newly pinned pages.
This commit fixes that bug by offsetting the page array
to the proper place pointing at the beginning of the newly
pinned pages.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 12 Apr 2016 17:46:03 +0000 (10:46 -0700)]
IB/hfi1: Fix deadlock caused by locking with wrong scope
The locking around the interval RB tree is designed to prevent
access to the tree while it's being modified. The locking in its
current form is too overzealous, which is causing a deadlock in
certain cases with the following backtrace:
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G O 3.12.18-wfr+ #1
0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0
ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8
ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662
Call Trace:
<NMI> [<
ffffffff814f1caa>] dump_stack+0x45/0x56
[<
ffffffff814ecd56>] panic+0xc2/0x1cb
[<
ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50
[<
ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0
[<
ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0
[<
ffffffff8110a714>] perf_event_overflow+0x14/0x20
[<
ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390
[<
ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50
[<
ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180
[<
ffffffff814f8d39>] do_nmi+0x169/0x310
[<
ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e
[<
ffffffff81272600>] ? unmap_single+0x30/0x30
[<
ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
[<
ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
[<
ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
<<EOE>> <IRQ> [<
ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1]
[<
ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1]
[<
ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1]
[<
ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1]
[<
ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1]
[<
ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1]
[<
ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0
[<
ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1]
[<
ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1]
[<
ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0
[<
ffffffff81098017>] handle_irq_event+0x37/0x60
[<
ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120
[<
ffffffff810044af>] handle_irq+0xbf/0x150
[<
ffffffff8104c9b7>] ? irq_enter+0x17/0x80
[<
ffffffff8150168d>] do_IRQ+0x4d/0xc0
[<
ffffffff814f7c6a>] common_interrupt+0x6a/0x6a
<EOI> [<
ffffffff81073524>] ? finish_task_switch+0x54/0xe0
[<
ffffffff814f56c6>] __schedule+0x3b6/0x7e0
[<
ffffffff810763a6>] __cond_resched+0x26/0x30
[<
ffffffff814f5eda>] _cond_resched+0x3a/0x50
[<
ffffffff814f4f82>] down_write+0x12/0x30
[<
ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1]
[<
ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1]
[<
ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1]
[<
ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1]
[<
ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1]
[<
ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1]
[<
ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80
[<
ffffffff8116b58b>] do_readv_writev+0xbb/0x230
[<
ffffffff811a9da1>] ? fsnotify+0x241/0x320
[<
ffffffff81073524>] ? finish_task_switch+0x54/0xe0
[<
ffffffff8116b795>] vfs_writev+0x35/0x60
[<
ffffffff8116b8c9>] SyS_writev+0x49/0xc0
[<
ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0
[<
ffffffff814ff992>] system_call_fastpath+0x16/0x1b
As evident from the backtrace above, the process was being put to sleep
while holding the lock.
Limiting the scope of the lock only to the RB tree operation fixes the
above error allowing for proper locking and the process being put to
sleep when needed.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 12 Apr 2016 17:45:57 +0000 (10:45 -0700)]
IB/hfi1: Prevent NULL pointer deferences in caching code
There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.
The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:
BUG: unable to handle kernel NULL pointer dereference at
00000000000000a8
IP: [<
ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
15
task:
ffff88085e66e080 ti:
ffff88085c244000 task.ti:
ffff88085c244000
RIP: 0010:[<
ffffffffa041f75a>] [<
ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
RSP: 0000:
ffff88085c245878 EFLAGS:
00010002
RAX:
0000000000000000 RBX:
ffff88105b9bbd40 RCX:
ffffea003931a830
RDX:
0000000000000004 RSI:
ffff88105754a9c0 RDI:
ffff88105754a9c0
RBP:
ffff88085c245890 R08:
ffff88105b9bbd70 R09:
00000000fffffffb
R10:
ffff88105b9bbd58 R11:
0000000000000013 R12:
ffff88105754a9c0
R13:
0000000000000001 R14:
0000000000000001 R15:
ffff88105b9bbd40
FS:
0000000000000000(0000) GS:
ffff88107ef40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000000000a8 CR3:
0000000001a0b000 CR4:
00000000001407e0
Stack:
ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
Call Trace:
[<
ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
[<
ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
[<
ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
[<
ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
[<
ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
[<
ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
[<
ffffffff81141fad>] try_to_unmap+0x4d/0x60
[<
ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
[<
ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
[<
ffffffff81121491>] shrink_lruvec+0x4c1/0x640
[<
ffffffff81121641>] shrink_zone+0x31/0x100
[<
ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
[<
ffffffff811229e3>] kswapd+0x403/0x7e0
[<
ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
[<
ffffffff81068ac0>] kthread+0xc0/0xd0
[<
ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
[<
ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
[<
ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Sun, 3 Apr 2016 12:03:12 +0000 (15:03 +0300)]
MAINTAINERS: Update iser/isert maintainer contact info
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 31 Mar 2016 16:03:25 +0000 (19:03 +0300)]
IB/mlx5: Expose correct max_sge_rd limit
mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)
So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.
Cc: linux-stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>