Len Brown [Sat, 4 Mar 2017 23:18:28 +0000 (18:18 -0500)]
tools/power turbostat: update HWP dump to decimal from hex
Syntax only.
The HWP CAPABILTIES and REQUEST ratios are more easily
viewed in decimal -- just multiply by 100 and you get MHz...
new:
cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 35 guar 27 eff 12 low 1)
cpu0: MSR_HWP_REQUEST: 0x80002301 (min 1 max 35 des 0 epp 0x80 window 0x0 pkg 0x0)
old:
cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 0x23 guar 0x1b eff 0xc low 0x1)
cpu0: MSR_HWP_REQUEST: 0x80002301 (min 0x1 max 0x23 des 0x0 epp 0x80 window 0x0 pkg 0x0)
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 4 Mar 2017 23:10:45 +0000 (18:10 -0500)]
tools/power turbostat: enable package THERM_INTERRUPT dump
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x884b0800 (25 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)
Enable the same per-core output, but hide it behind --debug
because it is too verbose on big systems.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 4 Mar 2017 22:23:07 +0000 (17:23 -0500)]
tools/power turbostat: show missing Core and GFX power on SKL and KBL
While the current SDM is silent on the matter, the Core and GFX
RAPL power meters on SKL and KBL appear to work -- so show them.
Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 4 Mar 2017 20:42:48 +0000 (15:42 -0500)]
tools/power turbostat: bugfix: GFXMHz column not changing
turbostat displays a GFXMHz column, which comes from reading
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
But GFXMHz was not changing, even when a manual
cat /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
showed a new value.
It turns out that a rewind() on the open file is not sufficient,
fflush() (or a close/open) is needed to read fresh values.
Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 21 Jan 2017 07:33:23 +0000 (02:33 -0500)]
tools/power turbostat: version 17.02.24
The turbostat before this last set of changes is obsolete.
This new version can do a lot more, but it also has
some different defaults, that might catch some off-guard.
So it seems a good time to give a new version number.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 23 Feb 2017 23:10:27 +0000 (18:10 -0500)]
tools/power turbostat: bugfix: --add u32 was printed as u64
When the "u32" keyword is used with --add, it means that
the output should be truncated to 32-bits. This was not
happening and all 64-bits were printed.
Also, when no column name was used for an added MSR,
The default column name was in deximal, eg. MSR16.
Users report that they tend to use hex MSR numbers,
so print them in hex. To always fit into the columns,
use the syntax M0x10. Note that the user can always
supply any column header that they want.
eg --add msr0x10,MY_TSC
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 23 Feb 2017 22:00:51 +0000 (17:00 -0500)]
tools/power turbostat: show error on exec
When turbostat is run in one-shot command mode,
the parent takes the 'before' counter snapshot,
fork/exec/wait for the child to exit,
takes the 'after' counter snapshot,
and prints the results.
however, if the child fails to exec the command,
it immediately returns, without indicating that
anythign was wrong.
Add an error message showing that exec failed:
sudo turbostat sleeeep 4
...
turbostat: exec sleeeep: No such file or directory
...
Note that the parent will still print out the statistics,
because it can't tell the difference between the failed
exec and a command that is purposefully returning
the same status. Unfortunately, this may obscure the
error message. However, if the --out parameter is used,
the error message is evident on stderr.
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 22 Feb 2017 05:11:12 +0000 (00:11 -0500)]
tools/power turbostat: dump p-state software config
cpu1: cpufreq driver: acpi-cpufreq
cpu1: cpufreq governor: ondemand
cpufreq boost: 1
or
cpu0: cpufreq driver: intel_pstate
cpu0: cpufreq governor: powersave
cpufreq intel_pstate no_turbo: 0
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 22 Feb 2017 04:43:41 +0000 (23:43 -0500)]
tools/power turbostat: show package number, even without --debug
On multi-package systems, the "Package" column was being displayed
only if --debug was used. Show it always.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 22 Feb 2017 04:21:13 +0000 (23:21 -0500)]
tools/power turbostat: support "--hide C1" etc.
Originally, the only way to hide the sysfs C-state statistics columns
was with "--hide sysfs". This was because we process "--hide" before
we probe for those columns.
hack --hide to remember deferred hide requests, and apply
them when sysfs is probed.
"--hide sysfs" is still available as short-hand to refer to
the entire group of counters.
The down-side of this change is that we no longer error check for
bogus --hide column names. But the user will quickly figure that
out if a column they mean to hide is still there...
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 22 Feb 2017 03:33:42 +0000 (22:33 -0500)]
tools/power turbostat: move --Package and --processor into the --cpu option
--Package is now "--cpu package",
which will display just the 1st CPU in each package
--processor is not "--cpu core"
which will display just the 1st CPU in each core
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 15 Feb 2017 05:30:22 +0000 (00:30 -0500)]
tools/power turbostat: turbostat.8 update
update examples to show recently updated features.
In particular
--add
--show
--hide
--cpu
--list
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 17 Feb 2017 04:07:51 +0000 (23:07 -0500)]
tools/power turbostat: update --list feature
Make it possible to take the entire un-edited output
from `turbostat --list` and feed it to "turbostat --show"
or "turbostat --hide".
To do this, the leading comma was removed
(no mater what columns are active)
and also they dynamic C-state "C1, C2, C3" etc are replaced
by the string "sysfs", which refers to them as a group.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 16 Feb 2017 02:45:40 +0000 (21:45 -0500)]
tools/power turbostat: use wide columns to display large numbers
When a counter overlfows 7 columns, it shifts the remaining
columns to the right, so they no longer line up under
their column header.
Update turbostat to dectect when it is handling large
numbers, and switch to wider columns where, necessary.
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 15 Feb 2017 22:15:11 +0000 (17:15 -0500)]
tools/power turbostat: Add --list option to show available header names
It is handy to know the list of column header names,
so that they can be used with --add and --skip
The new --list option shows them:
sudo ./turbostat --list --hide sysfs
,Core,CPU,Avg_MHz,Busy%,Bzy_MHz,TSC_MHz,IRQ,SMI,CPU%c1,CPU%c3,CPU%c6,CPU%c7,CoreTmp,PkgTmp,GFX%rc6,GFXMHz,PkgWatt,CorWatt,GFXWatt
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 15 Feb 2017 03:07:52 +0000 (22:07 -0500)]
tools/power turbostat: fix zero IRQ count shown in one-shot command mode
The IRQ column has been working for periodic mode,
but not in one-shot command mode, it shows only 0.
until now.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 11 Feb 2017 04:54:15 +0000 (23:54 -0500)]
tools/power turbostat: add --cpu parameter
With the --cpu parameter, turbostat prints only lines
for the specified set of CPUs:
sudo ./turbostat --quiet --show Core,CPU --cpu 0,1,3..5,6-7
Core CPU
- -
0 0
0 4
1 1
1 5
2 6
3 3
3 7
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 9 Feb 2017 23:25:22 +0000 (18:25 -0500)]
tools/power turbostat: print sysfs C-state stats
When turbostat shows % of time in a CPU idle power state,
it has always been showing information from underlying
hardware residency counters.
While this reflects what the hardware is doing, and is thus
useful for understanding the hardware,
it doesn't directly tell us what Linux requested --
which is useful for tuning Linux itself.
Here we add columns to turbostat to show the
Linux cpuidle sub-system statistics:
/sys/devices/system/cpu/cpu*/cpuidle/state*/*
The first group of columns are the "usage", which is the
number of times software requested that C-state in the
measurement interval. eg C1 below.
The second group of columns are the "time", which is the percentage
of the measurement interval time that software has requested
the specified C-state. eg C1% below.
These software counters can be compared to the underlying
hardware residency counters (eg CPU%c1 CPU%c3 CPU%c6 CPU%c7)
to compare what sofware requested to what the hardware delivered.
These sysfs attributes are discovered when turbostat starts,
rather than being "built in". So the --show and --hide
parameters do not know about these dynamic column names.
However "--show sysfs" and "--hide sysfs" act on the
entire group of columns:
turbostat --show sysfs
...
cpu4: POLL: CPUIDLE CORE POLL IDLE
cpu4: C1: MWAIT 0x00
cpu4: C1E: MWAIT 0x01
cpu4: C3: MWAIT 0x10
cpu4: C6: MWAIT 0x20
cpu4: C7s: MWAIT 0x32
...
C1 C1E C3 C6 C7s C1% C1E% C3% C6% C7s%
3 6 5 1 188 0.00 0.02 0.00 0.00 99.93
0 6 5 0 58 0.00 0.16 0.02 0.00 99.70
0 0 0 0 9 0.00 0.00 0.00 0.00 99.96
0 0 0 1 24 0.00 0.00 0.00 0.02 99.93
0 0 0 0 9 0.00 0.00 0.00 0.00 99.97
0 0 0 0 32 0.00 0.00 0.00 0.00 99.96
0 0 0 0 7 0.00 0.00 0.00 0.00 99.98
2 0 0 0 36 0.00 0.00 0.00 0.00 99.97
1 0 0 0 13 0.00 0.00 0.00 0.00 99.98
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 8 Feb 2017 07:41:51 +0000 (02:41 -0500)]
tools/power turbostat: extend --add option to accept /sys path
Previously, the --add option could specify only an MSR.
Here is is extended so an arbitrary /sys attribute,
as specified by an absolute file path name.
sudo ./turbostat --add /sys/devices/system/cpu/cpu0/cpuidle/state5/usage
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 10 Feb 2017 06:56:47 +0000 (01:56 -0500)]
tools/power turbostat: skip unused counters on BDX
Skip these two counters on BDX, as they are always zero:
cc7, pc7
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 1 Feb 2017 04:07:49 +0000 (23:07 -0500)]
tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits
Newer processors do not hard-code the the number of cpus in each bin
to {1, 2, 3, 4, 5, 6, 7, 8} Rather, they can specify any number
of CPUS in each of the 8 bins:
eg.
...
37 * 100.0 = 3600.0 MHz max turbo 4 active cores
38 * 100.0 = 3700.0 MHz max turbo 3 active cores
39 * 100.0 = 3800.0 MHz max turbo 2 active cores
39 * 100.0 = 3900.0 MHz max turbo 1 active cores
could now look something like this:
...
37 * 100.0 = 3600.0 MHz max turbo 16 active cores
38 * 100.0 = 3700.0 MHz max turbo 8 active cores
39 * 100.0 = 3800.0 MHz max turbo 4 active cores
39 * 100.0 = 3900.0 MHz max turbo 2 active cores
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 27 Jan 2017 07:36:41 +0000 (02:36 -0500)]
tools/power turbostat: skip unused counters on SKX
Skip these four counters on SKX, as they are always zero:
cc3, pc3
cc7, pc7
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 27 Jan 2017 07:13:27 +0000 (02:13 -0500)]
tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7
The CC1 column in tubostat can be computed by subtracting
the core c-state residency countes from the total Cx residency.
CC1 = (Idle_time_as_measured by MPERF) - (all core C-states with
residency counters)
However, as the underlying counter reads are not atomic,
error can be noticed in this calculations, especially
when the numbers are small.
Denverton has a hardware CC1 residency counter
to improve the accuracy of the cc1 statistic -- use it.
At the same time, Denverton has no concept of CC3, PC3, CC7, PC7,
so skip collecting and printing those columns.
Finally, a note of clarification.
Turbostat prints the standard PC2 residency counter,
but on Denverton hardware, that actually means PC1E.
Turbostat prints the standard PC6 residency counter,
but on Denverton hardware, that actually means PC2.
At this point, we document that differnce in this commit message,
rather than adding a quirk to the software.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 27 Jan 2017 06:45:35 +0000 (01:45 -0500)]
tools/power turbostat: initial Gemini Lake SOC support
Gemini Lake is similar to Apollo Lake (Broxton/Goldmont)
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 17 Feb 2017 04:48:50 +0000 (23:48 -0500)]
x86: intel-family.h: Add GEMINI_LAKE SOC
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 27 Jan 2017 05:50:45 +0000 (00:50 -0500)]
tools/power turbostat: bug fixes to --add, --show/--hide features
Fix a bug with --add, where the title of the column
is un-initialized if not specified by the user.
The initial implementation of --show and --hide
neglected to handle the pc8/pc9/pc10 counters.
Fix a bug where "--show Core" only worked with --debug
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 10 Feb 2017 05:29:51 +0000 (00:29 -0500)]
tools/power turbostat: use tsc_tweak everwhere it is needed
The CPU ticks at a rate in the "bus clock" domain.
eg. 100 MHz * bus_ratio.
On newer processors, the TSC has been moved out of this BCLK
domain and into a separate crystal-clock domain.
While the TSC ticks "close to" the base frequency, those that look
closely at the numbers will notice small errors in calculations that
mix units of TSC clocks and bus clocks.
"tsc_tweak" was introduced to address the most visible
mixing -- the %Busy and the the Busy_MHz calculations.
(A simplification as since removed TSC from the BusyMHz calculation)
Here we apply the tsc_tweak to everyplace where BCLK
and TSC units are mixed. The results is that
on a system which is 100% idle, the sum of the C-states
are now much more likely to be closer to 100%.
Reported-by: Travis Downs <travis.downs@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 21 Jan 2017 07:26:00 +0000 (02:26 -0500)]
tools/power turbostat: print system config, unless --quiet
Some users want turbostat to tell them everything, by default.
Some users want turbostat to be quiet, by default.
I find that I'm in the 1st camp, and so I've never liked
needing to type the --debug parameter to decode the system
configuration.
So here we change the default and print the system configuration,
by default. (The --debug option is now un-documented, though
it does still exist for debugging turbostat internals)
When you do not want to see the system configuration
header, use the new "--quiet" option.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 21 Jan 2017 06:59:12 +0000 (01:59 -0500)]
tools/power turbostat: show all columns, independent of --debug
Some time ago, turbostat overflowed 80 columns.
So on the assumption that a "casual" user would always
want topology and frequency columns, we hid the rest
of the columns and the system configuration decoding
behind the --debug option.
Not everybody liked that change -- including me.
I use --debug 99% of the time...
Well, now we have "-o file" to put turbostat output into a file,
so unless you are watching real-time in a small window,
column count is less frequently a factor.
And more recently, we got the "--hide columnA,columnB" option
to specify columns to skip.
So now we "un-hide" the rest of the columns from behind --debug,
and show them all, by default.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 21 Jan 2017 06:26:16 +0000 (01:26 -0500)]
tools/power turbostat: decode MSR_MISC_FEATURE_CONTROL
useful for observing if the BIOS disabled prefetch
Not architectural, but docuemented as present on NHM, SNB
and is present on others.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 21 Jan 2017 06:15:09 +0000 (01:15 -0500)]
x86 msr_index.h: Define MSR_MISC_FEATURE_CONTROL
This non-architectural MSR has disable bits
for various prefetchers on modern processors.
While these bits are generally touched only by the BIOS,
say, via BIOS SETUP, it is useful to dump them
when examining options that can alter performance.
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 21 Jan 2017 05:50:08 +0000 (00:50 -0500)]
tools/power turbostat: decode CPUID(6).TURBO
show the CPUID feature for turbo to clarify the case
when it may not be shown in MISC_ENABLE
CPUID(6): APERF, TURBO, DTS, PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB
cpu4: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO)
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 13 Jan 2017 04:49:18 +0000 (23:49 -0500)]
tools/power turbostat: dump Atom P-states correctly
Turbostat dumps MSR_TURBO_RATIO_LIMIT on Core Architecture.
But Atom Architecture uses MSR_ATOM_CORE_RATIOS and
MSR_ATOM_CORE_TURBO_RATIOS.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 25 Feb 2017 21:55:17 +0000 (16:55 -0500)]
intel_pstate: use MSR_ATOM_RATIOS definitions from msr-index.h
Originally, these MSRs were locally defined in this driver.
Now the definitions are in msr-index.h -- use them.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 13 Jan 2017 04:22:28 +0000 (23:22 -0500)]
x86 msr-index.h: Define Atom specific core ratio MSR locations
These MSRs are currently used by the intel_pstate driver,
using a local definition.
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 12 Jan 2017 04:17:24 +0000 (23:17 -0500)]
tools/power turbostat: further decode MSR_IA32_MISC_ENABLE
Decode MISC_ENABLE.NO_TURBO,
also use the #defines in msr-index.h for decoding this register
cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT TURBO)
Although it is not architectural, decode also
MSR_IA32_MISC_ENABLE.prefetch-disable (bit-9).
documented to be present on: Core, P4, Intel-Xeon
reserved on: Atom, Silvermont, Nehalem, SNB, PHI ec.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 12 Jan 2017 03:12:25 +0000 (22:12 -0500)]
tools/power turbostat: add precision to --debug frequency output
Add a digit of precision to the --debug output for frequency range.
This is useful when BCLK is not an integer.
old:
6 * 83 = 500 MHz max efficiency frequency
26 * 83 = 2166 MHz base frequency
new:
6 * 83.3 = 499.8 MHz max efficiency frequency
26 * 83.3 = 2165.8 MHz base frequency
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 10 Feb 2017 05:27:20 +0000 (00:27 -0500)]
tools/power turbostat: Baytrail c-state support
The Baytrail SOC, with its Silvermont core, has some unique properties:
1. a hardware CC1 residency counter
2. a module-c6 residency counter
3. a package-c6 counter at traditional package-c7 counter address.
The SOC does not support c3, pc3, c7 or pc7 counters.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 8 Jan 2017 04:26:22 +0000 (23:26 -0500)]
x86: msr-index.h: Remove unused MSR_NHM_SNB_PKG_CST_CFG_CTL
The two users, intel_idle driver and turbostat utility
are using the new name, MSR_PKG_CST_CONFIG_CONTROL
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 8 Jan 2017 04:24:57 +0000 (23:24 -0500)]
tools/power turbostat: use new name for MSR_PKG_CST_CONFIG_CONTROL
Previously called MSR_NHM_SNB_PKG_CST_CFG_CTL
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 8 Jan 2017 04:23:25 +0000 (23:23 -0500)]
intel_idle: use new name for MSR_PKG_CST_CONFIG_CONTROL
previously known as MSR_NHM_SNB_PKG_CST_CFG_CTL
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 8 Jan 2017 04:21:18 +0000 (23:21 -0500)]
x86: msr-index.h: Define MSR_PKG_CST_CONFIG_CONTROL
define MSR_PKG_CST_CONFIG_CONTROL (0xE2),
which is the string used by Intel Documentation.
We use this MSR in intel_idle and turbostat by a previous name,
to be updated in the next patch.
Cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 8 Jan 2017 04:13:15 +0000 (23:13 -0500)]
tools/power turbostat: update MSR_PKG_CST_CONFIG_CONTROL decoding
AMT value 0 is unlimited, not PC0
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 8 Jan 2017 03:40:23 +0000 (22:40 -0500)]
tools/power turbostat: Baytrail: remove debug line in quiet mode
Without --debug, a debug line was printed on Baytrail:
SLM BCLK: 83.3 Mhz
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 8 Jan 2017 03:37:48 +0000 (22:37 -0500)]
tools/power turbostat: decode Baytrail CC6 and MC6 demotion configuration
with --debug, see:
cpu0: MSR_CC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-CC6-Demotion)
cpu0: MSR_MC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-MC6-Demotion)
Note that the hardware default is to enable demotion,
and Linux started clearing these registers in 3.17.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 1 Jan 2017 18:08:33 +0000 (13:08 -0500)]
tools/power turbostat: BYT does not have MSR_MISC_PWR_MGMT
and so --debug fails with:
turbostat: msr 1 offset 0x1aa read failed: Input/output error
It seems that baytrail, and airmont do not have this MSR.
It is included in subsequent Goldmont Atom.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 10 Feb 2017 05:25:41 +0000 (00:25 -0500)]
tools/power turbostat: Add --show and --hide parameters
Add the "--show" and "--hide" cmdline parameters.
By default, turbostat shows all columns.
turbostat --hide counter_list
will continue showing all columns, except for those listed.
turbostat --show counter_list
will show _only_ the listed columns
These features work for built-in counters, and have no effect
on columns added with the --add parameter.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 10 Feb 2017 03:22:13 +0000 (22:22 -0500)]
tools/power turbostat: fix bugs in --add option
When --add was used more than once, overflowed buffers
caused some counters to be stored on top of others,
corrupting the results. Simplify the code by simply
reserving space for up to 16 added counters per each
cpu, core, package.
Per-cpu added counters were being printed only per-core.
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sun, 19 Feb 2017 22:34:00 +0000 (14:34 -0800)]
Linux 4.10
Al Viro [Sun, 19 Feb 2017 07:15:27 +0000 (07:15 +0000)]
Fix missing sanity check in /dev/sg
What happens is that a write to /dev/sg is given a request with non-zero
->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing
to an array full of empty iovecs.
Having write permission to /dev/sg shouldn't be equivalent to the
ability to trigger BUG_ON() while holding spinlocks...
Found by Dmitry Vyukov and syzkaller.
[ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the
underlying issue. - Linus ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Thumshirn [Tue, 31 Jan 2017 09:16:00 +0000 (10:16 +0100)]
scsi: don't BUG_ON() empty DMA transfers
Don't crash the machine just because of an empty transfer. Use WARN_ON()
combined with returning an error.
Found by Dmitry Vyukov and syzkaller.
[ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root
cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON()
might still be a cause of excessive log spamming.
NOTE! If this warning ever triggers, we may end up leaking resources,
since this doesn't bother to try to clean the command up. So this
WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is
much worse.
People really need to stop using BUG_ON() for "this shouldn't ever
happen". It makes pretty much any bug worse. - Linus ]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Willem de Bruijn [Sun, 19 Feb 2017 00:00:45 +0000 (19:00 -0500)]
ipv6: release dst on error in ip6_dst_lookup_tail
If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped
check, release the dst before returning an error.
Fixes:
ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 19 Feb 2017 01:38:09 +0000 (17:38 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Two more bugfixes that came in during this week:
- a defconfig change to enable a vital driver used on some Qualcomm
based phones. This was already queued for 4.11, but the maintainer
asked to have it in 4.10 after all.
- a regression fix for the reset controller framework, this got
broken by a typo in the 4.10 merge window"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: multi_v7_defconfig: enable Qualcomm RPMCC
reset: fix shared reset triggered_count decrement on error
Linus Torvalds [Sun, 19 Feb 2017 01:36:15 +0000 (17:36 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A couple of fixes from Kees concerning problems he spotted with our
user access support"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()
ARM: 8657/1: uaccess: consistently check object sizes
Linus Torvalds [Sun, 19 Feb 2017 01:34:56 +0000 (17:34 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"Make the build clean by working around yet another GCC stupidity"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vm86: Fix unused variable warning if THP is disabled
Linus Torvalds [Sun, 19 Feb 2017 01:33:17 +0000 (17:33 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"Move the futex init function to core initcall so user mode helper does
not run into an uninitialized futex syscall"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Move futex_init() to core_initcall
Linus Torvalds [Sun, 19 Feb 2017 01:30:36 +0000 (17:30 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two small fixes::
- Prevent deadlock on the tick broadcast lock. Found and fixed by
Mike.
- Stop using printk() in the timekeeping debug code to prevent a
deadlock against the scheduler"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Use deferred printk() in debug code
tick/broadcast: Prevent deadlock on tick_broadcast_lock
Linus Torvalds [Sun, 19 Feb 2017 01:29:00 +0000 (17:29 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix leak in dpaa_eth error paths, from Dan Carpenter.
2) Use after free when using IPV6_RECVPKTINFO, from Andrey Konovalov.
3) fanout_release() cannot be invoked from atomic contexts, from Anoob
Soman.
4) Fix bogus attempt at lockdep annotation in IRDA.
5) dev_fill_metadata_dst() can OOP on a NULL dst cache pointer, from
Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
irda: Fix lockdep annotations in hashbin_delete().
vxlan: fix oops in dev_fill_metadata_dst
dccp: fix freeing skb too early for IPV6_RECVPKTINFO
dpaa_eth: small leak on error
packet: Do not call fanout_release from atomic contexts
Sergey Senozhatsky [Sat, 18 Feb 2017 11:42:54 +0000 (03:42 -0800)]
printk: use rcuidle console tracepoint
Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:
hw-breakpoint: Failed to enable monitor mode on CPU 0.
hw-breakpoint: CPU 0 failed to disable vector catch
===============================
[ ERR: suspicious RCU usage. ]
4.10.0-rc8-next-
20170215+ #119 Not tainted
-------------------------------
./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by swapper/0/0:
#0: (cpu_pm_notifier_lock){......}, at: [<
c0237e2c>] cpu_pm_exit+0x10/0x54
#1: (console_lock){+.+.+.}, at: [<
c01ab350>] vprintk_emit+0x264/0x474
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-
20170215+ #119
Hardware name: Generic OMAP4 (Flattened Device Tree)
console_unlock
vprintk_emit
vprintk_default
printk
reset_ctrl_regs
dbg_cpu_pm_notify
notifier_call_chain
cpu_pm_exit
omap_enter_idle_coupled
cpuidle_enter_state
cpuidle_enter_state_coupled
do_idle
cpu_startup_entry
start_kernel
This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".
Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <rmk@armlinux.org.uk>
Cc: <stable@vger.kernel.org> [3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Gross [Mon, 2 Jan 2017 20:35:05 +0000 (14:35 -0600)]
ARM: multi_v7_defconfig: enable Qualcomm RPMCC
This patch enables the Qualcomm RPM based Clock Controller present on
A-family boards.
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
David S. Miller [Fri, 17 Feb 2017 21:19:39 +0000 (16:19 -0500)]
irda: Fix lockdep annotations in hashbin_delete().
A nested lock depth was added to the hasbin_delete() code but it
doesn't actually work some well and results in tons of lockdep splats.
Fix the code instead to properly drop the lock around the operation
and just keep peeking the head of the hashbin queue.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 17 Feb 2017 21:01:58 +0000 (13:01 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fix from Jens Axboe:
"A single fix for a lockdep splat reported by Thomas and Gabriel"
* 'for-linus' of git://git.kernel.dk/linux-block:
cfq-iosched: don't call wbt_disable_default() with IRQs disabled
Paolo Abeni [Fri, 17 Feb 2017 18:14:27 +0000 (19:14 +0100)]
vxlan: fix oops in dev_fill_metadata_dst
Since the commit
0c1d70af924b ("net: use dst_cache for vxlan device")
vxlan_fill_metadata_dst() calls vxlan_get_route() passing a NULL
dst_cache pointer, so the latter should explicitly check for
valid dst_cache ptr. Unfortunately the commit
d71785ffc7e7 ("net: add
dst_cache to ovs vxlan lwtunnel") removed said check.
As a result is possible to trigger a null pointer access calling
vxlan_fill_metadata_dst(), e.g. with:
ovs-vsctl add-br ovs-br0
ovs-vsctl add-port ovs-br0 vxlan0 -- set interface vxlan0 \
type=vxlan options:remote_ip=192.168.1.1 \
options:key=1234 options:dst_port=4789 ofport_request=10
ip address add dev ovs-br0 172.16.1.2/24
ovs-vsctl set Bridge ovs-br0 ipfix=@i -- --id=@i create IPFIX \
targets=\"172.16.1.1:1234\" sampling=1
iperf -c 172.16.1.1 -u -l 1000 -b 10M -t 1 -p 1234
This commit addresses the issue passing to vxlan_get_route() the
dst_cache already available into the lwt info processed by
vxlan_fill_metadata_dst().
Fixes:
d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Konovalov [Thu, 16 Feb 2017 16:22:46 +0000 (17:22 +0100)]
dccp: fix freeing skb too early for IPV6_RECVPKTINFO
In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.
However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.
Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.
Similar fixes for TCP:
fb7e2399ec17f1004c0e0ccfd17439f8759ede01 [TCP]: skb is unexpectedly freed.
0aea76d35c9651d55bbaf746e7914e5f9ae5a25d tcp: SYN packets are now
simply consumed
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 17 Feb 2017 17:58:32 +0000 (09:58 -0800)]
Merge tag 'powerpc-4.10-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"One fix from Paul: we can not use the radix MMU under a hypervisor for
now.
Although the code checked if the processor supports radix, that is not
sufficient"
* tag 'powerpc-4.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Disable use of radix under a hypervisor
Linus Torvalds [Fri, 17 Feb 2017 17:56:34 +0000 (09:56 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
"Just a single change to Elan touchpad driver to recognize a new ACPI
ID"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - add ELAN0605 to the ACPI table
Linus Torvalds [Fri, 17 Feb 2017 17:53:59 +0000 (09:53 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"I2C has a revert to fix a regression"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: designware: detect when dynamic tar update is possible"
Linus Torvalds [Fri, 17 Feb 2017 17:52:33 +0000 (09:52 -0800)]
Merge tag 'mmc-v4.10-rc8' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"Fix multi-bit bus width without high-speed mode for MMC"
* tag 'mmc-v4.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: fix multi-bit bus width without high-speed mode
Linus Torvalds [Fri, 17 Feb 2017 17:51:05 +0000 (09:51 -0800)]
Merge tag 'ntb-4.10-bugfixes' of git://github.com/jonmason/ntb
Pull NTB bugfixes frfom Jon Mason:
"NTB bug fixes to address a crash when unloading the ntb module, a DMA
engine unmap leak, allowing the proper queue choice, and clearing the
SKX irq bit"
* tag 'ntb-4.10-bugfixes' of git://github.com/jonmason/ntb:
ntb: ntb_hw_intel: link_poll isn't clearing the pending status properly
ntb_transport: Pick an unused queue
ntb: ntb_perf missing dmaengine_unmap_put
NTB: ntb_transport: fix debugfs_remove_recursive
Dan Carpenter [Thu, 16 Feb 2017 09:56:10 +0000 (12:56 +0300)]
dpaa_eth: small leak on error
This should be >= instead of > here. It means that we don't increment
the free count enough so it becomes off by one.
Fixes:
9ad1a3749333 ("dpaa_eth: add support for DPAA Ethernet")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 17 Feb 2017 16:25:15 +0000 (17:25 +0100)]
Merge tag 'reset-for-4.10-fixes' of https://git.pengutronix.de/git/pza/linux into fixes
Pull "Reset controller fixes for v4.10" from Philipp Zabel:
- Remove erroneous negation of the error check of the reset function
to decrement trigger_count in the error case, not on success. This
fixes shared resets to actually only trigger once, as intended.
* tag 'reset-for-4.10-fixes' of https://git.pengutronix.de/git/pza/linux:
reset: fix shared reset triggered_count decrement on error
Anoob Soman [Wed, 15 Feb 2017 20:25:39 +0000 (20:25 +0000)]
packet: Do not call fanout_release from atomic contexts
Commit
6664498280cf ("packet: call fanout_release, while UNREGISTERING a
netdev"), unfortunately, introduced the following issues.
1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside
rcu_read-side critical section. rcu_read_lock disables preemption, most often,
which prohibits calling sleeping functions.
[ ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
[ ]
[ ] rcu_scheduler_active = 1, debug_locks = 0
[ ] 4 locks held by ovs-vswitchd/1969:
[ ] #0: (cb_lock){++++++}, at: [<
ffffffff8158a6c9>] genl_rcv+0x19/0x40
[ ] #1: (ovs_mutex){+.+.+.}, at: [<
ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
[ ] #2: (rtnl_mutex){+.+.+.}, at: [<
ffffffff81564157>] rtnl_lock+0x17/0x20
[ ] #3: (rcu_read_lock){......}, at: [<
ffffffff81614165>] packet_notifier+0x5/0x3f0
[ ]
[ ] Call Trace:
[ ] [<
ffffffff813770c1>] dump_stack+0x85/0xc4
[ ] [<
ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110
[ ] [<
ffffffff810a2da7>] ___might_sleep+0x57/0x210
[ ] [<
ffffffff810a2fd0>] __might_sleep+0x70/0x90
[ ] [<
ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
[ ] [<
ffffffff810de93f>] ? vprintk_default+0x1f/0x30
[ ] [<
ffffffff81186e88>] ? printk+0x4d/0x4f
[ ] [<
ffffffff816106dd>] fanout_release+0x1d/0xe0
[ ] [<
ffffffff81614459>] packet_notifier+0x2f9/0x3f0
2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
"sleeping function called from invalid context"
[ ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
[ ] INFO: lockdep is turned off.
[ ] Call Trace:
[ ] [<
ffffffff813770c1>] dump_stack+0x85/0xc4
[ ] [<
ffffffff810a2f52>] ___might_sleep+0x202/0x210
[ ] [<
ffffffff810a2fd0>] __might_sleep+0x70/0x90
[ ] [<
ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
[ ] [<
ffffffff816106dd>] fanout_release+0x1d/0xe0
[ ] [<
ffffffff81614459>] packet_notifier+0x2f9/0x3f0
3. calling dev_remove_pack(&fanout->prot_hook), from inside
spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
-> synchronize_net(), which might sleep.
[ ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002
[ ] INFO: lockdep is turned off.
[ ] Call Trace:
[ ] [<
ffffffff813770c1>] dump_stack+0x85/0xc4
[ ] [<
ffffffff81186274>] __schedule_bug+0x64/0x73
[ ] [<
ffffffff8162b8cb>] __schedule+0x6b/0xd10
[ ] [<
ffffffff8162c5db>] schedule+0x6b/0x80
[ ] [<
ffffffff81630b1d>] schedule_timeout+0x38d/0x410
[ ] [<
ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810
[ ] [<
ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10
[ ] [<
ffffffff8154eab5>] synchronize_net+0x35/0x50
[ ] [<
ffffffff8154eae3>] dev_remove_pack+0x13/0x20
[ ] [<
ffffffff8161077e>] fanout_release+0xbe/0xe0
[ ] [<
ffffffff81614459>] packet_notifier+0x2f9/0x3f0
4. fanout_release() races with calls from different CPU.
To fix the above problems, remove the call to fanout_release() under
rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
__fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
fanout->prot_hook is removed as well.
Fixes:
6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jerome Brunet [Wed, 15 Feb 2017 18:15:51 +0000 (19:15 +0100)]
reset: fix shared reset triggered_count decrement on error
For a shared reset, when the reset is successful, the triggered_count is
incremented when trying to call the reset callback, so that another device
sharing the same reset line won't trigger it again. If the reset has not
been triggered successfully, the trigger_count should be decremented.
The code does the opposite, and decrements the trigger_count on success.
As a consequence, another device sharing the reset will be able to trigger
it again.
Fixed be removing negation in from of the error code of the reset function.
Fixes:
7da33a37b48f ("reset: allow using reset_control_reset with shared reset")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Dave Jiang [Thu, 16 Feb 2017 23:22:36 +0000 (16:22 -0700)]
ntb: ntb_hw_intel: link_poll isn't clearing the pending status properly
On Skylake hardware, the link_poll isn't clearing the pending interrupt
bit. Adding a new function for SKX that handles clearing of status bit the
right way.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Fixes:
783dfa6c ("ntb: Adding Skylake Xeon NTB support")
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Thomas VanSelus [Mon, 13 Feb 2017 22:46:26 +0000 (16:46 -0600)]
ntb_transport: Pick an unused queue
Fix typo causing ntb_transport_create_queue to select the first
queue every time, instead of using the next free queue.
Signed-off-by: Thomas VanSelus <tvanselus@xes-inc.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Fixes:
fce8a7bb5 ("PCI-Express Non-Transparent Bridge Support")
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Mon, 30 Jan 2017 21:21:17 +0000 (14:21 -0700)]
ntb: ntb_perf missing dmaengine_unmap_put
In the normal I/O execution path, ntb_perf is missing a call to
dmaengine_unmap_put() after submission. That causes us to leak
unmap objects.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Fixes:
8a7b6a77 ("ntb: ntb perf tool")
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allen Hubbe [Tue, 27 Dec 2016 22:57:04 +0000 (17:57 -0500)]
NTB: ntb_transport: fix debugfs_remove_recursive
The call to debugfs_remove_recursive(qp->debugfs_dir) of the sub-level
directory must not be later than
debugfs_remove_recursive(nt_debugfs_dir) of the top-level directory.
Otherwise, the sub-level directory will not exist, and it would be
invalid (panic) to attempt to remove it. This removes the top-level
directory last, after sub-level directories have been cleaned up.
Signed-off-by: Allen Hubbe <Allen.Hubbe@dell.com>
Fixes:
e26a5843f ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Linus Torvalds [Fri, 17 Feb 2017 02:44:38 +0000 (18:44 -0800)]
Merge tag 'drm-fixes-for-v4.10-final' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Just two last minute fixes, one for DP MST oopses and one for a radeon
regression"
* tag 'drm-fixes-for-v4.10-final' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: Use mode h/vdisplay fields to hide out of bounds HW cursor
drm/dp/mst: fix kernel oops when turning off secondary monitor
Dave Airlie [Fri, 17 Feb 2017 01:13:17 +0000 (11:13 +1000)]
Merge branch 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
One regression fix for interlaced modes on radeon
* 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: Use mode h/vdisplay fields to hide out of bounds HW cursor
Linus Torvalds [Thu, 16 Feb 2017 20:19:18 +0000 (12:19 -0800)]
Revert "nohz: Fix collision between tick and other hrtimers"
This reverts commit
24b91e360ef521a2808771633d76ebc68bd5604b and commit
7bdb59f1ad47 ("tick/nohz: Fix possible missing clock reprog after tick
soft restart") that depends on it,
Pavel reports that it causes occasional boot hangs for him that seem to
depend on just how the machine was booted. In particular, his machine
hangs at around the PCI fixups of the EHCI USB host controller, but only
hangs from cold boot, not from a warm boot.
Thomas Gleixner suspecs it's a CPU hotplug interaction, particularly
since Pavel also saw suspend/resume issues that seem to be related.
We're reverting for now while trying to figure out the root cause.
Reported-bisected-and-tested-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org # reverted commits were marked for stable
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 16 Feb 2017 18:22:41 +0000 (10:22 -0800)]
Merge tag 'media/v4.10-5' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"A regression fix that makes the Siano driver to work again after the
CONFIG_VMAP_STACK change"
* tag 'media/v4.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] siano: make it work again with CONFIG_VMAP_STACK
Miklos Szeredi [Thu, 16 Feb 2017 16:49:02 +0000 (17:49 +0100)]
vfs: fix uninitialized flags in splice_to_pipe()
Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
unused part of the pipe ring buffer. Previously splice_to_pipe() left
the flags value alone, which could result in incorrect behavior.
Uninitialized flags appears to have been there from the introduction of
the splice syscall.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # 2.6.17+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 16 Feb 2017 17:05:34 +0000 (09:05 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Fix a use after free bug introduced in 4.2 and using an uninitialized
value introduced in 4.9"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix uninitialized flags in pipe_buffer
fuse: fix use after free issue in fuse_dev_do_read()
Linus Torvalds [Thu, 16 Feb 2017 17:03:37 +0000 (09:03 -0800)]
Merge tag 'pci-v4.10-fixes-4' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Add back pcie_pme_remove() so we free the IRQ when removing PCIe port
devices; previously the leaked IRQ caused an MSI BUG_ON"
* tag 'pci-v4.10-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/PME: Restore pcie_pme_driver.remove
Linus Torvalds [Thu, 16 Feb 2017 16:37:18 +0000 (08:37 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) In order to avoid problems in the future, make cgroup bpf overriding
explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Staovoitov.
2) LLC sets skb->sk without proper skb->destructor and this explodes,
fix from Eric Dumazet.
3) Make sure when we have an ipv4 mapped source address, the
destination is either also an ipv4 mapped address or
ipv6_addr_any(). Fix from Jonathan T. Leighton.
4) Avoid packet loss in fec driver by programming the multicast filter
more intelligently. From Rui Sousa.
5) Handle multiple threads invoking fanout_add(), fix from Eric
Dumazet.
6) Since we can invoke the TCP input path in process context, without
BH being disabled, we have to accomodate that in the locking of the
TCP probe. Also from Eric Dumazet.
7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we
aren't even updating that sysctl value. From Marcus Huewe.
8) Fix endian bugs in ibmvnic driver, from Thomas Falcon.
[ This is the second version of the pull that reverts the nested
rhashtable changes that looked a bit too scary for this late in the
release - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
rhashtable: Revert nested table changes.
ibmvnic: Fix endian errors in error reporting output
ibmvnic: Fix endian error when requesting device capabilities
net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification
net: xilinx_emaclite: fix freezes due to unordered I/O
net: xilinx_emaclite: fix receive buffer overflow
bpf: kernel header files need to be copied into the tools directory
tcp: tcp_probe: use spin_lock_bh()
uapi: fix linux/if_pppol2tp.h userspace compilation errors
packet: fix races in fanout_add()
ibmvnic: Fix initial MTU settings
net: ethernet: ti: cpsw: fix cpsw assignment in resume
kcm: fix a null pointer dereference in kcm_sendmsg()
net: fec: fix multicast filtering hardware setup
ipv6: Handle IPv4-mapped src to in6addr_any dst.
ipv6: Inhibit IPv4-mapped src address on the wire.
net/mlx5e: Disable preemption when doing TC statistics upcall
rhashtable: Add nested tables
tipc: Fix tipc_sk_reinit race conditions
gfs2: Use rhashtable walk interface in glock_hash_walk
...
Michel Dänzer [Wed, 15 Feb 2017 02:28:45 +0000 (11:28 +0900)]
drm/radeon: Use mode h/vdisplay fields to hide out of bounds HW cursor
The crtc_h/vdisplay fields may not match the CRTC viewport dimensions
with special modes such as interlaced ones.
Fixes the HW cursor disappearing in the bottom half of the screen with
interlaced modes.
Fixes:
6b16cf7785a4 ("drm/radeon: Hide the HW cursor while it's out of bounds")
Cc: stable@vger.kernel.org
Reported-by: Ashutosh Kumar <ashutosh.kumar@amd.com>
Tested-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kees Cook [Thu, 16 Feb 2017 00:44:37 +0000 (01:44 +0100)]
ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()
The 64-bit get_user() wasn't clearing the high word due to a typo in the
error handler. The exception handler entry was already correct, though.
Noticed during recent usercopy test additions in lib/test_user_copy.c.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Kees Cook [Thu, 16 Feb 2017 00:43:58 +0000 (01:43 +0100)]
ARM: 8657/1: uaccess: consistently check object sizes
In commit
76624175dcae ("arm64: uaccess: consistently check object sizes"),
the object size checks are moved outside the access_ok() so that bad
destinations are detected before hitting the "memset(dest, 0, size)" in the
copy_from_user() failure path.
This makes the same change for arm, with attention given to possibly
extracting the uaccess routines into a common header file for all
architectures in the future.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Jens Axboe [Thu, 16 Feb 2017 14:57:33 +0000 (07:57 -0700)]
cfq-iosched: don't call wbt_disable_default() with IRQs disabled
wbt_disable_default() calls del_timer_sync() to wait for the wbt
timer to finish before disabling throttling. We can't do this with
IRQs disable. This fixes a lockdep splat on boot, if non-root
cgroups are used.
Reported-by: Gabriel C <nix.or.die@gmail.com>
Fixes:
87760e5eef35 ("block: hook up writeback throttling")
Signed-off-by: Jens Axboe <axboe@fb.com>
Miklos Szeredi [Thu, 16 Feb 2017 14:08:20 +0000 (15:08 +0100)]
fuse: fix uninitialized flags in pipe_buffer
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes:
d82718e348fe ("fuse_dev_splice_read(): switch to add_to_pipe()")
Cc: <stable@vger.kernel.org> # 4.9+
David S. Miller [Thu, 16 Feb 2017 03:29:51 +0000 (22:29 -0500)]
rhashtable: Revert nested table changes.
This reverts commits:
6a25478077d987edc5e2f880590a2bc5fcab4441
9dbbfb0ab6680c6a85609041011484e6658e7d3c
40137906c5f55c252194ef5834130383e639536f
It's too risky to put in this late in the release
cycle. We'll put these changes into the next merge
window instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Thu, 16 Feb 2017 03:26:41 +0000 (13:26 +1000)]
Merge tag 'drm-misc-fixes-2017-02-15' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
dp/mst oops fix for v4.10
* tag 'drm-misc-fixes-2017-02-15' of git://anongit.freedesktop.org/git/drm-misc:
drm/dp/mst: fix kernel oops when turning off secondary monitor
Paul Mackerras [Thu, 16 Feb 2017 02:49:21 +0000 (13:49 +1100)]
powerpc/64: Disable use of radix under a hypervisor
Currently, if the kernel is running on a POWER9 processor under a
hypervisor, it may try to use the radix MMU even though it doesn't have
the necessary code to do so (it doesn't negotiate use of radix, and it
doesn't do the H_REGISTER_PROC_TBL hcall). If the hypervisor supports
both radix and HPT, then it will set up the guest to use HPT (since the
guest doesn't request radix in the CAS call), but if the radix feature
bit is set in the ibm,pa-features property (which is valid, since
ibm,pa-features is defined to represent the capabilities of the
processor) the guest will try to use radix, resulting in a crash when
it turns the MMU on.
This makes the minimal fix for the current code, which is to disable
radix unless we are running in hypervisor mode.
Fixes:
2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Thomas Falcon [Wed, 15 Feb 2017 16:33:33 +0000 (10:33 -0600)]
ibmvnic: Fix endian errors in error reporting output
Error reports received from firmware were not being converted from
big endian values, leading to bogus error codes reported on little
endian systems.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Wed, 15 Feb 2017 16:32:11 +0000 (10:32 -0600)]
ibmvnic: Fix endian error when requesting device capabilities
When a vNIC client driver requests a faulty device setting, the
server returns an acceptable value for the client to request.
This 64 bit value was incorrectly being swapped as a 32 bit value,
resulting in loss of data. This patch corrects that by using
the 64 bit swap function.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcus Huewe [Wed, 15 Feb 2017 00:00:36 +0000 (01:00 +0100)]
net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification
When setting a neigh related sysctl parameter, we always send a
NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when
executing
sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000
a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated.
This is caused by commit
2a4501ae18b5 ("neigh: Send a
notification when DELAY_PROBE_TIME changes"). According to the
commit's description, it was intended to generate such an event
when setting the "delay_first_probe_time" sysctl parameter.
In order to fix this, only generate this event when actually
setting the "delay_first_probe_time" sysctl parameter. This fix
should not have any unintended side-effects, because all but one
registered netevent callbacks check for other netevent event
types (the registered callbacks were obtained by grepping for
"register_netevent_notifier"). The only callback that uses the
NETEVENT_DELAY_PROBE_TIME_UPDATE event is
mlxsw_sp_router_netevent_event() (in
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case
of this event, it only accesses the DELAY_PROBE_TIME of the
passed neigh_parms.
Fixes:
2a4501ae18b5 ("neigh: Send a notification when DELAY_PROBE_TIME changes")
Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anssi Hannula [Tue, 14 Feb 2017 17:11:45 +0000 (19:11 +0200)]
net: xilinx_emaclite: fix freezes due to unordered I/O
The xilinx_emaclite uses __raw_writel and __raw_readl for register
accesses. Those functions do not imply any kind of memory barriers and
they may be reordered.
The driver does not seem to take that into account, though, and the
driver does not satisfy the ordering requirements of the hardware.
For clear examples, see xemaclite_mdio_write() and xemaclite_mdio_read()
which try to set MDIO address before initiating the transaction.
I'm seeing system freezes with the driver with GCC 5.4 and current
Linux kernels on Zynq-7000 SoC immediately when trying to use the
interface.
In commit
123c1407af87 ("net: emaclite: Do not use microblaze and ppc
IO functions") the driver was switched from non-generic
in_be32/out_be32 (memory barriers, big endian) to
__raw_readl/__raw_writel (no memory barriers, native endian), so
apparently the device follows system endianness and the driver was
originally written with the assumption of memory barriers.
Rather than try to hunt for each case of missing barrier, just switch
the driver to use iowrite32/ioread32/iowrite32be/ioread32be depending
on endianness instead.
Tested on little-endian Zynq-7000 ARM SoC FPGA.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes:
123c1407af87 ("net: emaclite: Do not use microblaze and ppc IO
functions")
Signed-off-by: David S. Miller <davem@davemloft.net>
Anssi Hannula [Tue, 14 Feb 2017 17:11:44 +0000 (19:11 +0200)]
net: xilinx_emaclite: fix receive buffer overflow
xilinx_emaclite looks at the received data to try to determine the
Ethernet packet length but does not properly clamp it if
proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
overflow and a panic via skb_panic() as the length exceeds the allocated
skb size.
Fix those cases.
Also add an additional unconditional check with WARN_ON() at the end.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes:
bb81b2ddfa19 ("net: add Xilinx emac lite device driver")
Signed-off-by: David S. Miller <davem@davemloft.net>
Yinghai Lu [Wed, 15 Feb 2017 05:17:48 +0000 (21:17 -0800)]
PCI/PME: Restore pcie_pme_driver.remove
In addition to making PME non-modular,
d7def2040077 ("PCI/PME: Make
explicitly non-modular") removed the pcie_pme_driver .remove() method,
pcie_pme_remove().
pcie_pme_remove() freed the PME IRQ that was requested in pci_pme_probe().
The fact that we don't free the IRQ after
d7def2040077 causes the following
crash when removing a PCIe port device via /sys:
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:370!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 1 PID: 14509 Comm: sh Tainted: G W
4.8.0-rc1-yh-00012-gd29438d
RIP: 0010:[<
ffffffff9758bbf5>] free_msi_irqs+0x65/0x190
...
Call Trace:
[<
ffffffff9758cda4>] pci_disable_msi+0x34/0x40
[<
ffffffff97583817>] cleanup_service_irqs+0x27/0x30
[<
ffffffff97583e9a>] pcie_port_device_remove+0x2a/0x40
[<
ffffffff97584250>] pcie_portdrv_remove+0x40/0x50
[<
ffffffff97576d7b>] pci_device_remove+0x4b/0xc0
[<
ffffffff9785ebe6>] __device_release_driver+0xb6/0x150
[<
ffffffff9785eca5>] device_release_driver+0x25/0x40
[<
ffffffff975702e4>] pci_stop_bus_device+0x74/0xa0
[<
ffffffff975704ea>] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[<
ffffffff97578810>] remove_store+0x50/0x70
[<
ffffffff9785a378>] dev_attr_store+0x18/0x30
[<
ffffffff97260b64>] sysfs_kf_write+0x44/0x60
[<
ffffffff9725feae>] kernfs_fop_write+0x10e/0x190
[<
ffffffff971e13f8>] __vfs_write+0x28/0x110
[<
ffffffff970b0fa4>] ? percpu_down_read+0x44/0x80
[<
ffffffff971e53a7>] ? __sb_start_write+0xa7/0xe0
[<
ffffffff971e53a7>] ? __sb_start_write+0xa7/0xe0
[<
ffffffff971e1f04>] vfs_write+0xc4/0x180
[<
ffffffff971e3089>] SyS_write+0x49/0xa0
[<
ffffffff97001a46>] do_syscall_64+0xa6/0x1b0
[<
ffffffff9819201e>] entry_SYSCALL64_slow_path+0x25/0x25
...
RIP [<
ffffffff9758bbf5>] free_msi_irqs+0x65/0x190
RSP <
ffff89ad3085bc48>
---[ end trace
f4505e1dac5b95d3 ]---
Segmentation fault
Restore pcie_pme_remove().
[bhelgaas: changelog]
Fixes:
d7def2040077 ("PCI/PME: Make explicitly non-modular")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v4.9+
Sergey Senozhatsky [Wed, 15 Feb 2017 04:43:32 +0000 (13:43 +0900)]
timekeeping: Use deferred printk() in debug code
We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.
[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-
20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<
c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<
c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.
[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:
[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
[ 48.951721] Possible unsafe locking scenario:
[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***
[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<
c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<
c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<
c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-
20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<
c0110208>] (unwind_backtrace) from [<
c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<
c010c224>] (show_stack) from [<
c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<
c04ca6c0>] (dump_stack) from [<
c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<
c019b5cc>] (print_circular_bug) from [<
c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<
c019d2a8>] (validate_chain) from [<
c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<
c019ec18>] (__lock_acquire) from [<
c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<
c019f634>] (lock_acquire) from [<
c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<
c01d0ea0>] (ktime_get_update_offsets_now) from [<
c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<
c01cc624>] (retrigger_next_event) from [<
c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<
c01e4e24>] (on_each_cpu) from [<
c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<
c01cafc4>] (clock_was_set_work) from [<
c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<
c015664c>] (process_one_work) from [<
c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<
c0157774>] (worker_thread) from [<
c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<
c015d644>] (kthread) from [<
c0107830>] (ret_from_fork+0x14/0x24)
Replace printk() with printk_deferred(), which does not call into
the scheduler.
Fixes:
0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>