Prarit Bhargava [Mon, 15 Jun 2015 17:43:29 +0000 (13:43 -0400)]
intel_pstate: Fix overflow in busy_scaled due to long delay
The kernel may delay interrupts for a long time which can result in timers
being delayed. If this occurs the intel_pstate driver will crash with a
divide by zero error:
divide error: 0000 [#1] SMP
Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
RIP: 0010:[<ffffffff814a9279>] [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206
RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
Call Trace:
<IRQ>
[<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840
[<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200
[<ffffffff814a9100>] ? pid_param_set+0x130/0x130
[<ffffffff8107df56>] call_timer_fn+0x36/0x110
[<ffffffff814a9100>] ? pid_param_set+0x130/0x130
[<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320
[<ffffffff81077b2f>] __do_softirq+0xef/0x280
[<ffffffff816156dc>] call_softirq+0x1c/0x30
[<ffffffff81015d95>] do_softirq+0x65/0xa0
[<ffffffff81077ec5>] irq_exit+0x115/0x120
[<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80
<EOI>
[<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0
[<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0
[<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200
[<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
[<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290
[<ffffffff8104228a>] start_secondary+0x1ba/0x230
Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
RIP [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
RSP <ffff883fff4e3db8>
The kernel values for cpudata for CPU 113 were:
struct cpudata {
  cpu = 113,
  timer = {
    entry = {
      next = 0x0,
      prev = 0xdead000000200200
    },
    expires = 8357799745,
    base = 0xffff883fe84ec001,
    function = 0xffffffff814a9100 <intel_pstate_timer_func>,
    data = 18446612406765768960,
<snip>
    i_gain = 0,
    d_gain = 0,
    deadband = 0,
    last_err = 22489
  },
  last_sample_time = {
    tv64 = 4063132438017305
  },
  prev_aperf = 287326796397463,
  prev_mperf = 251427432090198,
  sample = {
    core_pct_busy = 23081,
    aperf = 2937407,
    mperf = 3257884,
    freq = 2524484,
    time = {
      tv64 = 4063149215234118
    }
  }
}
which results in the time between samples = sample.time - last_sample_time
= 4063149215234118 - 4063132438017305 = 16777216813 ns, which is 16.777 seconds.
The duration between reads of the APERF and MPERF registers overflowed a s32
sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result
is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
While the kernel shouldn't be delaying for a long time, it can and does
happen and the intel_pstate driver should not panic in this situation. This
patch changes the div_fp() function to use div64_s64() to allow for "long"
division. This will avoid the overflow condition on long delays.
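The arithmetic behind the crash can be reproduced in user space. The sketch below is not the driver code: FRAC_BITS = 8 is assumed because it is consistent with the numbers above, and sample_gap_us(), tofp_s32(), tofp_s64() and div_fp64() are hypothetical names that only model narrowing to s32 versus keeping 64 bits.

```c
#include <stdint.h>

#define FRAC_BITS 8 /* assumed width of the fixed-point fraction */

/* Hypothetical helper: gap between two ktime-style ns stamps, in us. */
int64_t sample_gap_us(int64_t newer_ns, int64_t older_ns)
{
	return (newer_ns - older_ns) / 1000;
}

/* Hypothetical helper: fixed-point conversion narrowed to 32 bits. */
int32_t tofp_s32(int64_t x)
{
	return (int32_t)(uint32_t)((uint64_t)x << FRAC_BITS);
}

/* Hypothetical helper: the same conversion kept in 64 bits. */
int64_t tofp_s64(int64_t x)
{
	return (int64_t)((uint64_t)x << FRAC_BITS);
}

/* 64-bit fixed-point division, standing in for a div64_s64() based div_fp(). */
int64_t div_fp64(int64_t x, int64_t y)
{
	return tofp_s64(x) / y;
}
```

With the timestamps from the dump, the gap is 16777216 us, which is exactly 2^24. Shifting that left by 8 bits gives 2^32, so the 32-bit result is 0 while the 64-bit result is 4294967296 and remains a usable divisor.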
[v2]: use div64_s64() in div_fp()
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tang Yuantian [Thu, 4 Jun 2015 06:25:42 +0000 (14:25 +0800)]
cpufreq: qoriq: optimize the CPU frequency switching time
Each time the CPU switches its frequency, the clock nodes in the
DTS are walked to find the proper clock source. This is very
time-consuming; for example, it takes 500+ us on T4240.
Besides, the switching time varies from clock to clock.
To optimize this, each input clock of the CPU is cached, so that
it can be picked up instantly when needed.
Each cached input clock is stored as a pointer, which takes 4 or
8 bytes, and a CPU normally has only several input clocks, so the
cache does not take much memory either.
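The caching idea can be sketched in plain C as below. This is an illustration only: struct clk_stub, struct cpu_clk_cache, MAX_PARENTS, cache_parents() and cached_parent() are hypothetical names, not the driver's data structures.

```c
#include <stddef.h>

#define MAX_PARENTS 4 /* hypothetical upper bound on input clocks per CPU */

/* Stand-in for a clock handle; the real driver would hold struct clk pointers. */
struct clk_stub {
	unsigned long rate_khz;
};

/* Per-CPU cache: one pointer (4 or 8 bytes) per input clock. */
struct cpu_clk_cache {
	struct clk_stub *parent[MAX_PARENTS];
	size_t nr_parents;
};

/* Fill the cache once, at init time, instead of walking the tree per switch. */
int cache_parents(struct cpu_clk_cache *c, struct clk_stub **src, size_t n)
{
	size_t i;

	if (n > MAX_PARENTS)
		return -1;
	for (i = 0; i < n; i++)
		c->parent[i] = src[i];
	c->nr_parents = n;
	return 0;
}

/* Constant-time lookup at frequency-switch time. */
struct clk_stub *cached_parent(const struct cpu_clk_cache *c, size_t idx)
{
	return idx < c->nr_parents ? c->parent[idx] : NULL;
}
```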
Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Shailendra Verma [Sat, 23 May 2015 05:06:49 +0000 (10:36 +0530)]
cpufreq: gx-suspmod: Fix two typos in two comments
Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Shailendra Verma [Sat, 23 May 2015 05:06:18 +0000 (10:36 +0530)]
cpufreq: nforce2: Fix typo in comment to function nforce2_init()
Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Wed, 3 Jun 2015 10:27:13 +0000 (15:57 +0530)]
cpufreq: governor: Serialize governor callbacks
There are several races reported in cpufreq core around governors (only
ondemand and conservative) by different people.
There are at least two race scenarios present in governor code:
(a) Concurrent access/updates of governor internal structures.
It is possible that fields such as 'dbs_data->usage_count', etc. are
accessed simultaneously for different policies using same governor
structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
because of this we can dereference bad pointers.
For example consider a system with two CPUs with separate 'struct
cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
CPU0 switching to powersave and CPU1 to ondemand:
CPU0                              CPU1

store*                            store*

cpufreq_governor_exit()           cpufreq_governor_init()
                                  dbs_data = cdata->gdbs_data;

if (!--dbs_data->usage_count)
        kfree(dbs_data);

                                  dbs_data->usage_count++;
                                  *Bad pointer dereference*
There are other races possible between EXIT and START/STOP/LIMIT as
well. It's really complicated.
(b) Switching governor state in bad sequence:
For example trying to switch a governor to START state, when the
governor is in EXIT state. There are some checks present in
__cpufreq_governor() but they aren't sufficient as they compare events
against 'policy->governor_enabled', whereas we need to take the governor's
state into account, which can be used by multiple policies.
These two issues need to be solved separately and the responsibility
should be properly divided between cpufreq and governor core.
The first problem is more about the governor core, as it needs to
protect its structures properly. And the second problem should be fixed
in cpufreq core instead of governor, as it's all about the sequence of
events.
This patch is trying to solve only the first problem.
There are two types of data we need to protect,
- 'struct common_dbs_data': No matter what, there is going to be a
single copy of this per governor.
- 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
will have per-policy copy of this data, otherwise a single copy.
Because of such complexities, the mutex present in 'struct dbs_data' is
insufficient to solve our problem. For example we need to protect
fetching of 'dbs_data' from different structures at the beginning of
cpufreq_governor_dbs(), to make sure it isn't currently being updated.
This can be fixed if we can guarantee serialization of event parsing
code for an individual governor. This is best solved with a mutex per
governor, and the placeholder for that is 'struct common_dbs_data'.
And so this patch moves the mutex from 'struct dbs_data' to 'struct
common_dbs_data' and takes it at the beginning and drops it at the end
of cpufreq_governor_dbs().
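The locking pattern described above can be sketched in user space with a pthread mutex standing in for the kernel mutex. Everything here is a simplified model: struct dbs_data_sketch, struct common_dbs_data_sketch, enum gov_event and governor_dbs_sketch() are hypothetical names, and only the INIT and EXIT events are shown.

```c
#include <pthread.h>
#include <stdlib.h>

struct dbs_data_sketch {
	int usage_count;
};

/* One instance per governor, so one lock per governor. */
struct common_dbs_data_sketch {
	pthread_mutex_t mutex;
	struct dbs_data_sketch *gdbs_data; /* shared when not per-policy */
};

enum gov_event { GOV_INIT, GOV_EXIT };

/* The lock is taken on entry and dropped on exit, so events are serialized. */
int governor_dbs_sketch(struct common_dbs_data_sketch *cdata, enum gov_event ev)
{
	int ret = 0;

	pthread_mutex_lock(&cdata->mutex);
	switch (ev) {
	case GOV_INIT:
		if (!cdata->gdbs_data) {
			cdata->gdbs_data = calloc(1, sizeof(*cdata->gdbs_data));
			if (!cdata->gdbs_data) {
				ret = -1;
				break;
			}
		}
		cdata->gdbs_data->usage_count++;
		break;
	case GOV_EXIT:
		if (!cdata->gdbs_data) {
			ret = -1;
			break;
		}
		if (!--cdata->gdbs_data->usage_count) {
			free(cdata->gdbs_data);
			cdata->gdbs_data = NULL;
		}
		break;
	}
	pthread_mutex_unlock(&cdata->mutex);
	return ret;
}
```

Because fetching 'gdbs_data' and updating 'usage_count' both happen under the same lock, an EXIT on one CPU cannot free the structure between the fetch and the increment of an INIT on another CPU.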
Tested with and without following configuration options:
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 4 Jun 2015 11:13:27 +0000 (16:43 +0530)]
cpufreq: governor: split cpufreq_governor_dbs()
cpufreq_governor_dbs() is hardly readable; it is just too big and
complicated. Let's make it more readable by splitting out event-specific
routines.
The order of statements is changed in a few places, but that shouldn't
bring any functional change.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Wed, 3 Jun 2015 10:27:11 +0000 (15:57 +0530)]
cpufreq: governor: register notifier from cs_init()
Notifiers are required only for conservative governor and the common
governor code is unnecessarily polluted with that. Handle that from
cs_init/exit() instead of cpufreq_governor_dbs().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Mon, 8 Jun 2015 12:55:32 +0000 (18:25 +0530)]
cpufreq: Remove cpufreq_update_policy()
cpufreq_update_policy() was kept as a separate routine earlier as it was
handling migration of sysfs directories, which isn't the case anymore.
It is only updating policy->cpu now and is called by a single caller.
The WARN_ON() isn't really required anymore, as we are just updating the
cpu now, not moving the sysfs directories.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Wed, 10 Jun 2015 00:20:23 +0000 (02:20 +0200)]
cpufreq: Restart governor as soon as possible
__cpufreq_remove_dev_finish() is doing two things today:
- Restarts the governor if some CPUs from concerned policy are still
online.
- Frees the policy if all CPUs are offline.
The first task of restarting the governor can be moved to
__cpufreq_remove_dev_prepare() to restart the governor early. There is
no race between _prepare() and _finish() as they would be handling
completely different cases. _finish() will only be required if we are
going to free the policy and that has nothing to do with restarting the
governor.
Original-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Mon, 8 Jun 2015 12:55:30 +0000 (18:25 +0530)]
cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free()
cpufreq_policy_put_kobj() is actually part of freeing the policy and can
be called from cpufreq_policy_free() directly instead of a separate
call.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Mon, 8 Jun 2015 12:55:29 +0000 (18:25 +0530)]
cpufreq: Initialize policy->kobj while allocating policy
policy->kobj is required to be initialized once in the lifetime of a
policy. Currently we are initializing it from __cpufreq_add_dev() and
that doesn't look to be the best place for doing so as we have to do
this on special cases (like: !recover_policy).
We can initialize it from a more obvious place cpufreq_policy_alloc()
and that will make code look cleaner, specially the error handling part.
The error handling part of __cpufreq_add_dev() was doing almost the same
thing while recover_policy is true or false. Fix that as well by always
calling cpufreq_policy_put_kobj() with an additional parameter to skip
notification part of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Wed, 10 Jun 2015 00:13:21 +0000 (02:13 +0200)]
cpufreq: Stop migrating sysfs files on hotplug
When we hot-unplug a cpu, we remove its sysfs cpufreq directory and if
the outgoing cpu was the owner of policy->kobj earlier then we migrate
the sysfs directory to under another online cpu.
This brings a few disadvantages:
- Code Complexity
- Slower hotplug/suspend/resume
- sysfs file permissions are reset after all policy->cpus are offlined
- CPUFreq stats history lost after all policy->cpus are offlined
- Special management of sysfs stuff during suspend/resume
To overcome these, this patch modifies the way sysfs directories are
managed:
- Select sysfs kobjects owner while initializing policy and don't change
it during hotplugs. Track it with kobj_cpu created earlier.
- Create symlinks for all related CPUs (can be offline) instead of
affected CPUs on policy initialization and remove them only when the
policy is freed.
- Free policy structure only on the removal of cpufreq-driver and not
during hotplug/suspend/resume, detected by checking 'struct
subsys_interface *' (Valid only when called from
subsys_interface_unregister() while unregistering driver).
Apart from this, special care is taken to handle physical hotplug of CPUs
as we wouldn't remove sysfs links or remove policies on logical
hotplugs. Physical hotplug happens in the following sequence.
Hot removal:
- CPU is offlined first, ~ 'echo 0 >
/sys/devices/system/cpu/cpuX/online'
- Then its device is removed along with all sysfs files, cpufreq core
notified with cpufreq_remove_dev() callback from subsys-interface.
Hot addition:
- First the device along with its sysfs files is added, cpufreq core
notified with cpufreq_add_dev() callback from subsys-interface.
- CPU is onlined, ~ 'echo 1 > /sys/devices/system/cpu/cpuX/online'
We call the same routines with both hotplug and subsys callbacks, and we
sense physical hotplug with cpu_offline() check in subsys callback. We
can handle most of the stuff with regular hotplug callback paths and
add/remove cpufreq sysfs links or free policy from subsys callbacks.
Original-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Wed, 10 Jun 2015 00:11:45 +0000 (02:11 +0200)]
cpufreq: Don't allow updating inactive policies from sysfs
Later commits would change the way policies are managed today. Policies
wouldn't be freed on cpu hotplug (currently they aren't freed only for
suspend), and while the CPU is offline, the sysfs cpufreq files would
still be present.
A user may accidentally try to update the sysfs files in the following
directory: '/sys/devices/system/cpu/cpuX/cpufreq/'. And that would
result in undefined behavior as the policy wouldn't be active then.
Apart from updating the store() routine, we also update __cpufreq_get()
which can call cpufreq_out_of_sync(). The latter routine tries to update
policy->cur and starts notifying the kernel about it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Doug Smythies [Tue, 2 Jun 2015 04:12:34 +0000 (21:12 -0700)]
intel_pstate: Force setting target pstate when required
During initialization and exit it is possible that the target pstate
might not actually be set. Furthermore, the result can be that the
driver and the processor are out of sync and, under some conditions,
the driver might never send the processor the proper target pstate.
This patch adds a bypass or do_checks flag to the call to
intel_pstate_set_pstate(). With bypass, both the clamp checks and the
"do not send if it is the same as last time" check are skipped. With
do_checks, as before, the current policy clamp checks are applied and
nothing is sent if the new target is the same as the old one.
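A minimal sketch of the two modes, assuming a boolean flag, is shown below. struct cpu_sketch and set_pstate_sketch() are hypothetical, and the 'writes' counter stands in for the actual hardware write; the driver's real signature and flag values are not given in this changelog.

```c
#include <stdbool.h>

struct cpu_sketch {
	int min_pstate;
	int max_pstate;
	int current_pstate;
	int writes; /* counts simulated writes to the processor */
};

/*
 * force == true models "bypass": no clamping, always send.
 * force == false models "do_checks": clamp, and skip the send when the
 * target equals the last value that was set.
 */
void set_pstate_sketch(struct cpu_sketch *cpu, int pstate, bool force)
{
	if (!force) {
		if (pstate < cpu->min_pstate)
			pstate = cpu->min_pstate;
		if (pstate > cpu->max_pstate)
			pstate = cpu->max_pstate;
		if (pstate == cpu->current_pstate)
			return;
	}
	cpu->current_pstate = pstate;
	cpu->writes++;
}
```

The forced path is what makes it possible to resend a value the driver believes is already set, which is the case where driver and processor have drifted apart.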
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Marien Zwart <marien.zwart@gmail.com>
Reported-by: Alex Lochmann <alexander.lochmann@tu-dortmund.de>
Reported-by: Piotr Kołaczkowski <pkolaczk@gmail.com>
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Tested-by: Marien Zwart <marien.zwart@gmail.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
[ rjw: Dropped pointless symbol definitions, rebased ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Doug Smythies [Sun, 31 May 2015 14:46:47 +0000 (07:46 -0700)]
intel_pstate: change some inconsistent debug information
Commit ce717613f3fb (intel_pstate: Turn per cpu printk into pr_debug)
turned per cpu printk into pr_debug. However, only half of the change
was done, introducing an inconsistency between entry and exit from
driver pstate control. This patch changes the exit message to pr_debug
also.
The various messages are inconsistent with respect to any identifier
text that can be used to help isolate the desired information from a
huge log. This patch makes a consistent identifier portion of the
string.
Amends: ce717613f3fb (intel_pstate: Turn per cpu printk into pr_debug)
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Saravana Kannan [Mon, 18 May 2015 05:13:31 +0000 (10:43 +0530)]
cpufreq: Track cpu managing sysfs kobjects separately
In order to prepare for the next few commits, that will stop migrating
sysfs files on cpu hotplug, this patch starts managing sysfs-cpu
separately.
The behavior is still the same as we are still migrating sysfs files on
hotplug, later commits would change that.
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Shailendra Verma [Fri, 22 May 2015 17:18:22 +0000 (22:48 +0530)]
cpufreq: Fix for typos in two comments
Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Tue, 12 May 2015 06:52:51 +0000 (12:22 +0530)]
cpufreq: Mark policy->governor = NULL for inactive policies
Later commits would change the way policies are managed today. Policies
wouldn't be freed on cpu hotplug (currently they aren't freed on
suspend), and while the CPU is offline, the sysfs cpufreq files would
still be present.
Because we don't mark policy->governor as NULL, it still contains
pointer of the last used governor. And if the governor is removed, while
all the CPUs of a policy are hotplugged out, this pointer wouldn't be
valid anymore. And if we try to read the 'scaling_governor', etc. from
sysfs, it will result in a kernel oops.
To prevent this, mark policy->governor as NULL for all inactive policies
while the governor is removed from kernel.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Tue, 12 May 2015 06:52:34 +0000 (12:22 +0530)]
cpufreq: Manage governor usage history with 'policy->last_governor'
History of which governor was used last is common to all CPUs within a
policy and maintaining it per-cpu isn't the best approach for sure.
Apart from wasting memory, this also increases the complexity of
managing this data structure as it has to be updated for all CPUs.
To make that somewhat simpler, let's store this information in a new
field 'last_governor' in struct cpufreq_policy and update it on removal
of the last cpu of a policy.
As a side effect it also solves an old problem. Consider a system with
two clusters, 0 and 1, with one policy per cluster.
Cluster 0: CPU0 and 1.
Cluster 1: CPU2 and 3.
- CPU2 is first brought online, and governor is set to performance
(default as cpufreq_cpu_governor wasn't set).
- Governor is changed to ondemand.
- CPU2 is taken offline and cpufreq_cpu_governor is updated for CPU2.
- CPU3 is brought online.
- Because cpufreq_cpu_governor wasn't set for CPU3, the default governor
performance is picked for CPU3.
This patch fixes the bug as we now have a single variable to update for
policy.
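The per-policy bookkeeping can be sketched as below. This is a user-space model with hypothetical names (struct policy_hist, policy_cpu_online(), policy_cpu_offline(), policy_set_governor(), policy_uses()); the default governor name is taken from the scenario above.

```c
#include <stdio.h>
#include <string.h>

#define GOV_NAME_LEN 16
#define DEFAULT_GOV "performance" /* default from the scenario above */

struct policy_hist {
	int online_cpus;
	char governor[GOV_NAME_LEN];
	char last_governor[GOV_NAME_LEN]; /* one record per policy */
};

void policy_set_governor(struct policy_hist *p, const char *name)
{
	snprintf(p->governor, sizeof(p->governor), "%s", name);
}

/* Remember the governor only when the last CPU of the policy goes away. */
void policy_cpu_offline(struct policy_hist *p)
{
	if (p->online_cpus > 0 && --p->online_cpus == 0) {
		snprintf(p->last_governor, sizeof(p->last_governor), "%s",
			 p->governor);
		p->governor[0] = '\0';
	}
}

/* The first CPU to come back restores the record, whichever CPU it is. */
void policy_cpu_online(struct policy_hist *p)
{
	if (p->online_cpus++ == 0)
		policy_set_governor(p, p->last_governor[0] ?
				       p->last_governor : DEFAULT_GOV);
}

int policy_uses(const struct policy_hist *p, const char *name)
{
	return strcmp(p->governor, name) == 0;
}
```

Replaying the scenario: the first online picks "performance", the governor is changed to "ondemand", the only online CPU goes offline, and the next CPU of the same policy to come online gets "ondemand" back because the record belongs to the policy rather than to a CPU.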
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Tue, 12 May 2015 06:52:12 +0000 (12:22 +0530)]
cpufreq: Don't traverse all active policies to find policy for a cpu
We reach here while adding policy for a CPU and enter into the 'if'
block only if a policy already exists for the CPU.
As cpufreq_cpu_data is set for all policy->related_cpus now, when the
policy is first added, we can use that to find the CPU's policy instead
of traversing the list of all active policies.
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Fri, 8 May 2015 06:23:46 +0000 (11:53 +0530)]
cpufreq: Get rid of cpufreq_cpu_data_fallback
We can extract the same information from cpufreq_cpu_data as it is also
available for inactive policies now. And so don't need
cpufreq_cpu_data_fallback anymore.
Also add a WARN_ON() for the case where we try to restore from an active
policy.
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Fri, 8 May 2015 06:23:45 +0000 (11:53 +0530)]
cpufreq: Don't clear cpufreq_cpu_data and policy list for inactive policies
Now that we can check policy->cpus to find if policy is active or not,
we don't need to clean cpufreq_cpu_data and delete policy from the list
on light weight tear down of policies (like in suspend).
To make it consistent and clean, set cpufreq_cpu_data for all related
CPUs when the policy is first created and clean it only while it is
freed.
Also update cpufreq_cpu_get_raw() to check if cpu is part of
policy->cpus mask, so that we don't end up getting policies for offline
CPUs.
In order to make sure that no users of 'policy' are using an inactive
policy, use cpufreq_cpu_get_raw() instead of directly accessing
cpufreq_cpu_data.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Tue, 12 May 2015 06:50:11 +0000 (12:20 +0530)]
cpufreq: Create for_each_{in}active_policy()
policy->cpus is cleared unconditionally now on hotplug-out of a CPU and
it can be checked to know if a policy is active or not. Create helper
routines to iterate over all active/inactive policies, based on
policy->cpus field.
Replace all instances of for_each_policy() with for_each_active_policy()
to make them iterate only for active policies. (We haven't made changes
yet to keep inactive policies in the same list, but that will be
followed in a later patch).
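One way such iterators could look is sketched below, over an array instead of the kernel's policy list. struct policy_mask, policy_is_inactive_sketch(), next_policy_sketch() and the two macros are hypothetical; the cpus mask is modelled as a plain bitmask.

```c
#include <stdbool.h>
#include <stddef.h>

struct policy_mask {
	unsigned long cpus; /* bit n set: CPU n of this policy is online */
};

/* A policy with an empty cpus mask is inactive. */
bool policy_is_inactive_sketch(const struct policy_mask *p)
{
	return p->cpus == 0;
}

/* Index of the first policy at or after idx with the wanted state, else n. */
size_t next_policy_sketch(const struct policy_mask *list, size_t n,
			  size_t idx, bool active)
{
	while (idx < n && policy_is_inactive_sketch(&list[idx]) == active)
		idx++;
	return idx;
}

#define for_each_active_policy_sketch(i, list, n)                      \
	for ((i) = next_policy_sketch((list), (n), 0, true); (i) < (n); \
	     (i) = next_policy_sketch((list), (n), (i) + 1, true))

#define for_each_inactive_policy_sketch(i, list, n)                     \
	for ((i) = next_policy_sketch((list), (n), 0, false); (i) < (n); \
	     (i) = next_policy_sketch((list), (n), (i) + 1, false))
```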
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sudeep Holla [Wed, 13 May 2015 12:35:52 +0000 (13:35 +0100)]
cpufreq: arm_big_little: remove compile-time dependency on BIG_LITTLE
With the addition of switcher code, there's compile-time dependency on
BIG_LITTLE to get arm_big_little driver compiling on ARM64. Since ARM64
will never add support for bL switcher, it's better to remove the
dependency so that the driver can be reused on ARM64 platforms.
This patch adds stubs to remove BIG_LITTLE dependency in the driver.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Joe Konno [Tue, 12 May 2015 14:59:42 +0000 (07:59 -0700)]
intel_pstate: set BYT MSR with wrmsrl_on_cpu()
Commit 007bea098b86 (intel_pstate: Add setting voltage value for
baytrail P states.) introduced byt_set_pstate() with the assumption that
it would always be run by the CPU whose MSR is to be written by it. It
turns out, however, that is not always the case in practice, so modify
byt_set_pstate() to enforce the MSR write done by it to always happen on
the right CPU.
Fixes: 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.)
Signed-off-by: Joe Konno <joe.konno@intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 19 Feb 2015 11:32:07 +0000 (17:02 +0530)]
cpufreq: Clear policy->cpus even for the last CPU
We clear policy->cpus mask while CPUs are hotplugged out. We do it for all CPUs
except the last CPU of the policy. I don't remember what the rationale behind
that was, but I couldn't think of anything that will break if we remove this
conditional clearing and always clear policy->cpus.
The benefit we get out of it is, we can know if a policy is active or not by
checking if this field is empty or not. That will be used by later commits.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 19 Feb 2015 11:32:06 +0000 (17:02 +0530)]
cpufreq: Keep a single path for adding managed CPUs
There are two cases when we may try to add CPUs we're already handling:
- On boot, the first cpu has marked all policy->cpus managed and so we
will find policy for all other policy->cpus later on.
- When a managed cpu is hotplugged out and later brought back in.
Currently, separate paths and checks take care of the two. While the
first one is detected by testing cpu against 'policy->cpus', the other
one is detected by testing cpu against 'policy->related_cpus'.
We can handle them both via a single path and there is no need to do
special checking for the first one.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
[ rjw: Changelog, comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 19 Feb 2015 11:32:05 +0000 (17:02 +0530)]
cpufreq: Throw warning when we try to get policy for an invalid CPU
Simply returning here with an error is not enough. It shouldn't be allowed at
all to try calling cpufreq_cpu_get() for an invalid CPU.
Add a WARN here to make it clear that it wouldn't be acceptable at all.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 19 Feb 2015 11:32:04 +0000 (17:02 +0530)]
cpufreq: Merge __cpufreq_add_dev() and cpufreq_add_dev()
cpufreq_add_dev() is an unnecessary wrapper over __cpufreq_add_dev(). Merge
them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Thu, 19 Feb 2015 11:32:03 +0000 (17:02 +0530)]
cpufreq: Add doc style comment about cpufreq_cpu_{get|put}()
This clearly states what the code inside these routines is doing and how these
must be used.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Wang Long [Tue, 5 May 2015 01:22:26 +0000 (01:22 +0000)]
Documentation: cpufreq: delete duplicate description of sysfs interface 'scaling_driver'
The file 'Documentation/cpu-freq/user-guide.txt' has duplicate
description of sysfs interface 'scaling_driver'.
[first]
scaling_driver : this file shows what cpufreq driver is
used to set the frequency on this CPU
[second]
scaling_driver : Hardware driver for cpufreq.
Although this does not affect anything, I think we should only have
one, so delete the second one because the first one is described in
more detail.
Signed-off-by: Wang Long <long.wanglong@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Doug Smythies [Sun, 12 Apr 2015 04:10:26 +0000 (21:10 -0700)]
intel_pstate: Add tsc collection and keep previous target pstate
The intel_pstate driver is difficult to debug and investigate without tsc.
Also, it is likely that use of tsc, and some version of C0 percentage,
will be re-introduced in the future.
There have also been occasions where it is desirable to know, and
confirm, the previous target pstate.
This patch brings back tsc, adds previous target pstate,
and adds both to the trace data collection.
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sudeep Holla [Mon, 27 Apr 2015 09:51:06 +0000 (10:51 +0100)]
cpufreq: arm_big_little: remove unused cpu-cluster.<n> clock name
The "cpu-cluster.<n>" used to get the cluster clock is not used by any
platform. Moreover, __of_clk_get_by_name(), used in clk_get(), returns an error if
the "clock-names" in the DT doesn't match this string. When using DT,
it's not compulsory to specify the clock name unless there are multiple
clock input entries in the consumer.
This patch removes the unused clock string from the driver.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sudeep Holla [Mon, 27 Apr 2015 09:51:05 +0000 (10:51 +0100)]
cpufreq: arm_big_little: check if the frequency is set correctly
The actual frequency is set through "clk_change_rate", which is a void
function. If the underlying hardware fails and returns an error, the error
is lost in the clk layer. In order to track such failures, we need to
read back the frequency (just the cached value, as clk_recalc called after
clk->ops->set_rate gets the frequency).
This patch adds a check to see if the frequency is set correctly or if
there were any hardware failures, and sends the appropriate errors to the
cpufreq core.
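The read-back check can be illustrated with a small model. struct clk_model, clk_set_rate_model(), clk_get_rate_model() and set_rate_checked() are hypothetical stand-ins for the clk layer, and returning -EIO on a mismatch is an assumption made for this sketch.

```c
#include <errno.h>

/* Model of a clock whose hardware may silently refuse a rate change. */
struct clk_model {
	unsigned long rate_hz; /* cached rate, refreshed after set_rate */
	int broken;            /* non-zero: hardware ignores requests */
};

/* Void setter: a hardware failure cannot be reported to the caller. */
void clk_set_rate_model(struct clk_model *clk, unsigned long hz)
{
	if (!clk->broken)
		clk->rate_hz = hz;
}

unsigned long clk_get_rate_model(const struct clk_model *clk)
{
	return clk->rate_hz;
}

/* Set the rate, then read it back and report a mismatch as an error. */
int set_rate_checked(struct clk_model *clk, unsigned long hz)
{
	clk_set_rate_model(clk, hz);
	if (clk_get_rate_model(clk) != hz)
		return -EIO; /* assumed error code */
	return 0;
}
```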
Reviewed-by: Michael Turquette <mike.turquette@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fabian Frederick [Fri, 1 May 2015 08:34:01 +0000 (10:34 +0200)]
cpufreq: pxa: make pxa_freqs arrays const
pxa255_run_freqs and pxa255_turbo_freqs are only read.
This patch updates the array declarations, find_freq_tables()
and its call sites.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fabian Frederick [Fri, 1 May 2015 08:34:00 +0000 (10:34 +0200)]
cpufreq: pxa: replace typedef pxa_freqs_t by structure
The typedef is not really useful here. Replace it with a structure
to improve readability. typedef should only be used in some cases.
(See Documentation/CodingStyle Chapter 5 for details).
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Mon, 4 May 2015 02:22:23 +0000 (19:22 -0700)]
Linux 4.1-rc2
Linus Torvalds [Mon, 4 May 2015 01:23:53 +0000 (18:23 -0700)]
Merge tag 'for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Some miscellaneous bug fixes and some final on-disk and ABI changes
for ext4 encryption which provide better security and performance"
* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix growing of tiny filesystems
ext4: move check under lock scope to close a race.
ext4: fix data corruption caused by unwritten and delayed extents
ext4 crypto: remove duplicated encryption mode definitions
ext4 crypto: do not select from EXT4_FS_ENCRYPTION
ext4 crypto: add padding to filenames before encrypting
ext4 crypto: simplify and speed up filename encryption
Linus Torvalds [Mon, 4 May 2015 01:15:48 +0000 (18:15 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One intel fix, one rockchip fix, and a bunch of radeon fixes for some
regressions from audio rework and vm stability"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915/chv: Implement WaDisableShadowRegForCpd
drm/radeon: fix userptr return value checking (v2)
drm/radeon: check new address before removing old one
drm/radeon: reset BOs address after clearing it.
drm/radeon: fix lockup when BOs aren't part of the VM on release
drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
drm/radeon: adjust pll when audio is not enabled
drm/radeon: only enable audio streams if the monitor supports it
drm/radeon: only mark audio as connected if the monitor supports it (v3)
drm/radeon/audio: don't enable packets until the end
drm/radeon: drop dce6_dp_enable
drm/radeon: fix ordering of AVI packet setup
drm/radeon: Use drm_calloc_ab for CS relocs
drm/rockchip: fix error check when getting irq
MAINTAINERS: add entry for Rockchip drm drivers
Dave Airlie [Sun, 3 May 2015 22:56:47 +0000 (08:56 +1000)]
Merge tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Just a single intel fix
* tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915/chv: Implement WaDisableShadowRegForCpd
Dave Airlie [Sun, 3 May 2015 22:56:27 +0000 (08:56 +1000)]
Merge branch 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip into drm-fixes
one fix and maintainers update
* 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip:
drm/rockchip: fix error check when getting irq
MAINTAINERS: add entry for Rockchip drm drivers
Linus Torvalds [Sun, 3 May 2015 20:22:32 +0000 (13:22 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is three logical fixes (as 5 patches).
The 3ware class of drivers were causing an oops with multiqueue by
tearing down the command mappings after completing the command (where
the variables in the command used to tear down the mapping were
no-longer valid). There's also a fix for the qnap iscsi target which
was choking on us sending it commands that were too long and a fix for
the reworked aha1542 allocating GFP_KERNEL under a lock"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
3w-9xxx: fix command completion race
3w-xxxx: fix command completion race
3w-sas: fix command completion race
aha1542: Allocate memory before taking a lock
SCSI: add 1024 max sectors black list flag
Linus Torvalds [Sun, 3 May 2015 17:49:04 +0000 (10:49 -0700)]
Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave dmaengine fixes from Vinod Koul:
"Here are the fixes in dmaengine subsystem for rc2:
- privatecnt fix for slave dma request API by Christopher
- warn fix for PM ifdef in usb-dmac by Geert
- fix hardware dependency for xgene by Jean"
* 'next' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: increment privatecnt when using dma_get_any_slave_channel
dmaengine: xgene: Set hardware dependency
dmaengine: usb-dmac: Protect PM-only functions to kill warning
Linus Torvalds [Sun, 3 May 2015 17:28:36 +0000 (10:28 -0700)]
Merge tag 'powerpc-4.1-3' of git://git./linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
- build fix for SMP=n in book3s_xics.c
- fix for Daniel's pci_controller_ops on powernv.
- revert the TM syscall abort patch for now.
- CPU affinity fix from Nathan.
- two EEH fixes from Gavin.
- fix for CR corruption from Sam.
- selftest build fix.
* tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/powernv: Restore non-volatile CRs after nap
powerpc/eeh: Delay probing EEH device during hotplug
powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
powerpc/pseries: Correct cpu affinity for dlpar added cpus
selftests/powerpc: Fix the pmu install rule
Revert "powerpc/tm: Abort syscalls in active transactions"
powerpc/powernv: Fix early pci_controller_ops loading.
powerpc/kvm: Fix SMP=n build error in book3s_xics.c
Jan Kara [Sun, 3 May 2015 03:58:32 +0000 (23:58 -0400)]
ext4: fix growing of tiny filesystems
The estimate of necessary transaction credits in ext4_flex_group_add()
is too pessimistic. It reserves credit for sb, resize inode, and resize
inode dindirect block for each group added in a flex group although they
are always the same block and thus it is enough to account them only
once. Also the number of modified GDT blocks is overestimated since we
fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.
Make the estimation more precise. That reduces the number of requested
credits enough that we can grow a 20 MB filesystem (which has a 1 MB
journal, 79 reserved GDT blocks, and flex group size 16 by default).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
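The difference between the two estimates in the ext4 resize commit above can be modelled roughly as follows. This is a simplified sketch, not the kernel's formula: both function names are hypothetical, the per-group constant 4 and the extra "+ 1" GDT block are assumptions made for illustration, and the figure of 32 descriptors per block used below is assumed as well (16 groups and 79 reserved GDT blocks come from the commit text).

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Pessimistic model: the shared blocks are charged once per added group. */
unsigned int credits_per_group_model(unsigned int groups,
                                     unsigned int reserved_gdb)
{
    /* sb + resize inode + dindirect block + one GDT block, for every group */
    return groups * 4 + reserved_gdb;
}

/* Tighter model: shared blocks charged once, GDT blocks by descriptor count. */
unsigned int credits_shared_model(unsigned int groups,
                                  unsigned int desc_per_block,
                                  unsigned int reserved_gdb)
{
    unsigned int credits;

    assert(desc_per_block > 0);
    credits = 3; /* sb, resize inode, dindirect block: the same for all groups */
    /* GDT blocks touched; the extra 1 allows for a range crossing a block boundary */
    credits += 1 + DIV_ROUND_UP(groups, desc_per_block);
    credits += reserved_gdb;
    return credits;
}
```

Under these assumptions, credits_per_group_model(16, 79) gives 143 while credits_shared_model(16, 32, 79) gives 84, which shows how charging the shared blocks once shrinks the request for a small journal.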
Davide Italiano [Sun, 3 May 2015 03:21:15 +0000 (23:21 -0400)]
ext4: move check under lock scope to close a race.
fallocate() checks that the file is extent-based and returns
EOPNOTSUPP if it is not. Other tasks can convert the file between
indirect and extent formats, so it's safe to check only after grabbing
the inode mutex.
Signed-off-by: Davide Italiano <dccitaliano@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Lukas Czerner [Sun, 3 May 2015 01:36:55 +0000 (21:36 -0400)]
ext4: fix data corruption caused by unwritten and delayed extents
Currently it is possible to lose a whole file system block's worth of data
when we hit a specific interaction between unwritten and delayed extents
in the extent status tree.
The problem is that when we insert a delayed extent into the extent status
tree, the only way to get rid of it is to write out the delayed buffer.
However, there is a limitation in the extent status tree implementation:
when inserting an unwritten extent, should there be even a single
delayed block, the whole unwritten extent is marked as delayed.
At this point, there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when we write
into said unwritten extent we will convert it to written, but it still
remains delayed.
When we try to write into that block later, ext4_da_map_blocks() will set
the buffer new and delayed and map it to an invalid block, which causes
the rest of the block to be zeroed, losing already written data.
For now we can fix this by simply not allowing the delayed status to be
set on a written extent in the extent status tree. Also add WARN_ON() to
make sure that we notice if this happens in the future.
This problem can be easily reproduced by running the following xfs_io commands:
xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
-c "falloc 0 131072" \
-c "pwrite -S 0xbb 65536 2048" \
-c "fsync" /mnt/test/fff
echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
In theory this can also be reproduced at random by running fsx,
but not very reliably, though on machines with a bigger page size
(like ppc) it can be seen more often (especially with xfstest generic/127).
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
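The rule introduced by the extent status commit above (a written extent never picks up the delayed status) can be illustrated with a small sketch. The flag names and values and both helper functions are hypothetical; the kernel's own definitions are not reproduced here.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define ES_WRITTEN   0x1u
#define ES_UNWRITTEN 0x2u
#define ES_DELAYED   0x4u

/*
 * Returns the status to store for a new extent. "found_delayed" is non-zero
 * when at least one delayed block exists in the range being inserted.
 */
unsigned int es_status_for_insert(unsigned int status, int found_delayed)
{
    assert(status & (ES_WRITTEN | ES_UNWRITTEN));
    /* Only extents that are not written may be marked delayed. */
    if (found_delayed && !(status & ES_WRITTEN))
        status |= ES_DELAYED;
    return status;
}

/* Mirrors the WARN_ON() idea: non-zero flags the combination that must not occur. */
int es_status_is_suspicious(unsigned int status)
{
    return (status & ES_WRITTEN) && (status & ES_DELAYED);
}
```

An unwritten extent overlapping a delayed block still becomes delayed, while a written one keeps its status unchanged, so the stuck written-and-delayed state described above cannot be created through this path.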
Chanho Park [Sat, 2 May 2015 14:29:22 +0000 (10:29 -0400)]
ext4 crypto: remove duplicated encryption mode definitions
This patch removes duplicated encryption modes which were already in
ext4.h. They were duplicated from commit 3edc18d and commit f542fb.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Herbert Xu [Sat, 2 May 2015 14:29:19 +0000 (10:29 -0400)]
ext4 crypto: do not select from EXT4_FS_ENCRYPTION
This patch adds a tristate EXT4_ENCRYPTION to do the selections
for EXT4_FS_ENCRYPTION because selecting from a bool causes all
the selected options to be built-in, even if EXT4 itself is a
module.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 2 May 2015 03:51:04 +0000 (20:51 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Receive packet length needs to be adjusted by 2 on RX to accommodate
the two padding bytes in altera_tse driver. From Vlastimil Setka.
2) If rx frame is dropped due to out of memory in macb driver, we leave
the receive ring descriptors in an undefined state. From Punnaiah
Choudary Kalluri
3) Some netlink subsystems erroneously signal NLM_F_MULTI. That is
only for dumps. Fix from Nicolas Dichtel.
4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go via
the ipv4_mtu() helper. From Herbert Xu.
5) Fix null deref in bridge netfilter, and miscalculated lengths in
jump/goto nf_tables verdicts. From Florian Westphal.
6) Unhash ping sockets properly.
7) Software implementation of BPF divide did 64/32 rather than 64/64
bit divide. The JITs got it right. Fix from Alexei Starovoitov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
ipv4: Missing sk_nulls_node_init() in ping_unhash().
net: fec: Fix RGMII-ID mode
net/mlx4_en: Schedule napi when RX buffers allocation fails
netxen_nic: use spin_[un]lock_bh around tx_clean_lock
net/mlx4_core: Fix unaligned accesses
mlx4_en: Use correct loop cursor in error path.
cxgb4: Fix MC1 memory offset calculation
bnx2x: Delay during kdump load
net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table
net: dsa: Fix scope of eeprom-length property
net: macb: Fix race condition in driver when Rx frame is dropped
hv_netvsc: Fix a bug in netvsc_start_xmit()
altera_tse: Correct rx packet length
mlx4: Fix tx ring affinity_mask creation
tipc: fix problem with parallel link synchronization mechanism
tipc: remove wrong use of NLM_F_MULTI
bridge/nl: remove wrong use of NLM_F_MULTI
bridge/mdb: remove wrong use of NLM_F_MULTI
net: sched: act_connmark: don't zap skb->nfct
trivial: net: systemport: bcmsysport.h: fix 0x0x prefix
...
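Item 7 of the networking pull above (a 64/32 instead of a 64/64 bit divide) is easy to demonstrate in plain C. The two functions below are hypothetical models written for this note, not the BPF interpreter code: the first truncates the divisor to 32 bits before dividing, the second divides by the full 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy model: the divisor is truncated to 32 bits before the divide. */
uint64_t div_64_by_32(uint64_t dst, uint64_t src)
{
    uint32_t truncated = (uint32_t)src;

    assert(truncated != 0);
    return dst / truncated;
}

/* Correct model: the full 64-bit divisor is used. */
uint64_t div_64_by_64(uint64_t dst, uint64_t src)
{
    assert(src != 0);
    return dst / src;
}
```

For divisors that fit in 32 bits both agree. With dst = 1 << 40 and src = 0x100000002, the truncated version divides by 2 and yields 1 << 39, while the full divide yields 255.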
Stefan Hajnoczi [Fri, 1 May 2015 23:12:29 +0000 (08:42 +0930)]
virtio: fix typo in vring_need_event() doc comment
Here the "other side" refers to the guest or host.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Fri, 1 May 2015 23:12:38 +0000 (08:42 +0930)]
virtio: pass baton to Michael Tsirkin
With my job change kernel work will be "own time"; I'm keeping lguest
and modules (and the virtio standards work), but virtio kernel has to
go.
This makes it clear that Michael is in charge. He's good, but having
me watch over his shoulder won't help.
Good luck Michael!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 2 May 2015 03:35:39 +0000 (20:35 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph RBD fix from Sage Weil.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: end I/O the entire obj_request on error
David S. Miller [Sat, 2 May 2015 02:02:47 +0000 (22:02 -0400)]
ipv4: Missing sk_nulls_node_init() in ping_unhash().
If we don't do that, then the poison value is left in the ->pprev
backlink.
This can cause crashes if we do a disconnect, followed by a connect().
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Dryomov [Sat, 25 Apr 2015 12:56:15 +0000 (15:56 +0300)]
rbd: end I/O the entire obj_request on error
When we end I/O struct request with error, we need to pass
obj_request->length as @nr_bytes so that the entire obj_request worth
of bytes is completed. Otherwise block layer ends up confused and we
trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
in rbd_img_obj_callback() due to more being true no matter what. We
already do it in most cases but we are missing some, in particular
those where we don't even get a chance to submit any obj_requests, due
to an early -ENOMEM for example.
A number of obj_request->xferred assignments seem to be redundant but
I haven't touched any of obj_request->xferred stuff to keep this small
and isolated.
Cc: Alex Elder <elder@linaro.org>
Cc: stable@vger.kernel.org # 3.10+
Reported-by: Shawn Edwards <lesser.evil@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Theodore Ts'o [Fri, 1 May 2015 20:56:50 +0000 (16:56 -0400)]
ext4 crypto: add padding to filenames before encrypting
This obscures the length of the filenames, to decrease the amount of
information leakage. By default, we pad the filenames to the next 4
byte boundaries. This costs nothing, since the directory entries are
aligned to 4 byte boundaries anyway. Filenames can also be padded to
8, 16, or 32 bytes, which will consume more directory space.
Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
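The padding rule in the filename commit above amounts to rounding the name length up to the chosen boundary. A minimal sketch, assuming the boundary is a power of two (4, 8, 16 or 32 per the commit text); padded_name_len() is a hypothetical helper, and any other size constraints the real encryption code applies are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of boundary (a power of two). */
size_t padded_name_len(size_t len, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (len + boundary - 1) & ~(boundary - 1);
}
```

A 5-byte name pads to 8 bytes with the default boundary of 4 and to 32 bytes with a boundary of 32, so larger boundaries hide more of the length at the cost of directory space.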
Theodore Ts'o [Fri, 1 May 2015 20:56:45 +0000 (16:56 -0400)]
ext4 crypto: simplify and speed up filename encryption
Avoid using SHA-1 when calculating the user-visible filename when the
encryption key is available, and avoid decrypting lots of filenames
when searching for a directory entry in a directory block.
Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Fri, 1 May 2015 14:46:21 +0000 (07:46 -0700)]
Merge branch 'for-linus-4.1' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"A few more btrfs fixes.
These range from corners Filipe found in the new free space cache
writeback to a grab bag of fixes from the list"
* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: btrfs_release_extent_buffer_page didn't free pages of dummy extent
Btrfs: fill ->last_trans for delayed inode in btrfs_fill_inode.
btrfs: unlock i_mutex after attempting to delete subvolume during send
btrfs: check io_ctl_prepare_pages return in __btrfs_write_out_cache
btrfs: fix race on ENOMEM in alloc_extent_buffer
btrfs: handle ENOMEM in btrfs_alloc_tree_block
Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
Btrfs: don't check for delalloc_bytes in cache_save_setup
Btrfs: fix deadlock when starting writeback of bg caches
Btrfs: fix race between start dirty bg cache writeout and bg deletion
Linus Torvalds [Fri, 1 May 2015 14:44:32 +0000 (07:44 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Not too much here, but we've addressed a couple of nasty issues in the
dma-mapping code as well as adding the halfword and byte variants of
load_acquire/store_release following on from the CSD locking bug that
you fixed in the core.
- fix perf devicetree warnings at probe time
- fix memory leak in __dma_free()
- ensure DMA buffers are always zeroed
- show IRQ trigger in /proc/interrupts (for parity with ARM)
- implement byte and halfword access for smp_{load_acquire,store_release}"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: perf: Fix the pmu node name in warning message
arm64: perf: don't warn about missing interrupt-affinity property for PPIs
arm64: add missing PAGE_ALIGN() to __dma_free()
arm64: dma-mapping: always clear allocated buffers
ARM64: Enable CONFIG_GENERIC_IRQ_SHOW_LEVEL
arm64: add missing data types in smp_load_acquire/smp_store_release
Sam Bobroff [Fri, 1 May 2015 06:50:34 +0000 (16:50 +1000)]
powerpc/powernv: Restore non-volatile CRs after nap
Patches 7cba160ad "powernv/cpuidle: Redesign idle states management"
and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus"
use non-volatile condition registers (cr2, cr3 and cr4) early in the system
reset interrupt handler (system_reset_pSeries()) before it has been determined
if state loss has occurred. If state loss has not occurred, control returns via
the power7_wakeup_noloss() path which does not restore those condition
registers, leaving them corrupted.
Fix this by restoring the condition registers in the power7_wakeup_noloss()
case.
This is apparent when running a KVM guest on hardware that does not
support winkle or sleep and the guest makes use of secondary threads. In
practice this means Power7 machines, though some early unreleased Power8
machines may also be susceptible.
The secondary CPUs are taken off line before the guest is started and
they call pnv_smp_cpu_kill_self(). This checks support for sleep
states (in this case there is no support) and power7_nap() is called.
When the CPU is woken, power7_nap() returns and because the CPU is
still off line, the main while loop executes again. The sleep states
support test is executed again, but because the tested values cannot
have changed, the compiler has optimized the test away and instead we
rely on the result of the first test, which has been left in cr3
and/or cr4. With the result overwritten, the wrong branch is taken and
power7_winkle() is called on a CPU that does not support it, leading
to it stalling.
Fixes: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus")
[mpe: Massage change log a bit more]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Thu, 30 Apr 2015 23:22:15 +0000 (09:22 +1000)]
powerpc/eeh: Delay probing EEH device during hotplug
Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
devices at an early stage, which is reasonable for the pSeries platform.
However, it's wrong for the PowerNV platform because the PE# isn't
determined until the resources (IO and MMIO) are assigned to the
PE in the hotplug case. So we have to delay probing EEH devices
on the PowerNV platform until the PE# is assigned.
Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Thu, 30 Apr 2015 23:14:11 +0000 (09:14 +1000)]
powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
When asserting reset in pcibios_set_pcie_reset_state(), the PE
is forced into the (hardware) frozen state so that the hardware
automatically drops unexpected PCI transactions (except PCI config
read/write) during reset, which would otherwise cause recursive EEH errors.
However, the (software) frozen state EEH_PE_ISOLATED is missed.
When users get 0xFF from a PCI config or MMIO read, EEH_PE_ISOLATED
is set in the PE state retrieval backend. Unfortunately, nobody (neither
the reset handler nor the EEH recovery functionality in the host) will
clear EEH_PE_ISOLATED when the PE has been passed through to a guest.
The patch sets and clears EEH_PE_ISOLATED properly during reset
in function pcibios_set_pcie_reset_state() to fix the issue.
Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()")
Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nathan Fontenot [Thu, 30 Apr 2015 01:42:06 +0000 (20:42 -0500)]
powerpc/pseries: Correct cpu affinity for dlpar added cpus
The incorrect ordering of operations during cpu dlpar add results in invalid
affinity for the cpu being added. The ibm,associativity property in the
device tree is populated with all zeroes for the added cpu which results in
invalid affinity mappings and all cpus appear to belong to node 0.
This occurs because rtas configure-connector is called prior to making the
rtas set-indicator calls. Phyp does not assign affinity information
for a cpu until the rtas set-indicator calls are made to set the isolation
and allocation state.
Correct the order of operations to make the rtas set-indicator
calls (done in dlpar_acquire_drc) before calling rtas configure-connector.
Fixes: 1a8061c46c46 ("powerpc/pseries: Add kernel based CPU DLPAR handling")
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 1 May 2015 01:10:09 +0000 (11:10 +1000)]
selftests/powerpc: Fix the pmu install rule
My patch to add install support for the powerpc selftests had a typo,
leading to the three tests in the pmu directory itself not being
installed.
Fixes: 6faeeea44b84 ("selftests: Add install support for the powerpc tests")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Thu, 30 Apr 2015 21:23:31 +0000 (14:23 -0700)]
Merge tag 'pm+acpi-4.1-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Three regression fixes this time, one for a recent regression in the
cpuidle core affecting multiple systems, one for an inadvertently
added duplicate typedef in ACPICA that breaks compilation with GCC 4.5
and one for an ACPI Smart Battery Subsystem driver regression
introduced during the 3.18 cycle (stable-candidate).
Specifics:
- Fix for a regression in the cpuidle core introduced by one of the
recent commits in the clockevents_notify() removal series that put
a call to a function which had to be executed with disabled
interrupts into a code path running with enabled interrupts (Rafael
J Wysocki)
- Fix for a build problem in ACPICA (with GCC 4.5) introduced by one
of the recent ACPICA tools commits that added a duplicate typedef
to one of the ACPICA's header files by mistake (Olaf Hering)
- Fix for a regression in the ACPI SBS (Smart Battery Subsystem)
driver introduced during the 3.18 development cycle causing the
smart battery manager to be marked as not present when it should be
marked as present (Chris Bainbridge)"
* tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: Run tick_broadcast_exit() with disabled interrupts
ACPI / SBS: Enable battery manager when present
ACPICA: remove duplicate u8 typedef
Linus Torvalds [Thu, 30 Apr 2015 21:00:18 +0000 (14:00 -0700)]
Merge tag 'sound-4.1-rc2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"One nice fix is Peter's patch to make the old good SB Audigy PCI to
work with 32-bit DMA instead of 31-bit. This allows the MIDI synth
to run on modern machines again. Along with it, a few fixes for
emu10k1 have been merged.
On the ASoC side, there is one fix in the common code, but it's just
trivial additions of static inline functions for CONFIG_PM=n. The
rest are various device-specific small fixes.
Last but not least, a few HD-audio fixes are included, as usual, too"
* tag 'sound-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
ASoC: rt5677: fixed wrong DMIC ref clock
ALSA: emu10k1: Emu10k2 32 bit DMA mode
ALSA: emux: Fix mutex deadlock in OSS emulation
ASoC: Update email-id of Rajeev Kumar
ASoC: rt5645: Fix mask for setting RT5645_DMIC_2_DP_GPIO12 bit
ALSA: hda - Fix missing va_end() call in snd_hda_codec_pcm_new()
ALSA: emux: Fix mutex deadlock at unloading
ALSA: emu10k1: Fix card shortname string buffer overflow
ALSA: hda - Add mute-LED mode control to Thinkpad
ALSA: hda - Fix mute-LED fixed mode
ALSA: hda - Fix click noise at start on Dell XPS13
ASoC: rt5645: Add ACPI match ID
ASoC: rt5677: add register patch for PLL
ASoC: Intel: fix the makefile for atom code
ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLE
ASoC: add static inline funcs to fix a compiling issue
ASoC: Intel: sst_byt: remove kfree for memory allocated with devm_kzalloc
ASoC: samsung: s3c24xx-i2s: Fix return value check in s3c24xx_iis_dev_probe()
ASoC: tfa9879: Fix return value check in tfa9879_i2c_probe()
ASoC: fsl_ssi: Fix platform_get_irq() error handling
...
Markus Pargmann [Thu, 30 Apr 2015 15:07:50 +0000 (17:07 +0200)]
net: fec: Fix RGMII-ID mode
RGMII-ID uses an internal delay within the transmitter or receiver. This
feature is phy specific. The rest of the communication is normal RGMII.
So the fec driver has to check for all RGMII modes, not only
'PHY_INTERFACE_MODE_RGMII'.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
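The check described in the fec commit above, treating every RGMII variant as RGMII instead of matching only the plain mode, might look like the sketch below. The enum is a local stand-in whose members are named after the kernel's PHY interface modes; the values, the MODE_COUNT sentinel and the helper mode_is_any_rgmii() are assumptions made for this example.

```c
#include <assert.h>

/* Local stand-ins named after the kernel's PHY interface modes; values are arbitrary. */
enum phy_mode_sketch {
    MODE_MII,
    MODE_RMII,
    MODE_RGMII,      /* no internal delay */
    MODE_RGMII_ID,   /* internal delay on both RX and TX */
    MODE_RGMII_RXID, /* internal delay on RX only */
    MODE_RGMII_TXID, /* internal delay on TX only */
    MODE_COUNT
};

/* Non-zero for every RGMII flavour, not just plain RGMII. */
int mode_is_any_rgmii(enum phy_mode_sketch mode)
{
    assert(mode < MODE_COUNT);
    switch (mode) {
    case MODE_RGMII:
    case MODE_RGMII_ID:
    case MODE_RGMII_RXID:
    case MODE_RGMII_TXID:
        return 1;
    default:
        return 0;
    }
}
```

A MAC driver that compared against MODE_RGMII alone would skip its RGMII setup for the three delay variants, even though the delay is handled on the PHY side and the MAC-side signalling is unchanged.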
Ido Shamay [Thu, 30 Apr 2015 14:32:46 +0000 (17:32 +0300)]
net/mlx4_en: Schedule napi when RX buffers allocation fails
When the system is out of memory, refilling of RX buffers fails while
the driver continues to pass the received packets to the kernel stack.
At some point, when all RX buffers are depleted, the driver may fall
asleep and not recover when memory for new RX buffers is once again
available. This is because the hardware does not have valid descriptors,
so no interrupt will be generated for the driver to return to work
in napi context. Fix it by scheduling the napi poll function from the
stats_task delayed workqueue, as long as the allocations fail.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Camuso [Thu, 30 Apr 2015 11:51:27 +0000 (07:51 -0400)]
netxen_nic: use spin_[un]lock_bh around tx_clean_lock
While testing this driver with DEBUG_LOCKDEP and DEBUG_SPINLOCK
enabled did not produce any traces, it would be more prudent in the
case of tx_clean_lock to use spin_[un]lock_bh, since this lock is
manipulated in both the process and softirq contexts.
This patch was tested for functionality and regressions with netperf
and DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 29 Apr 2015 20:52:51 +0000 (16:52 -0400)]
net/mlx4_core: Fix unaligned accesses
Addresses the following kernel logs seen during boot:
Kernel unaligned access at TPC[100ee150] mlx4_QUERY_HCA+0x80/0x248 [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Wed, 29 Apr 2015 22:59:35 +0000 (15:59 -0700)]
mlx4_en: Use correct loop cursor in error path.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Fixes: 9e311e7 ("net/mlx4_en: Use affinity hint")
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Thu, 30 Apr 2015 19:05:57 +0000 (21:05 +0200)]
Merge branches 'acpica', 'acpi-battery' and 'pm-cpuidle'
Takashi Iwai [Thu, 30 Apr 2015 17:08:06 +0000 (19:08 +0200)]
Merge tag 'asoc-v4.1-rc1' of git://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v4.1
A few fixes for v4.1, none earth shattering and mostly driver related
except for one change to fix !PM builds for Intel platforms which is
done by adding stubs in the core so other platforms don't run into the
same issue.
Linus Torvalds [Thu, 30 Apr 2015 16:44:04 +0000 (09:44 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm changes from Paolo Bonzini:
"Remove from guest code the handling of task migration during a pvclock
read; instead use the correct protocol in KVM.
This removes the need for task migration notifiers in core scheduler
code"
[ The scheduler people really hated the migration notifiers, so this was
kind of required - Linus ]
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86: pvclock: Really remove the sched notifier for cross-cpu migrations
kvm: x86: fix kvmclock update protocol
Linus Torvalds [Thu, 30 Apr 2015 16:39:52 +0000 (09:39 -0700)]
Merge tag 'dm-4.1-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper bugfixes from Mike Snitzer:
"Fix two bugs in the request-based DM blk-mq support that was added
during the 4.1 merge"
* tag 'dm-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
dm: only initialize the request_queue once
David Howells [Thu, 30 Apr 2015 13:58:43 +0000 (14:58 +0100)]
modsign: change default key details
Change default key details to be more obviously unspecified.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 30 Apr 2015 16:30:07 +0000 (09:30 -0700)]
Merge tag 'tty-4.1-rc2' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial driver fixes for 4.1-rc2.
They include some minor fixes that resolve reported issues, and a new
device quirk.
All have been in linux-next successfully"
* tag 'tty-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_pci: Add support for 16 port Exar boards
serial: samsung: fix serial console break
tty/serial: at91: maxburst was missing for dma transfers
serial: of-serial: Remove device_type = "serial" registration
serial: xilinx: Use platform_get_irq to get irq description structure
serial: core: Fix kernel-doc build warnings
tty: Re-add external interface for tty_set_termios()
Linus Torvalds [Thu, 30 Apr 2015 16:08:53 +0000 (09:08 -0700)]
Merge tag 'usb-4.1-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB fixes for 4.1-rc2. They revert one
problem patch, fix some minor things, and add some new quirks for
"broken" devices.
All have been in linux-next successfully"
* tag 'usb-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
cdc-acm: prevent infinite loop when parsing CDC headers.
Revert "usb: host: ehci-msm: Use devm_ioremap_resource instead of devm_ioremap"
usb: chipidea: otg: remove mutex unlock and lock while stop and start role
uas: Set max_sectors_240 quirk for ASM1053 devices
uas: Add US_FL_MAX_SECTORS_240 flag
uas: Allow uas_use_uas_driver to return usb-storage flags
Linus Torvalds [Thu, 30 Apr 2015 16:07:26 +0000 (09:07 -0700)]
Merge tag 'renesas-sh-drivers-for-v4.1' of git://git./linux/kernel/git/horms/renesas
Pull SH driver updates from Simon Horman:
- remove test for now unsupported sh7372 SoC
- disable PM runtime for multi-platform r8a73a4 and sh73a0 SoCs with
genpd
* tag 'renesas-sh-drivers-for-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
drivers: sh: Remove test for now unsupported sh7372
drivers: sh: Disable PM runtime for multi-platform r8a73a4 with genpd
drivers: sh: Disable PM runtime for multi-platform sh73a0 with genpd
Mike Snitzer [Wed, 29 Apr 2015 14:48:09 +0000 (10:48 -0400)]
dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops. It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path
The clone->q check made sense for case #1 but not for #2. However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).
The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops. This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.
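The check described above can be sketched in userspace C. This is a simplified illustration, not the actual dm.c change: the struct layouts and the helper name `clone_is_blk_mq()` are hypothetical, and only `DM_TYPE_MQ_REQUEST_BASED` and `clone->q` come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum dm_type { DM_TYPE_REQUEST_BASED, DM_TYPE_MQ_REQUEST_BASED };

struct request_queue { const void *mq_ops; };
struct request { struct request_queue *q; };
struct mapped_device { enum dm_type type; };

/*
 * Decide whether the clone lives on a blk-mq queue by looking at the
 * device type only. clone->q is never dereferenced, so a clone whose
 * queue is still NULL (unmapped original request being requeued) is safe.
 */
static bool clone_is_blk_mq(const struct mapped_device *md,
                            const struct request *clone)
{
    assert(md != NULL && clone != NULL);
    return md->type == DM_TYPE_MQ_REQUEST_BASED;
}
```

The point of the design is that the device type is known as soon as the table is loaded, whereas `clone->q` is only valid once the clone has been mapped.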
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Christoph Hellwig [Thu, 30 Apr 2015 14:10:36 +0000 (10:10 -0400)]
dm: only initialize the request_queue once
Commit bfebd1cdb4 ("dm: add full blk-mq support to request-based DM")
didn't properly account for the need to short-circuit re-initializing
DM's blk-mq request_queue if it was already initialized.
Otherwise, reloading a blk-mq request-based DM table (either manually
or via multipathd) resulted in errors, see:
https://www.redhat.com/archives/dm-devel/2015-April/msg00132.html
Fix is to only initialize the request_queue on the initial table load
(when the mapped_device type is assigned).
This is better than having dm_init_request_based_blk_mq_queue() return
early if the queue was already initialized because it elevates the
constraint to a more meaningful location in DM core. As such the
pre-existing early return in dm_init_request_based_queue() can now be
removed.
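A rough userspace sketch of "initialize only on the initial table load" follows. The names `MD_TYPE_NONE`, `struct mapped_dev`, `queue_inits`, and `table_load()` are hypothetical stand-ins chosen for illustration; the real DM core code differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device types; MD_TYPE_NONE means "no table loaded yet". */
enum md_type { MD_TYPE_NONE, MD_TYPE_MQ_REQUEST_BASED };

struct mapped_dev {
    enum md_type type;
    int queue_inits;   /* counts how often the queue was set up */
};

/*
 * Load a table. The request_queue is set up only when the device type
 * is assigned for the first time; later reloads leave the queue alone.
 */
static void table_load(struct mapped_dev *md, enum md_type table_type)
{
    assert(md != NULL);
    if (md->type == MD_TYPE_NONE) {
        md->type = table_type;
        md->queue_inits++;   /* stand-in for initializing the request_queue */
    }
}
```

Keeping the guard at the point where the type is assigned means the queue setup routine itself needs no "already initialized" early return.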
Fixes: bfebd1cdb4 ("dm: add full blk-mq support to request-based DM")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Suzuki K. Poulose [Mon, 13 Apr 2015 09:17:55 +0000 (10:17 +0100)]
arm64: perf: Fix the pmu node name in warning message
With commit d5efd9cc9cf2 ("arm64: pmu: add support for interrupt-affinity
property"), we print a warning when we find a PMU SPI with a missing
interrupt-affinity property in a pmu node. Unfortunately, we
pass the wrong (NULL) device node to of_node_full_name, resulting in
unhelpful messages such as:
hw perfevents: Failed to parse <no-node>/interrupt-affinity[0]
This patch fixes the name to that of the pmu node.
Fixes: d5efd9cc9cf2 (arm64: pmu: add support for interrupt-affinity property)
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Fri, 17 Apr 2015 13:41:29 +0000 (14:41 +0100)]
arm64: perf: don't warn about missing interrupt-affinity property for PPIs
PPIs are affine by nature, so the interrupt-affinity property is not
used and therefore we shouldn't print a warning in its absence.
Reported-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Michael Ellerman [Thu, 30 Apr 2015 05:13:14 +0000 (15:13 +1000)]
Revert "powerpc/tm: Abort syscalls in active transactions"
This reverts commit feba40362b11341bee6d8ed58d54b896abbd9f84.
Although the principle of this change is good, the implementation has a
few issues.
Firstly we can sometimes fail to abort a syscall because r12 may have
been clobbered by C code if we went down the virtual CPU accounting
path, or if syscall tracing was enabled.
Secondly we have decided that it is safer to abort the syscall even
earlier in the syscall entry path, so that we avoid the syscall tracing
path when we are transactional.
So that we have time to thoroughly test those changes we have decided to
revert this for this merge window and will merge the fixed version in
the next window.
NB. Rather than reverting the selftest we just drop tm-syscall from
TEST_PROGS so that it's not run by default.
Fixes: feba40362b11 ("powerpc/tm: Abort syscalls in active transactions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dave Airlie [Thu, 30 Apr 2015 02:15:34 +0000 (12:15 +1000)]
Merge branch 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Fixes for 4.1 for radeon all destined for stable:
- fix fallout from the audio rework
- VM fixes
- other assorted bug fixes
* 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix userptr return value checking (v2)
drm/radeon: check new address before removing old one
drm/radeon: reset BOs address after clearing it.
drm/radeon: fix lockup when BOs aren't part of the VM on release
drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
drm/radeon: adjust pll when audio is not enabled
drm/radeon: only enable audio streams if the monitor supports it
drm/radeon: only mark audio as connected if the monitor supports it (v3)
drm/radeon/audio: don't enable packets until the end
drm/radeon: drop dce6_dp_enable
drm/radeon: fix ordering of AVI packet setup
drm/radeon: Use drm_calloc_ab for CS relocs
Forrest Liu [Mon, 9 Feb 2015 09:31:45 +0000 (17:31 +0800)]
Btrfs: btrfs_release_extent_buffer_page didn't free pages of dummy extent
btrfs_release_extent_buffer_page() can't properly handle a dummy extent
allocated by btrfs_clone_extent_buffer(). That is because the
reference count of pages allocated by btrfs_clone_extent_buffer()
is 2: 1 from alloc_page(), and another from attach_extent_buffer_page().
Running the following command repeatedly can reproduce this memory leak:
btrfs inspect-internal inode-resolve 256 /mnt/btrfs
Signed-off-by: Chien-Kuan Yeh <ckya@synology.com>
Signed-off-by: Forrest Liu <forrestl@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Hariprasad Shenai [Wed, 29 Apr 2015 11:49:05 +0000 (17:19 +0530)]
cxgb4: Fix MC1 memory offset calculation
Commit 6559a7e8296002b4 ("cxgb4: Cleanup macros so they follow the same
style and look consistent") introduced a regression where reading MC1
memory in adapters where MC0 isn't present or MC0 size is not equal to MC1
size caused the adapter to crash due to incorrect computation of memoffset.
The fix is to read the size of MC0 instead of MC1 for the offset calculation.
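The offset arithmetic can be illustrated with a small sketch. It assumes the adapter memories are laid out back to back as EDC0, EDC1, MC0, MC1, with both EDC regions the same size; that layout, the struct `mem_sizes_mb`, and the helper `mc1_offset_bytes()` are assumptions for illustration, not the driver's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region sizes in MB; mc0 is 0 when MC0 is not present. */
struct mem_sizes_mb {
    uint32_t edc;   /* size of each EDC region */
    uint32_t mc0;
    uint32_t mc1;
};

/*
 * Byte offset at which MC1 starts. Everything that precedes MC1 is
 * summed (two EDC regions plus MC0); the size of MC1 itself must not
 * enter the calculation.
 */
static uint64_t mc1_offset_bytes(const struct mem_sizes_mb *s)
{
    assert(s != NULL);
    return (2ULL * s->edc + s->mc0) * 1024 * 1024;
}
```

Using `s->mc1` in place of `s->mc0` gives the same result only when the two sizes happen to match, which is consistent with the regression appearing only when MC0 is absent or sized differently.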
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 29 Apr 2015 05:09:49 +0000 (08:09 +0300)]
bnx2x: Delay during kdump load
In a kdump environment interfaces might be re-loaded without a proper
unload sequence in the previous running kernel.
The bnx2x management FW and driver maintain a `pulse' that notifies the FW
that the driver is still up and running.
Driver load on the kdump kernel should be performed only after the pulse
has been out-of-sync long enough for the management FW to identify that
the driver has crashed, at which point it will perform some necessary
cleanup of the HW.
In today's distros kdump loading is quite fast, sometimes too fast for our
FW to get out-of-sync. This patch delays the bnx2x's probe during kdump
to allow a proper re-load on the kdump kernel.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pai [Wed, 29 Apr 2015 18:24:23 +0000 (14:24 -0400)]
net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table
This patch fixes a Kernel Panic in bonding driver debugfs file: rlb_hash_table.
$> modprobe bonding mode=6
$> cat /sys/kernel/debug/bonding/bond0/rlb_hash_table
This will crash the kernel. The struct alb_bond_info is initialized only when
the bonding interface is initialized (ip link set bond0 up) and not at the time
it is allocated. If we try to read the table before that, it'll result in a
kernel panic.
The patch applies against both net and net-next
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Wed, 29 Apr 2015 17:56:15 +0000 (10:56 -0700)]
net: dsa: Fix scope of eeprom-length property
eeprom-length is a switch property, not a dsa property, and thus
needs to be attached to the switch node, not to the dsa node.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 6793abb4e849 ("net: dsa: Add support for switch EEPROM access")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Punnaiah Choudary Kalluri [Wed, 29 Apr 2015 03:04:46 +0000 (08:34 +0530)]
net: macb: Fix race condition in driver when Rx frame is dropped
Under heavy Rx load, the HW was observed to set the USED bit without
updating the received frame status in the BD control field. This could
be due to a lack of resources for processing the BDs at high data
rates. The driver drops the frame associated with this BD but does not
clear the USED bit. This causes a hang condition, as the HW expects
the USED bit to be cleared for this BD.
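The descriptor handling described above can be sketched as follows. The bit positions, `struct rx_bd`, and `rx_process_bd()` are assumptions made for illustration only; they are not copied from the macb driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define BD_USED (1u << 0)    /* in the address word: HW has written this BD */
#define BD_SOF  (1u << 14)   /* in the control word: start of frame */
#define BD_EOF  (1u << 15)   /* in the control word: end of frame */

struct rx_bd {
    uint32_t addr;
    uint32_t ctrl;
};

/*
 * Returns true if a complete frame was received, false if there was
 * nothing to do or the frame had to be dropped. A BD that the HW marked
 * USED is always handed back with USED cleared, even on the drop path,
 * so the HW can reuse it.
 */
static bool rx_process_bd(struct rx_bd *bd)
{
    assert(bd != NULL);
    if (!(bd->addr & BD_USED))
        return false;                 /* HW has not filled this BD yet */

    bool complete = (bd->ctrl & BD_SOF) && (bd->ctrl & BD_EOF);

    bd->ctrl = 0;
    bd->addr &= ~BD_USED;             /* release the BD in both cases */
    return complete;
}
```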
Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KY Srinivasan [Wed, 29 Apr 2015 00:59:48 +0000 (17:59 -0700)]
hv_netvsc: Fix a bug in netvsc_start_xmit()
Commit b08cc79155fc26d0d112b1470d1ece5034651a4b eliminated memory
allocation in the packet send path:
"hv_netvsc: Eliminate memory allocation in the packet send path
The network protocol used to communicate with the host is the remote ndis (rndis)
protocol. We need to decorate each outgoing packet with a rndis header and
additional rndis state (rndis per-packet state). To manage this state, we
currently allocate memory in the transmit path. Eliminate this allocation by
requesting additional head room in the skb."
This commit introduced a bug since it did not account for the case where the
skb was cloned. Fix this bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlastimil Setka [Tue, 28 Apr 2015 22:17:11 +0000 (00:17 +0200)]
altera_tse: Correct rx packet length
Altera TSE MAC rx DMA transfer starts with the 2 additional bytes for IP
payload alignment. This patch fixes the tse_rx() function loop, which reads
the DMA rx status and extracts the packet length from it. The status reports
the whole DMA transfer length, which is 2 bytes longer than the packet itself.
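As a worked example of the length correction, here is a minimal sketch. It assumes the low 16 bits of the rx status word carry the DMA transfer length; that field layout, `RX_ALIGN_PAD`, and `rx_packet_length()` are illustrative assumptions rather than the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Two bytes are prepended to the DMA transfer for IP payload alignment. */
#define RX_ALIGN_PAD 2u

/*
 * Derive the packet length from the rx status word.
 * Assumption: the low 16 bits hold the total DMA transfer length.
 */
static uint32_t rx_packet_length(uint32_t rxstatus)
{
    uint32_t dma_len = rxstatus & 0xffffu;

    assert(dma_len >= RX_ALIGN_PAD);  /* a transfer always includes the pad */
    return dma_len - RX_ALIGN_PAD;
}
```

For instance, a status word whose length field is 66 corresponds to a 64-byte packet.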
Signed-off-by: Vlastimil Setka <setka@vsis.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Tue, 28 Apr 2015 21:49:29 +0000 (14:49 -0700)]
mlx4: Fix tx ring affinity_mask creation
By default, the number of tx queues is limited by the number of online cpus
in mlx4_en_get_profile(). However, this limit no longer holds after the
ethtool .set_channels method has been called. In that situation, the driver
may access invalid bits of certain cpumask variables when queue_index >=
nr_cpu_ids.
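The bounds problem can be sketched with a plain bitmask in place of a kernel cpumask. `NR_CPU_IDS`, the 64-bit mask, and `tx_ring_set_affinity()` are hypothetical and only illustrate guarding on `queue_index` before touching the mask; the actual driver change may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPU_IDS 8u   /* hypothetical number of possible CPUs */

/*
 * Build the affinity mask for a tx ring. A bit is set only when
 * queue_index names a valid CPU; otherwise the mask stays empty and
 * false is returned, so no out-of-range bit is ever accessed.
 */
static bool tx_ring_set_affinity(uint64_t *mask, unsigned int queue_index)
{
    assert(mask != NULL);
    *mask = 0;
    if (queue_index >= NR_CPU_IDS)
        return false;
    *mask |= UINT64_C(1) << queue_index;
    return true;
}
```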
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Ido Shamay <idos@mellanox.com>
Fixes: d03a68f ("net/mlx4_en: Configure the XPS queue mapping on driver load")
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Tue, 28 Apr 2015 20:59:04 +0000 (16:59 -0400)]
tipc: fix problem with parallel link synchronization mechanism
Currently, we try to accumulate arrived packets in the link's
'deferred' queue during the parallel link synchronization phase.
This entails two problems:
- With an unlucky combination of arriving packets the algorithm
may go into a lockstep with the out-of-sequence handling function,
where the synch mechanism is adding a packet to the deferred queue,
while the out-of-sequence handling is retrieving it again, thus
ending up in a loop inside the node_lock scope.
- Even if this is avoided, the link will very often send out
unnecessary protocol messages, in the worst case leading to
redundant retransmissions.
We fix this by just dropping arriving packets on the upcoming link
during the synchronization phase, thus relying on the retransmission
protocol to resolve the situation once the two links have arrived at
a synchronized state.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 28 Apr 2015 16:33:50 +0000 (18:33 +0200)]
tipc: remove wrong use of NLM_F_MULTI
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.
Libraries like libnl will wait forever for NLMSG_DONE.
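To see why a stray NLM_F_MULTI makes a receiver hang, consider this simplified model of a reader loop. The constants mirror the values in <linux/netlink.h>; `struct sketch_nlhdr` and `reply_terminates()` are hypothetical and do not reproduce libnl's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the values defined in <linux/netlink.h>. */
#define SKETCH_NLM_F_MULTI 0x2
#define SKETCH_NLMSG_DONE  0x3

/* Hypothetical, trimmed-down message header for this sketch. */
struct sketch_nlhdr {
    uint16_t type;
    uint16_t flags;
};

/*
 * Returns true if a reader consuming msgs[0..n-1] would stop waiting,
 * false if it would still be blocked expecting NLMSG_DONE.
 */
static bool reply_terminates(const struct sketch_nlhdr *msgs, size_t n)
{
    assert(msgs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (msgs[i].type == SKETCH_NLMSG_DONE)
            return true;    /* explicit end of a multipart dump */
        if (!(msgs[i].flags & SKETCH_NLM_F_MULTI))
            return true;    /* single-part message: nothing more to wait for */
    }
    return false;           /* multipart messages seen, but no NLMSG_DONE */
}
```

A lone notification flagged NLM_F_MULTI falls into the last case: the reader keeps waiting for a terminator that is never sent.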
Fixes: 35b9dd7607f0 ("tipc: add bearer get/dump to new netlink api")
Fixes: 7be57fc69184 ("tipc: add link get/dump to new netlink api")
Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api")
CC: Richard Alpe <richard.alpe@ericsson.com>
CC: Jon Maloy <jon.maloy@ericsson.com>
CC: Ying Xue <ying.xue@windriver.com>
CC: tipc-discussion@lists.sourceforge.net
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 28 Apr 2015 16:33:49 +0000 (18:33 +0200)]
bridge/nl: remove wrong use of NLM_F_MULTI
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.
Libraries like libnl will wait forever for NLMSG_DONE.
Fixes: e5a55a898720 ("net: create generic bridge ops")
Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Sathya Perla <sathya.perla@emulex.com>
CC: Subbu Seetharaman <subbu.seetharaman@emulex.com>
CC: Ajit Khaparde <ajit.khaparde@emulex.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: intel-wired-lan@lists.osuosl.org
CC: Jiri Pirko <jiri@resnulli.us>
CC: Scott Feldman <sfeldma@gmail.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: bridge@lists.linux-foundation.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 28 Apr 2015 16:33:48 +0000 (18:33 +0200)]
bridge/mdb: remove wrong use of NLM_F_MULTI
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.
Libraries like libnl will wait forever for NLMSG_DONE.
Fixes: 37a393bc4932 ("bridge: notify mdb changes via netlink")
CC: Cong Wang <amwang@redhat.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: bridge@lists.linux-foundation.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 28 Apr 2015 11:33:21 +0000 (13:33 +0200)]
net: sched: act_connmark: don't zap skb->nfct
This action is meant to be passive, i.e. we should not alter
skb->nfct: If nfct is present just leave it alone.
Compile tested only.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antonio Ospite [Tue, 28 Apr 2015 11:11:29 +0000 (13:11 +0200)]
trivial: net: systemport: bcmsysport.h: fix 0x0x prefix
Fix the 0x0x prefix in an integer constant.
In this case, while at it, also fix a typo (s/unitcast/unicast/).
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>