Preeti U Murthy [Thu, 26 Mar 2015 13:02:44 +0000 (18:32 +0530)]
sched: Improve load balancing in the presence of idle CPUs
When a CPU is kicked to do nohz idle balancing, it wakes up to do load
balancing on itself, followed by load balancing on behalf of idle CPUs.
But it may end up with load after the load balancing attempt on itself.
This aborts nohz idle balancing. As a result several idle CPUs are left
without tasks till such a time that an ILB CPU finds it unfavorable to
pull tasks upon itself. This delays spreading of load across idle CPUs
and worse, clutters only a few CPUs with tasks.
The effect of the above problem was observed on an SMT8 POWER server
with 2 levels of numa domains. Busy loops equal to number of cores were
spawned. Since load balancing on fork/exec is discouraged across numa
domains, all busy loops would start on one of the numa domains. However
it was expected that eventually one busy loop would run per core across
all domains due to nohz idle load balancing. But it was observed that it
took as long as 10 seconds to spread the load across numa domains.
Further investigation showed that this was a consequence of the
following:
1. An ILB CPU was chosen from the first numa domain to trigger nohz idle
load balancing [Given the experiment, upto 6 CPUs per core could be
potentially idle in this domain.]
2. However the ILB CPU would call load_balance() on itself before
initiating nohz idle load balancing.
3. Given cores are SMT8, the ILB CPU had enough opportunities to pull
tasks from its sibling cores to even out load.
4. Now that the ILB CPU was no longer idle, it would abort nohz idle
load balancing
As a result the opportunities to spread load across numa domains were
lost until such a time that the cores within the first numa domain had
equal number of tasks among themselves. This is a pretty bad scenario,
since the cores within the first numa domain would have as many as 4
tasks each, while cores in the neighbouring numa domains would all
remain idle.
Fix this, by checking if a CPU was woken up to do nohz idle load
balancing, before it does load balancing upon itself. This way we allow
idle CPUs across the system to do load balancing which results in
quicker spread of load, instead of performing load balancing within the
local sched domain hierarchy of the ILB CPU alone under circumstances
such as above.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jason Low <jason.low2@hp.com>
Cc: benh@kernel.crashing.org
Cc: daniel.lezcano@linaro.org
Cc: efault@gmx.de
Cc: iamjoonsoo.kim@lge.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: riel@redhat.com
Cc: srikar@linux.vnet.ibm.com
Cc: svaidy@linux.vnet.ibm.com
Cc: tim.c.chen@linux.intel.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/20150326130014.21532.17158.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 23 Mar 2015 13:19:05 +0000 (14:19 +0100)]
sched: Optimize freq invariant accounting
Currently the freq invariant accounting (in
__update_entity_runnable_avg() and sched_rt_avg_update()) get the
scale factor from a weak function call, this means that even for archs
that default on their implementation the compiler cannot see into this
function and optimize the extra scaling math away.
This is sad, esp. since its a 64-bit multiplication which can be quite
costly on some platforms.
So replace the weak function with #ifdef and __always_inline goo. This
is not quite as nice from an arch support PoV but should at least
result in compile time errors if done wrong.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Morten.Rasmussen@arm.com
Cc: Paul Turner <pjt@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/20150323131905.GF23123@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Fri, 27 Feb 2015 15:54:14 +0000 (16:54 +0100)]
sched: Move CFS tasks to CPUs with higher capacity
When a CPU is used to handle a lot of IRQs or some RT tasks, the remaining
capacity for CFS tasks can be significantly reduced. Once we detect such
situation by comparing cpu_capacity_orig and cpu_capacity, we trig an idle
load balance to check if it's worth moving its tasks on an idle CPU.
It's worth trying to move the task before the CPU is fully utilized to
minimize the preemption by irq or RT tasks.
Once the idle load_balance has selected the busiest CPU, it will look for an
active load balance for only two cases:
- There is only 1 task on the busiest CPU.
- We haven't been able to move a task of the busiest rq.
A CPU with a reduced capacity is included in the 1st case, and it's worth to
actively migrate its task if the idle CPU has got more available capacity for
CFS tasks. This test has been added in need_active_balance.
As a sidenote, this will not generate more spurious ilb because we already
trig an ilb if there is more than 1 busy cpu. If this cpu is the only one that
has a task, we will trig the ilb once for migrating the task.
The nohz_kick_needed function has been cleaned up a bit while adding the new
test
env.src_cpu and env.src_rq must be set unconditionnally because they are used
in need_active_balance which is called even if busiest->nr_running equals 1
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425052454-25797-12-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Fri, 27 Feb 2015 15:54:13 +0000 (16:54 +0100)]
sched: Add SD_PREFER_SIBLING for SMT level
Add the SD_PREFER_SIBLING flag for SMT level in order to ensure that
the scheduler will place at least one task per core.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425052454-25797-11-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Tue, 3 Mar 2015 10:35:03 +0000 (11:35 +0100)]
sched: Remove unused struct sched_group_capacity::capacity_orig
The 'struct sched_group_capacity::capacity_orig' field is no longer used
in the scheduler so we can remove it.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425378903-5349-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Fri, 27 Feb 2015 15:54:11 +0000 (16:54 +0100)]
sched: Replace capacity_factor by usage
The scheduler tries to compute how many tasks a group of CPUs can handle by
assuming that a task's load is SCHED_LOAD_SCALE and a CPU's capacity is
SCHED_CAPACITY_SCALE.
'struct sg_lb_stats:group_capacity_factor' divides the capacity of the group
by SCHED_LOAD_SCALE to estimate how many task can run in the group. Then, it
compares this value with the sum of nr_running to decide if the group is
overloaded or not.
But the 'group_capacity_factor' concept is hardly working for SMT systems, it
sometimes works for big cores but fails to do the right thing for little cores.
Below are two examples to illustrate the problem that this patch solves:
1- If the original capacity of a CPU is less than SCHED_CAPACITY_SCALE
(640 as an example), a group of 3 CPUS will have a max capacity_factor of 2
(div_round_closest(3x640/1024) = 2) which means that it will be seen as
overloaded even if we have only one task per CPU.
2 - If the original capacity of a CPU is greater than SCHED_CAPACITY_SCALE
(1512 as an example), a group of 4 CPUs will have a capacity_factor of 4
(at max and thanks to the fix [0] for SMT system that prevent the apparition
of ghost CPUs) but if one CPU is fully used by rt tasks (and its capacity is
reduced to nearly nothing), the capacity factor of the group will still be 4
(div_round_closest(3*1512/1024) = 5 which is cap to 4 with [0]).
So, this patch tries to solve this issue by removing capacity_factor and
replacing it with the 2 following metrics:
- The available CPU's capacity for CFS tasks which is already used by
load_balance().
- The usage of the CPU by the CFS tasks. For the latter, utilization_avg_contrib
has been re-introduced to compute the usage of a CPU by CFS tasks.
'group_capacity_factor' and 'group_has_free_capacity' has been removed and replaced
by 'group_no_capacity'. We compare the number of task with the number of CPUs and
we evaluate the level of utilization of the CPUs to define if a group is
overloaded or if a group has capacity to handle more tasks.
For SD_PREFER_SIBLING, a group is tagged overloaded if it has more than 1 task
so it will be selected in priority (among the overloaded groups). Since [1],
SD_PREFER_SIBLING is no more concerned by the computation of 'load_above_capacity'
because local is not overloaded.
[1]
9a5d9ba6a363 ("sched/fair: Allow calculate_imbalance() to move idle cpus")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1425052454-25797-9-git-send-email-vincent.guittot@linaro.org
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Wed, 4 Mar 2015 07:48:47 +0000 (08:48 +0100)]
sched: Calculate CPU's usage statistic and put it into struct sg_lb_stats::group_usage
Monitor the usage level of each group of each sched_domain level. The usage is
the portion of cpu_capacity_orig that is currently used on a CPU or group of
CPUs. We use the utilization_load_avg to evaluate the usage level of each
group.
The utilization_load_avg only takes into account the running time of the CFS
tasks on a CPU with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
utilized. Nevertheless, we must cap utilization_load_avg which can be
temporally greater than SCHED_LOAD_SCALE after the migration of a task on this
CPU and until the metrics are stabilized.
The utilization_load_avg is in the range [0..SCHED_LOAD_SCALE] to reflect the
running load on the CPU whereas the available capacity for the CFS task is in
the range [0..cpu_capacity_orig]. In order to test if a CPU is fully utilized
by CFS tasks, we have to scale the utilization in the cpu_capacity_orig range
of the CPU to get the usage of the latter. The usage can then be compared with
the available capacity (ie cpu_capacity) to deduct the usage level of a CPU.
The frequency scaling invariance of the usage is not taken into account in this
patch, it will be solved in another patch which will deal with frequency
scaling invariance on the utilization_load_avg.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425455327-13508-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Fri, 27 Feb 2015 15:54:09 +0000 (16:54 +0100)]
sched: Add struct rq::cpu_capacity_orig
This new field 'cpu_capacity_orig' reflects the original capacity of a CPU
before being altered by rt tasks and/or IRQ
The cpu_capacity_orig will be used:
- to detect when the capacity of a CPU has been noticeably reduced so we can
trig load balance to look for a CPU with better capacity. As an example, we
can detect when a CPU handles a significant amount of irq
(with CONFIG_IRQ_TIME_ACCOUNTING) but this CPU is seen as an idle CPU by
scheduler whereas CPUs, which are really idle, are available.
- evaluate the available capacity for CFS tasks
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425052454-25797-7-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Fri, 27 Feb 2015 15:54:08 +0000 (16:54 +0100)]
sched: Make scale_rt invariant with frequency
The average running time of RT tasks is used to estimate the remaining compute
capacity for CFS tasks. This remaining capacity is the original capacity scaled
down by a factor (aka scale_rt_capacity). This estimation of available capacity
must also be invariant with frequency scaling.
A frequency scaling factor is applied on the running time of the RT tasks for
computing scale_rt_capacity.
In sched_rt_avg_update(), we now scale the RT execution time like below:
rq->rt_avg += rt_delta * arch_scale_freq_capacity() >> SCHED_CAPACITY_SHIFT
Then, scale_rt_capacity can be summarized by:
scale_rt_capacity = SCHED_CAPACITY_SCALE * available / total
with available = total - rq->rt_avg
This has been been optimized in current code by:
scale_rt_capacity = available / (total >> SCHED_CAPACITY_SHIFT)
But we can also developed the equation like below:
scale_rt_capacity = SCHED_CAPACITY_SCALE - ((rq->rt_avg << SCHED_CAPACITY_SHIFT) / total)
and we can optimize the equation by removing SCHED_CAPACITY_SHIFT shift in
the computation of rq->rt_avg and scale_rt_capacity().
so rq->rt_avg += rt_delta * arch_scale_freq_capacity()
and
scale_rt_capacity = SCHED_CAPACITY_SCALE - (rq->rt_avg / total)
arch_scale_frequency_capacity() will be called in the hot path of the scheduler
which implies to have a short and efficient function.
As an example, arch_scale_frequency_capacity() should return a cached value that
is updated periodically outside of the hot path.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425052454-25797-6-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Morten Rasmussen [Wed, 4 Mar 2015 07:46:26 +0000 (08:46 +0100)]
sched: Make sched entity usage tracking scale-invariant
Apply frequency scale-invariance correction factor to usage tracking.
Each segment of the running_avg_sum geometric series is now scaled by the
current frequency so the utilization_avg_contrib of each entity will be
invariant with frequency scaling.
As a result, utilization_load_avg which is the sum of utilization_avg_contrib,
becomes invariant too. So the usage level that is returned by get_cpu_usage(),
stays relative to the max frequency as the cpu_capacity which is is compared against.
Then, we want the keep the load tracking values in a 32-bit type, which implies
that the max value of {runnable|running}_avg_sum must be lower than
2^32/88761=48388 (88761 is the max weigth of a task). As LOAD_AVG_MAX = 47742,
arch_scale_freq_capacity() must return a value less than
(48388/47742) << SCHED_CAPACITY_SHIFT = 1037 (SCHED_SCALE_CAPACITY = 1024).
So we define the range to [0..SCHED_SCALE_CAPACITY] in order to avoid overflow.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Morten.Rasmussen@arm.com
Cc: Paul Turner <pjt@google.com>
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425455186-13451-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Fri, 27 Feb 2015 15:54:06 +0000 (16:54 +0100)]
sched: Remove frequency scaling from cpu_capacity
Now that arch_scale_cpu_capacity has been introduced to scale the original
capacity, the arch_scale_freq_capacity is no longer used (it was
previously used by ARM arch).
Remove arch_scale_freq_capacity from the computation of cpu_capacity.
The frequency invariance will be handled in the load tracking and not in
the CPU capacity. arch_scale_freq_capacity will be revisited for scaling
load with the current frequency of the CPUs in a later patch.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Morten.Rasmussen@arm.com
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425052454-25797-4-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Morten Rasmussen [Fri, 27 Feb 2015 15:54:05 +0000 (16:54 +0100)]
sched: Track group sched_entity usage contributions
Add usage contribution tracking for group entities. Unlike
se->avg.load_avg_contrib, se->avg.utilization_avg_contrib for group
entities is the sum of se->avg.utilization_avg_contrib for all entities on the
group runqueue.
It is _not_ influenced in any way by the task group h_load. Hence it is
representing the actual cpu usage of the group, not its intended load
contribution which may differ significantly from the utilization on
lightly utilized systems.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Morten.Rasmussen@arm.com
Cc: Paul Turner <pjt@google.com>
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425052454-25797-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vincent Guittot [Fri, 27 Feb 2015 15:54:04 +0000 (16:54 +0100)]
sched: Add sched_avg::utilization_avg_contrib
Add new statistics which reflect the average time a task is running on the CPU
and the sum of these running time of the tasks on a runqueue. The latter is
named utilization_load_avg.
This patch is based on the usage metric that was proposed in the 1st
versions of the per-entity load tracking patchset by Paul Turner
<pjt@google.com> but that has be removed afterwards. This version differs from
the original one in the sense that it's not linked to task_group.
The rq's utilization_load_avg will be used to check if a rq is overloaded or
not instead of trying to compute how many tasks a group of CPUs can handle.
Rename runnable_avg_period into avg_period as it is now used with both
runnable_avg_sum and running_avg_sum.
Add some descriptions of the variables to explain their differences.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Morten.Rasmussen@arm.com
Cc: Paul Turner <pjt@google.com>
Cc: dietmar.eggemann@arm.com
Cc: efault@gmx.de
Cc: kamalesh@linux.vnet.ibm.com
Cc: linaro-kernel@lists.linaro.org
Cc: nicolas.pitre@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1425052454-25797-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Steven Rostedt [Wed, 18 Mar 2015 18:49:46 +0000 (14:49 -0400)]
sched/rt: Use IPI to trigger RT task push migration instead of pulling
When debugging the latencies on a 40 core box, where we hit 300 to
500 microsecond latencies, I found there was a huge contention on the
runqueue locks.
Investigating it further, running ftrace, I found that it was due to
the pulling of RT tasks.
The test that was run was the following:
cyclictest --numa -p95 -m -d0 -i100
This created a thread on each CPU, that would set its wakeup in iterations
of 100 microseconds. The -d0 means that all the threads had the same
interval (100us). Each thread sleeps for 100us and wakes up and measures
its latencies.
cyclictest is maintained at:
git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
What happened was another RT task would be scheduled on one of the CPUs
that was running our test, when the other CPU tests went to sleep and
scheduled idle. This caused the "pull" operation to execute on all
these CPUs. Each one of these saw the RT task that was overloaded on
the CPU of the test that was still running, and each one tried
to grab that task in a thundering herd way.
To grab the task, each thread would do a double rq lock grab, grabbing
its own lock as well as the rq of the overloaded CPU. As the sched
domains on this box was rather flat for its size, I saw up to 12 CPUs
block on this lock at once. This caused a ripple affect with the
rq locks especially since the taking was done via a double rq lock, which
means that several of the CPUs had their own rq locks held while trying
to take this rq lock. As these locks were blocked, any wakeups or load
balanceing on these CPUs would also block on these locks, and the wait
time escalated.
I've tried various methods to lessen the load, but things like an
atomic counter to only let one CPU grab the task wont work, because
the task may have a limited affinity, and we may pick the wrong
CPU to take that lock and do the pull, to only find out that the
CPU we picked isn't in the task's affinity.
Instead of doing the PULL, I now have the CPUs that want the pull to
send over an IPI to the overloaded CPU, and let that CPU pick what
CPU to push the task to. No more need to grab the rq lock, and the
push/pull algorithm still works fine.
With this patch, the latency dropped to just 150us over a 20 hour run.
Without the patch, the huge latencies would trigger in seconds.
I've created a new sched feature called RT_PUSH_IPI, which is enabled
by default.
When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks
and having the pulling CPU do the work is implemented. When RT_PUSH_IPI
is enabled, the IPI is sent to the overloaded CPU to do a push.
To enabled or disable this at run time:
# mount -t debugfs nodev /sys/kernel/debug
# echo RT_PUSH_IPI > /sys/kernel/debug/sched_features
or
# echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features
Update: This original patch would send an IPI to all CPUs in the RT overload
list. But that could theoretically cause the reverse issue. That is, there
could be lots of overloaded RT queues and one CPU lowers its priority. It would
then send an IPI to all the overloaded RT queues and they could then all try
to grab the rq lock of the CPU lowering its priority, and then we have the
same problem.
The latest design sends out only one IPI to the first overloaded CPU. It tries to
push any tasks that it can, and then looks for the next overloaded CPU that can
push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable
tasks that have priorities greater than the source CPU are covered. In case the
source CPU lowers its priority again, a flag is set to tell the IPI traversal to
restart with the first RT overloaded CPU after the source CPU.
Parts-suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joern Engel <joern@purestorage.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Steven Rostedt [Thu, 19 Mar 2015 14:18:51 +0000 (10:18 -0400)]
irq_work: Fix build failure when CONFIG_IRQ_WORK is not defined
When CONFIG_IRQ_WORK is not defined (difficult to do, as it also
requires CONFIG_PRINTK not to be defined), we get a build failure:
kernel/built-in.o: In function `flush_smp_call_function_queue':
kernel/smp.c:263: undefined reference to `irq_work_run'
kernel/smp.c:263: undefined reference to `irq_work_run'
Makefile:933: recipe for target 'vmlinux' failed
Simplest thing to do is to make irq_work_run() a nop when not set.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150319101851.4d224d9b@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 23 Mar 2015 09:50:29 +0000 (10:50 +0100)]
Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Silverman [Thu, 19 Feb 2015 00:23:56 +0000 (16:23 -0800)]
sched: Fix RLIMIT_RTTIME when PI-boosting to RT
When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.
I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.
Signed-off-by: Brian Silverman <brian@peloton-tech.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: austin@peloton-tech.com
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 22 Mar 2015 23:50:21 +0000 (16:50 -0700)]
Linux 4.0-rc5
Linus Torvalds [Sun, 22 Mar 2015 23:38:19 +0000 (16:38 -0700)]
Merge tag 'md/4.0-rc4-fix' of git://neil.brown.name/md
Pull bugfix for md from Neil Brown:
"One fix for md in 4.0-rc4
Regression in recent patch causes crash on error path"
* tag 'md/4.0-rc4-fix' of git://neil.brown.name/md:
md: fix problems with freeing private data after ->run failure.
Linus Torvalds [Sun, 22 Mar 2015 19:07:47 +0000 (12:07 -0700)]
Merge tag 'driver-core-4.0-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two bugfixes for things reported. One regression in kernfs,
and another issue fixed in the LZ4 code that was fixed in the
"upstream" codebase that solves a reported kernel crash
Both have been in linux-next for a while"
* tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
LZ4 : fix the data abort issue
kernfs: handle poll correctly on 'direct_read' files.
Linus Torvalds [Sun, 22 Mar 2015 19:03:14 +0000 (12:03 -0700)]
Merge tag 'char-misc-4.0-rc5' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are three fixes for 4.0-rc5 that revert 3 PCMCIA patches that
were merged in 4.0-rc1 that cause regressions. So let's revert them
for now and they will be reworked and resent sometime in the future.
All have been tested in linux-next for a while"
* tag 'char-misc-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "pcmcia: add a new resource manager for non ISA systems"
Revert "pcmcia: fix incorrect bracketing on a test"
Revert "pcmcia: add missing include for new pci resource handler"
Linus Torvalds [Sun, 22 Mar 2015 18:59:02 +0000 (11:59 -0700)]
Merge tag 'staging-4.0-rc5' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are four small staging driver fixes, all for the vt6656 and
vt6655 drivers, that resolve some reported issues with them.
All of these patches have been in linux next for a while"
* tag 'staging-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
vt6655: Fix late setting of byRFType.
vt6655: RFbSetPower fix missing rate RATE_12M
staging: vt6656: vnt_rf_setpower: fix missing rate RATE_12M
staging: vt6655: vnt_tx_packet fix dma_idx selection.
Linus Torvalds [Sun, 22 Mar 2015 18:54:29 +0000 (11:54 -0700)]
Merge tag 'tty-4.0-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fix from Greg KH:
"Here's a single 8250 serial driver that fixes a reported deadlock with
the serial console and the tty driver.
It's been in linux-next for a while now"
* tag 'tty-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_dw: Fix deadlock in LCR workaround
Linus Torvalds [Sun, 22 Mar 2015 18:33:55 +0000 (11:33 -0700)]
Merge tag 'usb-4.0-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB / PHY driver fixes from Greg KH:
"Here's a number of USB and PHY driver fixes for 4.0-rc5.
The largest thing here is a revert of a gadget function driver patch
that removes 500 lines of code. Other than that, it's a number of
reported bugs fixes and new quirk/id entries.
All have been in linux-next for a while"
* tag 'usb-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
usb: common: otg-fsm: only signal connect after switching to peripheral
uas: Add US_FL_NO_ATA_1X for Initio Corporation controllers / devices
USB: ehci-atmel: rework clk handling
MAINTAINERS: add entry for USB OTG FSM
usb: chipidea: otg: add a_alt_hnp_support response for B device
phy: omap-usb2: Fix missing clk_prepare call when using old dt name
phy: ti/omap: Fix modalias
phy: core: Fixup return value of phy_exit when !pm_runtime_enabled
phy: miphy28lp: Convert to devm_kcalloc and fix wrong sizof
phy: miphy365x: Convert to devm_kcalloc and fix wrong sizeof
phy: twl4030-usb: Remove redundant assignment for twl->linkstat
phy: exynos5-usbdrd: Fix off-by-one valid value checking for args->args[0]
phy: Find the right match in devm_phy_destroy()
phy: rockchip-usb: Fixup rockchip_usb_phy_power_on failure path
phy: ti-pipe3: Simplify ti_pipe3_dpll_wait_lock implementation
phy: samsung-usb2: Remove NULL terminating entry from phys array
phy: hix5hd2-sata: Check return value of platform_get_resource
phy: exynos-dp-video: Kill exynos_dp_video_phy_pwr_isol function
Revert "usb: gadget: zero: Add support for interrupt EP"
Revert "xhci: Clear the host side toggle manually when endpoint is 'soft reset'"
...
Linus Torvalds [Sat, 21 Mar 2015 20:05:37 +0000 (13:05 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave dmaengine fixes from Vinod Koul:
"Four fixes for dw, pl08x, imx-sdma and at_hdmac driver. Nothing
unusual here, simple fixes to these drivers"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: pl08x: Define capabilities for generic capabilities reporting
dmaengine: dw: append MODULE_ALIAS for platform driver
dmaengine: imx-sdma: switch to dynamic context mode after script loaded
dmaengine: at_hdmac: Fix calculation of the residual bytes
Linus Torvalds [Sat, 21 Mar 2015 19:51:36 +0000 (12:51 -0700)]
Merge tag 'pm+acpi-4.0-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes for recent regressions (PCI/ACPI resources and at91
RTC locking), a stable-candidate powercap RAPL driver fix and two ARM
cpuidle fixes (one stable-candidate too).
Specifics:
- Revert a recent PCI commit related to IRQ resources management that
introduced a regression for drivers attempting to bind to devices
whose previous drivers did not balance pci_enable_device() and
pci_disable_device() as expected (Rafael J Wysocki).
- Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a
recent commit related to wakeup interrupt handling (Dan Carpenter).
- Allow the power capping RAPL (Running-Average Power Limit) driver
to use different energy units for domains within one CPU package
which is necessary to handle Intel Haswell EP processors correctly
(Jacob Pan).
- Improve the cpuidle mvebu driver's handling of Armada XP SoCs by
updating the target residency and exit latency numbers for those
chips (Sebastien Rannou).
- Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice
in a row before cpu_pm_exit() is called on the same CPU which
breaks the core's assumptions regarding the usage of those
functions (Gregory Clement)"
* tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "x86/PCI: Refine the way to release PCI IRQ resources"
rtc: at91rm9200: double locking bug in at91_rtc_interrupt()
powercap / RAPL: handle domains with different energy units
cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
cpuidle: mvebu: Fix the CPU PM notifier usage
Linus Torvalds [Sat, 21 Mar 2015 19:41:50 +0000 (12:41 -0700)]
Merge git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"A bunch of fixes across drivers:
radeon:
disable two ended allocation for now, it breaks some stuff
amdkfd:
misc fixes
nouveau:
fix irq loop problem, add basic support for GM206 (new hw)
i915:
fix some WARNs people were seeing
exynos:
fix some iommu interactions causing boot failures"
* git://people.freedesktop.org/~airlied/linux:
drm/radeon: drop ttm two ended allocation
drm/exynos: fix the initialization order in FIMD
drm/exynos: fix typo config name correctly.
drm/exynos: Check for NULL dereference of crtc
drm/exynos: IS_ERR() vs NULL bug
drm/exynos: remove unused files
drm/i915: Make sure the primary plane is enabled before reading out the fb state
drm/nouveau/bios: fix i2c table parsing for dcb 4.1
drm/nouveau/device/gm100: Basic GM206 bring up (as copy of GM204)
drm/nouveau/device: post write to NV_PMC_BOOT_1 when flipping endian switch
drm/nouveau/gr/gf100: fix some accidental or'ing of buffer addresses
drm/nouveau/fifo/nv04: remove the loop from the interrupt handler
drm/radeon: Changing number of compute pipe lines
drm/amdkfd: Fix SDMA queue init. in non-HWS mode
drm/amdkfd: destroy mqd when destroying kernel queue
drm/i915: Ensure plane->state->fb stays in sync with plane->fb
Linus Torvalds [Sat, 21 Mar 2015 19:33:01 +0000 (12:33 -0700)]
Merge tag 'devicetree-fixes-for-4.0-part2' of git://git./linux/kernel/git/robh/linux
Pull more DeviceTree fixes vfom Rob Herring:
- revert setting stdout-path as preferred console. This caused
regressions in PowerMACs and other systems.
- yet another fix for stdout-path option parsing.
- fix error path handling in of_irq_parse_one
* tag 'devicetree-fixes-for-4.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
Revert "of: Fix premature bootconsole disable with 'stdout-path'"
of: handle both '/' and ':' in path strings
of: unittest: Add option string test case with longer path
of/irq: Fix of_irq_parse_one() returned error codes
Linus Torvalds [Sat, 21 Mar 2015 18:24:38 +0000 (11:24 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"Here are current target-pending fixes for v4.0-rc5 code that have made
their way into the queue over the last weeks.
The fixes this round include:
- Fix long-standing iser-target logout bug related to early
conn_logout_comp completion, resulting in iscsi_conn use-after-tree
OOpsen. (Sagi + nab)
- Fix long-standing tcm_fc bug in ft_invl_hw_context() failure
handing for DDP hw offload. (DanC)
- Fix incorrect use of unprotected __transport_register_session() in
tcm_qla2xxx + other single local se_node_acl fabrics. (Bart)
- Fix reference leak in target_submit_cmd() -> target_get_sess_cmd()
for ack_kref=1 failure path. (Bart)
- Fix pSCSI backend ->get_device_type() statistics OOPs with
un-configured device. (Olaf + nab)
- Fix virtual LUN=0 target_configure_device failure OOPs at modprobe
time. (Claudio + nab)
- Fix FUA write false positive failure regression in v4.0-rc1 code.
(Christophe Vu-Brugier + HCH)"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0
target: Fix virtual LUN=0 target_configure_device failure OOPs
target/pscsi: Fix NULL pointer dereference in get_device_type
tcm_fc: missing curly braces in ft_invl_hw_context()
target: Fix reference leak in target_get_sess_cmd() error path
loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session
tcm_qla2xxx: Fix incorrect use of __transport_register_session
iscsi-target: Avoid early conn_logout_comp for iser connections
Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target"
target: Disallow changing of WRITE cache/FUA attrs after export
Linus Torvalds [Sat, 21 Mar 2015 18:15:13 +0000 (11:15 -0700)]
Merge tag 'dm-4.0-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull devicemapper fixes from Mike Snitzer:
"A handful of stable fixes for DM:
- fix thin target to always zero-fill reads to unprovisioned blocks
- fix to interlock device destruction's suspend from internal
suspends
- fix 2 snapshot exception store handover bugs
- fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing"
* tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME
dm snapshot: suspend merging snapshot when doing exception handover
dm snapshot: suspend origin when doing exception handover
dm: hold suspend_lock while suspending device during device deletion
dm thin: fix to consistently zero-fill reads to unprovisioned blocks
Linus Torvalds [Sat, 21 Mar 2015 17:53:37 +0000 (10:53 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Most of these are fixing extent reservation accounting, or corners
with tree writeback during commit.
Josef's set does add a test, which isn't strictly a fix, but it'll
keep us from making this same mistake again"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix outstanding_extents accounting in DIO
Btrfs: add sanity test for outstanding_extents accounting
Btrfs: just free dummy extent buffers
Btrfs: account merges/splits properly
Btrfs: prepare block group cache before writing
Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list)
Btrfs: account for the correct number of extents for delalloc reservations
Btrfs: fix merge delalloc logic
Btrfs: fix comp_oper to get right order
Btrfs: catch transaction abortion after waiting for it
btrfs: fix sizeof format specifier in btrfs_check_super_valid()
Linus Torvalds [Sat, 21 Mar 2015 17:41:15 +0000 (10:41 -0700)]
Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Pull nfsd bufix from Bruce Fields:
"This is a fix for a crash easily triggered by 4.1 activity to a server
built with CONFIG_NFSD_PNFS.
There are some more bugfixes queued up that I intend to pass along
next week, but this is the most critical"
* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
Subject: nfsd: don't recursively call nfsd4_cb_layout_fail
Linus Torvalds [Sat, 21 Mar 2015 17:36:44 +0000 (10:36 -0700)]
Merge tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs
Pull UBI fix from Artem Bityutskiy:
"This fixes a bug introduced during the v4.0 merge window where we
forgot to put braces where they should be"
* tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs:
UBI: fix missing brace control flow
Linus Torvalds [Sat, 21 Mar 2015 17:24:10 +0000 (10:24 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- mm switching fix where the kernel pgd ends up in the user TTBR0 after
returning from an EFI run-time services call
- fix __GFP_ZERO handling for atomic pool and CMA DMA allocations (the
generic code does get the gfp flags, so it's left with the arch code
to memzero accordingly)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Honor __GFP_ZERO in dma allocations
arm64: efi: don't restore TTBR0 if active_mm points at init_mm
Linus Torvalds [Sat, 21 Mar 2015 17:03:22 +0000 (10:03 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Another few ARM fixes. Fabrice fixed the L2 cache DT parsing to allow
prefetch configuration to be specified even when the cache size
parsing fails.
Laura noticed that the setting of page attributes wasn't working for
modules due to is_module_addr() always returning false.
Marc Gonzalez (aka Mason) noticed a potential latent bug with the way
we read one of the CPUID registers (where we could attempt to read a
non-present CPUID register which may fault.)
I've fixed an issue where 32-bit DMA masks were failing with memory
which extended to the top of physical address space, and I've also
added debugging output of the page tables when we hit a data access
exception which we don't specifically handle - prompted by the lack of
information in a bug report"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8313/1: Use read_cpuid_ext() macro instead of inline asm
ARM: 8311/1: Don't use is_module_addr in setting page attributes
ARM: 8310/1: l2c: Fix prefetch settings dt parsing
ARM: dump pgd, pmd and pte states on unhandled data abort faults
ARM: dma-api: fix off-by-one error in __dma_supported()
Rafael J. Wysocki [Fri, 20 Mar 2015 23:39:12 +0000 (00:39 +0100)]
Merge branches 'pm-cpuidle', 'powercap', 'irq-pm' and 'acpi-resources'
* pm-cpuidle:
cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
cpuidle: mvebu: Fix the CPU PM notifier usage
* powercap:
powercap / RAPL: handle domains with different energy units
* irq-pm:
rtc: at91rm9200: double locking bug in at91_rtc_interrupt()
* acpi-resources:
Revert "x86/PCI: Refine the way to release PCI IRQ resources"
NeilBrown [Fri, 13 Mar 2015 00:51:18 +0000 (11:51 +1100)]
md: fix problems with freeing private data after ->run failure.
If ->run() fails, it can either free the data structures it
allocated, or leave that task to ->free() which will be called
on failures.
However:
md.c calls ->free() even if ->private_data is NULL, which
causes problems in some personalities.
raid0.c frees the data, but doesn't clear ->private_data,
which will become a problem when we fix md.c
So better fix both these issues at once.
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Fixes:
5aa61f427e4979be733e4847b9199ff9cc48a47e
URL: https://bugzilla.kernel.org/show_bug.cgi?id=94381
Signed-off-by: NeilBrown <neilb@suse.de>
Suzuki K. Poulose [Thu, 19 Mar 2015 18:17:09 +0000 (18:17 +0000)]
arm64: Honor __GFP_ZERO in dma allocations
Current implementation doesn't zero out the pages allocated.
Honor the __GFP_ZERO flag and zero out if set.
Cc: <stable@vger.kernel.org> # v3.14+
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Thu, 19 Mar 2015 15:43:00 +0000 (15:43 +0000)]
arm64: efi: don't restore TTBR0 if active_mm points at init_mm
init_mm isn't a normal mm: it has swapper_pg_dir as its pgd (which
contains kernel mappings) and is used as the active_mm for the idle
thread.
When restoring the pgd after an EFI call, we write current->active_mm
into TTBR0. If the current task is actually the idle thread (e.g. when
initialising the EFI RTC before entering userspace), then the TLB can
erroneously populate itself with junk global entries as a result of
speculative table walks.
When we do eventually return to userspace, the task can end up hitting
these junk mappings leading to lockups, corruption or crashes.
This patch fixes the problem in the same way as the CPU suspend code by
ensuring that we never switch to the init_mm in efi_set_pgd and instead
point TTBR0 at the zero page. A check is also added to cpu_switch_mm to
BUG if we get passed swapper_pg_dir.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes:
f3cdfd239da5 ("arm64/efi: move SetVirtualAddressMap() to UEFI stub")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Rafael J. Wysocki [Fri, 20 Mar 2015 13:56:19 +0000 (14:56 +0100)]
Revert "x86/PCI: Refine the way to release PCI IRQ resources"
Commit
b4b55cda5874 (Refine the way to release PCI IRQ resources)
introduced a regression in the PCI IRQ resource management by causing
the IRQ resource of a device, established when pci_enabled_device()
is called on a fully disabled device, to be released when the driver
is unbound from the device, regardless of the enable_cnt.
This leads to the situation that an ill-behaved driver can now make a
device unusable to subsequent drivers by an imbalance in their use of
pci_enable/disable_device(). That is a serious problem for secondary
drivers like vfio-pci, which are innocent of the transgressions of
the previous driver.
Since the solution of this problem is not immediate and requires
further discussion, revert commit
b4b55cda5874 and the issue it was
supposed to address (a bug related to xen-pciback) will be taken
care of in a different way going forward.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Dave Airlie [Fri, 20 Mar 2015 07:32:21 +0000 (17:32 +1000)]
Merge tag 'drm-intel-fixes-2015-03-19' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Backporting a couple of plane related fixes from drm-next to v4.0.
* tag 'drm-intel-fixes-2015-03-19' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Make sure the primary plane is enabled before reading out the fb state
drm/i915: Ensure plane->state->fb stays in sync with plane->fb
Dave Airlie [Fri, 20 Mar 2015 07:32:01 +0000 (17:32 +1000)]
Merge tag 'drm-amdkfd-fixes-2015-03-19' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes
- Fixing SDMA initialization when in non-HWS mode (debug mode)
- Memory leak fix when destroying kernel queue
- Fix number of available compute pipelines according to new firmware
* tag 'drm-amdkfd-fixes-2015-03-19' of git://people.freedesktop.org/~gabbayo/linux:
drm/radeon: Changing number of compute pipe lines
drm/amdkfd: Fix SDMA queue init. in non-HWS mode
drm/amdkfd: destroy mqd when destroying kernel queue
Christophe Vu-Brugier [Thu, 19 Mar 2015 13:30:13 +0000 (14:30 +0100)]
target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0
A check that rejects a CDB with FUA bit set if no write cache is
emulated was added by the following commit:
fde9f50 target: Add sanity checks for DPO/FUA bit usage
The condition is as follows:
if (!dev->dev_attrib.emulate_fua_write ||
!dev->dev_attrib.emulate_write_cache)
However, this check is wrong if the backend device supports WCE but
"emulate_write_cache" is disabled.
This patch uses se_dev_check_wce() (previously named
spc_check_dev_wce) to invoke transport->get_write_cache() if the
device has a write cache or check the "emulate_write_cache" attribute
otherwise.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christophe Vu-Brugier <cvubrugier@fastmail.fm>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Nicholas Bellinger [Thu, 5 Mar 2015 03:28:24 +0000 (03:28 +0000)]
target: Fix virtual LUN=0 target_configure_device failure OOPs
This patch fixes a NULL pointer dereference triggered by a late
target_configure_device() -> alloc_workqueue() failure that results
in target_free_device() being called with DF_CONFIGURED already set,
which subsequently OOPses in destroy_workqueue() code.
Currently this only happens at modprobe target_core_mod time when
core_dev_setup_virtual_lun0() -> target_configure_device() fails,
and the explicit target_free_device() gets called.
To address this bug originally introduced by commit
0fd97ccf45, go
ahead and move DF_CONFIGURED to end of target_configure_device()
code to handle this special failure case.
Reported-by: Claudio Fleiner <cmf@daterainc.com>
Cc: Claudio Fleiner <cmf@daterainc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # v3.7+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Nicholas Bellinger [Fri, 27 Feb 2015 11:54:13 +0000 (03:54 -0800)]
target/pscsi: Fix NULL pointer dereference in get_device_type
This patch fixes a NULL pointer dereference OOPs with pSCSI backends
within target_core_stat.c code. The bug is caused by a configfs attr
read if no pscsi_dev_virt->pdv_sd has been configured.
Reported-by: Olaf Hering <olaf@aepfle.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Dan Carpenter [Wed, 25 Feb 2015 13:21:03 +0000 (16:21 +0300)]
tcm_fc: missing curly braces in ft_invl_hw_context()
This patch adds a missing set of conditional check braces in
ft_invl_hw_context() originally introduced by commit
dcd998ccd
when handling DDP failures in ft_recv_write_data() code.
commit
dcd998ccdbf74a7d8fe0f0a44e85da1ed5975946
Author: Kiran Patil <kiran.patil@intel.com>
Date: Wed Aug 3 09:20:01 2011 +0000
tcm_fc: Handle DDP/SW fc_frame_payload_get failures in ft_recv_write_data
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Cc: <stable@vger.kernel.org> # 3.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Bart Van Assche [Wed, 18 Feb 2015 14:33:58 +0000 (15:33 +0100)]
target: Fix reference leak in target_get_sess_cmd() error path
This patch fixes a se_cmd->cmd_kref leak buf when se_sess->sess_tearing_down
is true within target_get_sess_cmd() submission path code.
This se_cmd reference leak can occur during active session shutdown when
ack_kref=1 is passed by target_submit_cmd_[map_sgls,tmr]() callers.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Bart Van Assche [Thu, 12 Feb 2015 10:48:49 +0000 (11:48 +0100)]
loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session
This patch changes loopback, usb-gadget, vhost-scsi and xen-scsiback
fabric code to invoke transport_register_session() instead of the
unprotected flavour, to ensure se_tpg->session_lock is taken when
adding new session list nodes to se_tpg->tpg_sess_list.
Note that since these four fabric drivers already hold their own
internal TPG mutexes when accessing se_tpg->tpg_sess_list, and
consist of a single se_session created through configfs attribute
access, no list corruption can currently occur.
So for correctness sake, go ahead and use the se_tpg->session_lock
protected version for these four fabric drivers.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Bart Van Assche [Fri, 20 Mar 2015 05:25:16 +0000 (22:25 -0700)]
tcm_qla2xxx: Fix incorrect use of __transport_register_session
This patch fixes the incorrect use of __transport_register_session()
in tcm_qla2xxx_check_initiator_node_acl() code, that does not perform
explicit se_tpg->session_lock when accessing se_tpg->tpg_sess_list
to add new se_sess nodes.
Given that tcm_qla2xxx_check_initiator_node_acl() is not called with
qla_hw->hardware_lock held for all accesses of ->tpg_sess_list, the
code should be using transport_register_session() instead.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: <stable@vger.kernel.org> # 3.5+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Nicholas Bellinger [Mon, 23 Feb 2015 08:57:51 +0000 (00:57 -0800)]
iscsi-target: Avoid early conn_logout_comp for iser connections
This patch fixes a iser specific logout bug where early complete()
of conn->conn_logout_comp in iscsit_close_connection() was causing
isert_wait4logout() to complete too soon, triggering a use after
free NULL pointer dereference of iscsi_conn memory.
The complete() was originally added for traditional iscsi-target
when a ISCSI_LOGOUT_OP failed in iscsi_target_rx_opcode(), but given
iser-target does not wait in logout failure, this special case needs
to be avoided.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Slava Shwartsman <valyushash@gmail.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Nicholas Bellinger [Thu, 26 Feb 2015 06:56:37 +0000 (22:56 -0800)]
Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target"
This reverts commit
72859d91d93319c00a18c29f577e56bf73a8654a.
The original patch was wrong, iscsit_close_connection() still needs
to release iscsi_conn during both normal + exception IN_LOGOUT status
with ib_isert enabled.
The original OOPs is due to completing conn_logout_comp early within
iscsit_close_connection(), causing isert_wait4logout() to complete
instead of waiting for iscsit_logout_post_handler_*() to be called.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Nicholas Bellinger [Mon, 23 Feb 2015 06:17:13 +0000 (22:17 -0800)]
target: Disallow changing of WRITE cache/FUA attrs after export
Now that incoming FUA=1 bit check is enforced for backends with FUA or
WCE disabled, go ahead and disallow the changing of related backend
attributes when active fabric exports exist.
This is required to avoid potential failures with existing initiator
LUN registrations that have been previously created with FUA=1.
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Linus Torvalds [Thu, 19 Mar 2015 23:43:10 +0000 (16:43 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"An update to Synaptics driver that makes it usable with the 2015
lineup from Lenovo"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: synaptics - use dmax in input_mt_assign_slots"
Input: synaptics - remove X250 from the topbuttonpad list
Input: synaptics - remove X1 Carbon 3rd gen from the topbuttonpad list
Input: synaptics - re-route tracksticks buttons on the Lenovo 2015 series
Input: synaptics - remove TOPBUTTONPAD property for Lenovos 2015
Input: synaptics - retrieve the extended capabilities in query $10
Input: synaptics - do not retrieve the board id on old firmwares
Input: synaptics - handle spurious release of trackstick buttons
Input: synaptics - fix middle button on Lenovo 2015 products
Input: synaptics - skip quirks when post-2013 dimensions
Input: synaptics - support min/max board id in min_max_pnpid_table
Input: synaptics - remove obsolete min/max quirk for X240
Input: synaptics - query min dimensions for fw v8.1
Input: synaptics - log queried and quirked dimension values
Input: synaptics - split synaptics_resolution(), query first
Linus Torvalds [Thu, 19 Mar 2015 23:36:24 +0000 (16:36 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"This fixes bugs in zero-copy splice to the fuse device"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: explicitly set /dev/fuse file's private_data
fuse: set stolen page uptodate
fuse: notify: don't move pages
Linus Torvalds [Thu, 19 Mar 2015 23:27:36 +0000 (16:27 -0700)]
Merge branch 'overlayfs-next' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"This fixes minor issues with the multi-layer update in v4.0"
* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: upper fs should not be R/O
ovl: check lowerdir amount for non-upper mount
ovl: print error message for invalid mount options
Linus Torvalds [Thu, 19 Mar 2015 23:18:30 +0000 (16:18 -0700)]
Merge tag 'mmc-v4.0-rc4' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fix from Ulf Hansson:
"MMC core: fix error path in mmc_pwrseq_simple_alloc()"
* tag 'mmc-v4.0-rc4' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: pwrseq_simple: fix error path in mmc_pwrseq_simple_alloc
Linus Torvalds [Thu, 19 Mar 2015 22:52:28 +0000 (15:52 -0700)]
Merge tag 'pinctrl-v4.0-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here is a slew of pin control fixes I've accumulated for the v4.0
kernel. Nothing special, just driver fixes (mainly embedded Intel it
seems) and a misunderstanding regarding the stub functions was
reverted:
- Fix up consumer return values on pin control stubs.
- Four patches fixing up the interrupt handling and sleep context
save in the Baytrail driver.
- Make default output directions work properly in the Cherryview
driver.
- Fix interrupt locking in the AT91 driver.
- Fix setting interrupt generating lines as input in the sunxi
driver"
* tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sun4i: GPIOs configured as irq must be set to input before reading
pinctrl: at91: move lock/unlock_as_irq calls into request/release
pinctrl: update direction_output function of cherryview driver
pinctrl: baytrail: Save pin context over system sleep
pinctrl: baytrail: Rework interrupt handling
pinctrl: baytrail: Clear interrupt triggering from pins that are in GPIO mode
pinctrl: baytrail: Relax GPIO request rules
Revert "pinctrl: consumer: use correct retval for placeholder functions"
Linus Torvalds [Thu, 19 Mar 2015 22:24:28 +0000 (15:24 -0700)]
Merge tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next
Pull two arch/nios2 fixes from Ley Foon Tan:
- Remove ucontext.h from exported arch headers
- nios2: mm: do not invoke OOM killer on kernel fault OOM
* tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next:
nios2: mm: do not invoke OOM killer on kernel fault OOM
nios2: Remove ucontext.h from exported arch headers
Linus Torvalds [Thu, 19 Mar 2015 20:16:49 +0000 (13:16 -0700)]
Merge git://git./linux/kernel/git/davem/ide
Pull IDE fix from David Miller:
"Just one fix to convert a by-hand conversion of jiffies to msecs, from
Nicholas McGuire"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
ide_tape: convert jiffies with jiffies_to_msecs
Linus Torvalds [Thu, 19 Mar 2015 20:11:55 +0000 (13:11 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
1) Some command cases of semtimedop() not even handled due to miscoded
comparison on sparc64. From Rob Gardner.
2) Due to two bugs, /proc/kcore wan't working properly on sparc.
3) Make sure fatal traps stop all running cpus, from Dave Kleikamp.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Fix /proc/kcore
sparc: semtimedop() unreachable due to comparison error
sparc: io_64.h: Replace io function-link macros
sparc64: fatal trap should stop all cpus
arch: sparc: kernel: starfire.c: Remove unused function
arch: sparc: kernel: traps_64.c: Remove some unused functions
Christoph Hellwig [Thu, 5 Mar 2015 13:17:31 +0000 (14:17 +0100)]
Subject: nfsd: don't recursively call nfsd4_cb_layout_fail
Due to a merge error when creating
c5c707f9 ("nfsd: implement pNFS
layout recalls"), we recursively call nfsd4_cb_layout_fail from itself,
leading to stack overflows.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes:
c5c707f9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
fs/nfsd/nfs4layouts.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index
3c1bfa1..
1028a06 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));
- nfsd4_cb_layout_fail(ls);
-
printk(KERN_WARNING
"nfsd: client %s failed to respond to layout recall. "
" Fencing..\n", addr_str);
--
1.9.1
Linus Torvalds [Thu, 19 Mar 2015 18:19:44 +0000 (11:19 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix packet header offset calculation in _decode_session6(), from
Hajime Tazaki.
2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang.
3) Be sure to clear state properly when scans fail in iwlwifi mvm code,
from Luciano Coelho.
4) iwlwifi tries to stop scans that aren't actually running, also from
Luciano Coelho.
5) mac80211 should drop mesh frames that are not encrypted, fix from
Bob Copeland.
6) Add new device ID to b43 wireless driver for BCM432228 chips, from
Rafał Miłecki.
7) Fix accidental addition of members after variable sized array in
struct tc_u_hnode, from WANG Cong.
8) Don't re-enable interrupts until after we call napi_complete() in
ibmveth and WIZnet drivers, frm Yongbae Park.
9) Fix regression in vlan tag handling of fec driver, from Fugang Duan.
10) If a network namespace change fails during rtnl_newlink(), we don't
unwind the device registry properly.
11) Fix two TCP regressions, from Neal Cardwell:
- Don't allow snd_cwnd_cnt to accumulate huge values due to missing
test in tcp_cong_avoid_ai().
- Restore CUBIC back to advancing cwnd by 1.5x packets per RTT.
12) Fix performance regression in xne-netback involving push TX
notifications, from David Vrabel.
13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not
dereference blindly. From Willem de Bruijn.
14) Fix potential stack overflow in RDS protocol stack, from Arnd
Bergmann.
15) VXLAN_VID_MASK used incorrectly in new remote checksum offload
support of VXLAN driver. Fix from Alexey Kodanev.
16) Fix too small netlink SKB allocation in inet_diag layer, from Eric
Dumazet.
17) ieee80211_check_combinations() does not count interfaces correctly,
from Andrei Otcheretianski.
18) Hardware feature determination in bxn2x driver references a piece of
software state that actually isn't initialized yet, fix from Michal
Schmidt.
19) inet_csk_wait_for_connect() needs a sched_annotate_sleep()
annoation, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
Revert "net: cx82310_eth: use common match macro"
net/mlx4_en: Set statistics bitmap at port init
IB/mlx4: Saturate RoCE port PMA counters in case of overflow
net/mlx4_en: Fix off-by-one in ethtool statistics display
IB/mlx4: Verify net device validity on port change event
act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome
Revert "smc91x: retrieve IRQ and trigger flags in a modern way"
inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()
ip6_tunnel: fix error code when tunnel exists
netdevice.h: fix ndo_bridge_* comments
bnx2x: fix encapsulation features on 57710/57711
mac80211: ignore CSA to same channel
nl80211: ignore HT/VHT capabilities without QoS/WMM
mac80211: ask for ECSA IE to be considered for beacon parse CRC
mac80211: count interfaces correctly for combination checks
isdn: icn: use strlcpy() when parsing setup options
rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
bridge: reset bridge mtu after deleting an interface
can: kvaser_usb: Fix tx queue start/stop race conditions
...
Tom Van Braeckel [Mon, 12 Jan 2015 04:22:16 +0000 (05:22 +0100)]
fuse: explicitly set /dev/fuse file's private_data
The misc subsystem (which is used for /dev/fuse) initializes private_data to
point to the misc device when a driver has registered a custom open file
operation, and initializes it to NULL when a custom open file operation has
*not* been provided.
This subtle quirk is confusing, to the point where kernel code registers
*empty* file open operations to have private_data point to the misc device
structure. And it leads to bugs, where the addition or removal of a custom open
file operation surprisingly changes the initial contents of a file's
private_data structure.
So to simplify things in the misc subsystem, a patch [1] has been proposed to
*always* set the private_data to point to the misc device, instead of only
doing this when a custom open file operation has been registered.
But before this patch can be applied we need to modify drivers that make the
assumption that a misc device file's private_data is initialized to NULL
because they didn't register a custom open file operation, so they don't rely
on this assumption anymore. FUSE uses private_data to store the fuse_conn and
errors out if this is not initialized to NULL at mount time.
Hence, we now set a file's private_data to NULL explicitly, to be independent
of whatever value the misc subsystem initializes it to by default.
[1] https://lkml.org/lkml/2014/12/4/939
Reported-by: Giedrius Statkevicius <giedriuswork@gmail.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Peter Hurley [Tue, 17 Mar 2015 20:46:33 +0000 (16:46 -0400)]
Revert "of: Fix premature bootconsole disable with 'stdout-path'"
This reverts commit
2fa645cb2703d9b3786d850db815414dfeefa51d.
The assumption that at least 1 preferred console will be registered
when the stdout-path property is set is invalid, which can result
in _no_ consoles.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Brian Norris [Tue, 17 Mar 2015 19:30:31 +0000 (12:30 -0700)]
of: handle both '/' and ':' in path strings
Commit
106937e8ccdc ("of: fix handling of '/' in options for
of_find_node_by_path()") caused a regression in OF handling of
stdout-path. While it fixes some cases which have '/' after the ':', it
breaks cases where there is more than one '/' *before* the ':'.
For example, it breaks this boot string
stdout-path = "/rdb/serial@
f040ab00:115200";
So rather than doing sequentialized checks (first for '/', then for ':';
or vice versa), to get the correct behavior we need to check for the
first occurrence of either one of them.
It so happens that the handy strcspn() helper can do just that.
Fixes:
106937e8ccdc ("of: fix handling of '/' in options for of_find_node_by_path()")
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: stable@vger.kernel.org # 3.19
Acked-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Brian Norris [Tue, 17 Mar 2015 19:30:32 +0000 (12:30 -0700)]
of: unittest: Add option string test case with longer path
There were regressions seen with commit
106937e8ccdc ("of: fix handling
of '/' in options for of_find_node_by_path()"), where we couldn't handle
extra '/' before the ':'. Let's test for this now.
Confirmed that this test fails without the previous patch and passes
when patched. All other tests pass.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Laurent Pinchart [Tue, 17 Mar 2015 22:21:32 +0000 (00:21 +0200)]
of/irq: Fix of_irq_parse_one() returned error codes
The error code paths that require cleanup use a goto to jump to the
cleanup code and return an error code. However, the error code variable
res, which is initialized to -EINVAL when declared, is then overwritten
with the return value of of_parse_phandle_with_args(), and reused as the
return code from of_irq_parse_one(). This leads to an undetermined error
being returned instead of the expected -EINVAL value. Fix it.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Rob Herring <robh@kernel.org>
NeilBrown [Sat, 21 Feb 2015 04:15:16 +0000 (15:15 +1100)]
mmc: pwrseq_simple: fix error path in mmc_pwrseq_simple_alloc
The current error-path code (when gpiod_get_index() reports
an error) can never free pwrseq->reset_gpios[0], but might
try to tree pwrseq->reset_gpios[-1], which has unfortunate
consequences.
Signed-off-by: NeilBrown <neil@brown.name>
Fixes:
934f1f48330ed695927a51fa068dc5d673f2da19
Acked-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Dave Airlie [Thu, 19 Mar 2015 04:02:15 +0000 (14:02 +1000)]
Merge branch 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
single radeon fix.
* 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: drop ttm two ended allocation
Dave Airlie [Thu, 19 Mar 2015 04:01:42 +0000 (14:01 +1000)]
Merge branch 'exynos-drm-fixes' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
Some urgent regression fixes to booting failures Exynos DRM occured.
Summary:
- Fix two urgent null pointer dereference bugs in case of enabling
or disabling IOMMU. There was two cases to these issues.
One is that plane->crtc is accessed by exynos_disable_plane()
when device tree binding is broken so device driver tries
to release, which means that the mode set operation isn't invoked yet
so plane->crtc is still NULL and exynos_disable_plane() will access
NULL pointer. This issue is fixed by checking if the plane->crtc
is NULL or not in exynos_disable_plane()
Other is that fimd_wait_for_vblank() is called to avoid from page fault
with IOMMU before the ctx object is created. At this time,
fimd_wait_for_vblank() tries to access ctx->crtc but the ctx->crtc
is still NULL because exynos_drm_crtc_create() isn't called yet.
This issue is fixed by creating a crtc object and setting it to
ctx->crtc prior to fimd_wait_for_vblank() call.
For more details, you can refer to below an e-mail thread,
http://www.spinics.net/lists/linux-samsung-soc/msg42436.html
- Remove unnecessary file not used and fix trivial issues.
* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: fix the initialization order in FIMD
drm/exynos: fix typo config name correctly.
drm/exynos: Check for NULL dereference of crtc
drm/exynos: IS_ERR() vs NULL bug
drm/exynos: remove unused files
Nicholas Mc Guire [Tue, 3 Mar 2015 10:52:51 +0000 (05:52 -0500)]
ide_tape: convert jiffies with jiffies_to_msecs
Use jiffies_to_msecs for converting jiffies as it handles all of the corner
cases reliably and also helps readability. The printk format is fixed up
as jiffies_to_msecs returns unsigned int not unsigned long.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Wed, 18 Mar 2015 22:01:01 +0000 (23:01 +0100)]
Revert "net: cx82310_eth: use common match macro"
This reverts commit
11ad714b98f6d9ca0067568442afe3e70eb94845 because
it breaks cx82310_eth.
The custom USB_DEVICE_CLASS macro matches
bDeviceClass, bDeviceSubClass and bDeviceProtocol
but the common USB_DEVICE_AND_INTERFACE_INFO matches
bInterfaceClass, bInterfaceSubClass and bInterfaceProtocol instead, which are
not specified.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 02:15:28 +0000 (19:15 -0700)]
sparc: Fix /proc/kcore
/proc/kcore investigates the "System RAM" elements in /proc/iomem to
initialize it's memory tables. Therefore we have to register them
before it tries to do so. kcore uses device_initcall() so let's
use arch_initcall() for the registry.
Also we need ARCH_PROC_KCORE_TEXT to get the virtual addresses of
the kernel image correct.
Reported-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2015 19:17:17 +0000 (15:17 -0400)]
Merge branch 'mlx4-net'
Or Gerlitz says:
====================
mlx4 driver fixes for 4.0-rc
Just few small fixes for the 4.0 rc cycle.
The fix from Moni addresses an issue from 4.0-rc1 so we
just need it for net.
Eran's fix for off-by-one should go to 3.19.y too.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Wed, 18 Mar 2015 14:51:38 +0000 (16:51 +0200)]
net/mlx4_en: Set statistics bitmap at port init
Port statistics bitmap will now be initialized at port init. Even before
starting the port, statistics are visible to the user and must be properly masked.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Majd Dibbiny [Wed, 18 Mar 2015 14:51:37 +0000 (16:51 +0200)]
IB/mlx4: Saturate RoCE port PMA counters in case of overflow
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.
Fixes:
c37791349cc7 ('IB/mlx4: Support PMA counters for IBoE')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Wed, 18 Mar 2015 14:51:36 +0000 (16:51 +0200)]
net/mlx4_en: Fix off-by-one in ethtool statistics display
NUM_PORT_STATS was 9 instead of 10, which caused off-by-one bug when
displaying the statistics starting from tx_chksum_offload in ethtool.
Fixes:
f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Wed, 18 Mar 2015 14:51:35 +0000 (16:51 +0200)]
IB/mlx4: Verify net device validity on port change event
Processing an event is done in a different context from the one when
the event was dispatched. This requires a check that the slave
net device is still valid when the event is being processed. The check is done
under the iboe lock which ensure correctness.
Fixes:
a57500903093 ('IB/mlx4: Add port aggregation support')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 18 Mar 2015 18:17:03 +0000 (11:17 -0700)]
Merge tag 'sound-4.0-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This is a collection of many small fixes. Most of fixes are for ASoC
drivers, including the fixes of wrong field usages for boolean kctls.
In addition, there is a fix in ASoC core for adding proper locks for
component lists, and a fix for a HD-audio regression by the previous
mono channel fix"
* tag 'sound-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda - Treat stereo-to-mono mix properly
ASoC: wm9713: Fix wrong value references for boolean kctl
ASoC: wm9712: Fix wrong value references for boolean kctl
ASoC: wm8960: Fix wrong value references for boolean kctl
ASoC: wm8955: Fix wrong value references for boolean kctl
ASoC: wm8904: Fix wrong value references for boolean kctl
ASoC: wm8903: Fix wrong value references for boolean kctl
ASoC: wm8731: Fix wrong value references for boolean kctl
ASoC: wm2000: Fix wrong value references for boolean kctl
ASoC: tas5086: Fix wrong value references for boolean kctl
ASoC: pcm1681: Fix wrong value references for boolean kctl
ASoC: es8238: Fix wrong value references for boolean kctl
ASoC: cs4271: Fix wrong value references for boolean kctl
ASoC: ak4641: Fix wrong value references for boolean kctl
ASoC: adav80x: Fix wrong value references for boolean kctl
ASoC: Fix component lists locking
ASoC: Intel: remove conflicts when load/unload multiple firmware images
ASoC: rt286: Change the DMI mapping for Dino
ASoC: sgtl5000: remove useless register write clearing CHRGPUMP_POWERUP
ASoC: fsl_ssi: Don't try to round-up for PM divisor calculation
...
Linus Torvalds [Wed, 18 Mar 2015 18:10:41 +0000 (11:10 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"Fix a bug in the ARM XTS implementation that can cause failures in
decrypting encrypted disks, and fix is a memory overwrite bug that can
cause a crash which can be triggered from userspace"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: aesni - fix memory usage in GCM decryption
crypto: arm/aes update NEON AES module to latest OpenSSL version
Linus Torvalds [Wed, 18 Mar 2015 17:46:39 +0000 (10:46 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
- fix for potential race with module loading, from Petr Mladek.
The race is very unlikely to be seen in real world and has been found
by code inspection, but should be fixed for 4.0 anyway.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix subtle race with coming and going modules
Linus Torvalds [Wed, 18 Mar 2015 17:42:19 +0000 (10:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- fixes for pen pen proximity / touch events in wacom driver, from Ping
Cheng and Benjamin Tissoires
- two new device-specific quirks from Oliver Neukum and Forest
Wilkinson
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: wacom: check for wacom->shared before following the pointer
HID: tivo: enable all buttons on the TiVo Slide Pro remote
HID: add ALWAYS_POLL quirk for a Logitech 0xc007
HID: wacom: rely on actual touch down count to decide touch_down
HID: wacom: do not send pen events before touch is up/forced out
Mark Brown [Tue, 17 Mar 2015 23:25:36 +0000 (23:25 +0000)]
dmaengine: pl08x: Define capabilities for generic capabilities reporting
Ensure that clients can automatically configure themselves and avoid a
nasty warning at boot by providing capability information.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Peter Chen [Thu, 12 Mar 2015 01:47:53 +0000 (09:47 +0800)]
usb: common: otg-fsm: only signal connect after switching to peripheral
We should signal connect (pull up dp) after we have already
at peripheral mode, otherwise, the dp may be toggled due to
we reset controller or do disconnect during the initialization
for peripheral, then, the host may be confused during the
enumeration, eg, it finds the reset can't succeed, but the
device is still there, see below error message.
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: unable to enumerate USB device on port 1
Fixes: the issue existed when the otg fsm code was added.
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hans de Goede [Mon, 16 Mar 2015 14:18:13 +0000 (15:18 +0100)]
uas: Add US_FL_NO_ATA_1X for Initio Corporation controllers / devices
A new uas compatible controller has shown up in some people's devices from
the manufacturer Initio Corporation, this controller needs the US_FL_NO_ATA_1X
quirk to work properly with uas, so add it to the uas quirks table.
Reported-and-tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alex Deucher [Tue, 17 Mar 2015 15:53:33 +0000 (11:53 -0400)]
drm/radeon: drop ttm two ended allocation
radeon_bo_create() calls radeon_ttm_placement_from_domain()
before ttm_bo_init() is called. radeon_ttm_placement_from_domain()
uses the ttm bo size to determine when to select top down
allocation but since the ttm bo is not initialized yet the
check is always false. It only took effect when buffers
were validated later. It also seemed to regress suspend
and resume on some systems possibly due to it not
taking effect in radeon_bo_create().
radeon_bo_create() and radeon_ttm_placement_from_domain()
need to be reworked substantially for this to be optimally
effective. Re-enable it at that point.
Noticed-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Boris Brezillon [Tue, 17 Mar 2015 16:15:46 +0000 (17:15 +0100)]
USB: ehci-atmel: rework clk handling
The EHCI IP only needs the UTMI/UPLL (uclk) and the peripheral (iclk)
clocks to work properly. Remove the useless system clock (fclk).
Avoid calling set_rate on the fixed rate UTMI/IPLL clock and remove
useless IS_ENABLED(CONFIG_COMMON_CLK) tests (all at91 platforms have been
moved to the CCF).
This patch also fixes a bug introduced by
3440ef1 (ARM: at91/dt: fix USB
high-speed clock to select UTMI), which was leaving the usb clock
uninitialized and preventing the OHCI driver from setting the usb clock
rate to 48MHz.
This bug was caused by several things:
1/ usb clock drivers set the CLK_SET_RATE_GATE flag, which means the rate
cannot be changed once the clock is prepared
2/ The EHCI driver was retrieving and preparing/enabling the uhpck
clock which was in turn preparing its parent clock (the usb clock),
thus preventing any rate change because of 1/
Fixes:
3440ef169100 ("ARM: at91/dt: fix USB high-speed clock to select UTMI")
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hyungwon Hwang [Thu, 12 Mar 2015 04:36:02 +0000 (13:36 +0900)]
drm/exynos: fix the initialization order in FIMD
Since commit
0f04cf8df0b20a97369cb634663fef0578cbf273 ("drm/exynos:
fix wrong pipe calculation for crtc"), fimd_clear_channel() can be
called when is_drm_iommu_supported() returns true. In this case,
the kernel is going to be panicked because crtc is not set yet.
[ 1.211156] [drm] Initialized drm 1.1.0
20060810
[ 1.216785] Unable to handle kernel NULL pointer dereference at virtual address
00000350
[ 1.223415] pgd =
c0004000
[ 1.226086] [
00000350] *pgd=
00000000
[ 1.229649] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 1.234940] Modules linked in:
[ 1.237982] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
4.0.0-rc1-00062-g7a7cc79-dirty #123
[ 1.246136] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 1.252214] task:
ee8c8000 ti:
ee8d0000 task.ti:
ee8d0000
[ 1.257606] PC is at fimd_wait_for_vblank+0x8/0xc8
[ 1.262370] LR is at fimd_bind+0x138/0x1a8
[ 1.266450] pc : [<
c02fb63c>] lr : [<
c02fb834>] psr:
20000113
[ 1.266450] sp :
ee8d1d28 ip :
00000000 fp :
00000000
[ 1.277906] r10:
00000001 r9 :
c09d693c r8 :
c0a2d6a8
[ 1.283114] r7 :
00000034 r6 :
00000001 r5 :
ee0bb400 r4 :
ee244c10
[ 1.289624] r3 :
00000000 r2 :
00000000 r1 :
00000001 r0 :
00000000
[ 1.296135] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 1.303426] Control:
10c5387d Table:
4000404a DAC:
00000015
[ 1.309154] Process swapper/0 (pid: 1, stack limit = 0xee8d0210)
[ 1.315143] Stack: (0xee8d1d28 to 0xee8d2000)
[ 1.319486] 1d20:
00000000 c0113d18 ee0bb400 ee0bb400 ee245c30 eebbe210
[ 1.327645] 1d40:
ee008a40 ee244c10 ee0bb400 00000001 00000034 c02fb834 00000000 c030a858
[ 1.335804] 1d60:
ee244a10 eeb60780 ee008a40 eeb60740 ee0bb400 c03030d0 00000000 00000000
[ 1.343963] 1d80:
ee244a10 ee0bb400 00000000 eeb60740 eeb60810 00000000 00000000 c02f6ba4
[ 1.352123] 1da0:
ee0bb400 00000000 00000000 c02e0500 ee244a00 c0a04a14 ee0bb400 c02e1de4
[ 1.360282] 1dc0:
00000000 c030a858 00000002 eeb60820 eeb60820 00000002 eeb60780 c03033d4
[ 1.368441] 1de0:
c06e9cec 00000000 ee244a10 eeb60780 c0a056f8 c03035fc c0a04b24 c0a04b24
[ 1.376600] 1e00:
ee244a10 00000001 c0a049d0 c02f6d34 c0ad462c eeba0790 00000000 ee244a10
[ 1.384759] 1e20:
ffffffed c0a049d0 00000000 c03090b0 ee244a10 c0ad462c c0a2d840 c03077a0
[ 1.392919] 1e40:
eeb5e880 c024b738 000008db ee244a10 c0a049d0 ee244a44 00000000 c09e71d8
[ 1.401078] 1e60:
000000c6 c0307a6c c0a049d0 00000000 c03079e0 c0305ea8 ee826e5c ee1dc7b4
[ 1.409237] 1e80:
c0a049d0 eeb5e880 c0a058a8 c0306e2c c0896204 c0a049d0 c06e9d10 c0a049d0
[ 1.417396] 1ea0:
c06e9d10 c0ad4600 00000000 c0308360 00000000 00000003 c06e9d10 c02f6e14
[ 1.425555] 1ec0:
00000000 c0896204 ffffffff 00000000 00000000 00000000 00000000 00000000
[ 1.433714] 1ee0:
00000000 00000000 c02f6d5c c02f6d5c 00000000 eeb5d740 c09e71d8 c0008a30
[ 1.441874] 1f00:
ef7fca5e 00000000 00000000 00000066 00000000 ee8d1f28 c003ff1c c02514e8
[ 1.450033] 1f20:
60000113 ffffffff c093906c ef7fca5e 000000c6 c004018c 00000000 c093906c
[ 1.458192] 1f40:
c08a9690 c093840c 00000006 00000006 c09eb2ac c09c0d74 00000006 c09c0d54
[ 1.466351] 1f60:
c0a3d680 c09745a0 c09d693c 000000c6 00000000 c0974db4 00000006 00000006
[ 1.474510] 1f80:
c09745a0 ffffffff 00000000 c0692e00 00000000 00000000 00000000 00000000
[ 1.482669] 1fa0:
00000000 c0692e08 00000000 c000f040 00000000 00000000 00000000 00000000
[ 1.490828] 1fc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1.498988] 1fe0:
00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
[ 1.507159] [<
c02fb63c>] (fimd_wait_for_vblank) from [<
c02fb834>] (fimd_bind+0x138/0x1a8)
[ 1.515313] [<
c02fb834>] (fimd_bind) from [<
c03030d0>] (component_bind_all+0xc4/0x20c)
[ 1.523209] [<
c03030d0>] (component_bind_all) from [<
c02f6ba4>] (exynos_drm_load+0xa0/0x140)
[ 1.531632] [<
c02f6ba4>] (exynos_drm_load) from [<
c02e0500>] (drm_dev_register+0xa0/0xf4)
[ 1.539788] [<
c02e0500>] (drm_dev_register) from [<
c02e1de4>] (drm_platform_init+0x44/0xcc)
[ 1.548121] [<
c02e1de4>] (drm_platform_init) from [<
c03033d4>] (try_to_bring_up_master.part.1+0xc8/0x104)
[ 1.557668] [<
c03033d4>] (try_to_bring_up_master.part.1) from [<
c03035fc>] (component_master_add_with_match+0xd0/0x118)
[ 1.568431] [<
c03035fc>] (component_master_add_with_match) from [<
c02f6d34>] (exynos_drm_platform_probe+0xf0/0x118)
[ 1.578847] [<
c02f6d34>] (exynos_drm_platform_probe) from [<
c03090b0>] (platform_drv_probe+0x48/0x98)
[ 1.588052] [<
c03090b0>] (platform_drv_probe) from [<
c03077a0>] (driver_probe_device+0x140/0x380)
[ 1.596902] [<
c03077a0>] (driver_probe_device) from [<
c0307a6c>] (__driver_attach+0x8c/0x90)
[ 1.605321] [<
c0307a6c>] (__driver_attach) from [<
c0305ea8>] (bus_for_each_dev+0x54/0x88)
[ 1.613480] [<
c0305ea8>] (bus_for_each_dev) from [<
c0306e2c>] (bus_add_driver+0xec/0x200)
[ 1.621640] [<
c0306e2c>] (bus_add_driver) from [<
c0308360>] (driver_register+0x78/0xf4)
[ 1.629625] [<
c0308360>] (driver_register) from [<
c02f6e14>] (exynos_drm_init+0xb8/0x11c)
[ 1.637785] [<
c02f6e14>] (exynos_drm_init) from [<
c0008a30>] (do_one_initcall+0xac/0x1ec)
[ 1.645950] [<
c0008a30>] (do_one_initcall) from [<
c0974db4>] (kernel_init_freeable+0x194/0x268)
[ 1.654626] [<
c0974db4>] (kernel_init_freeable) from [<
c0692e08>] (kernel_init+0x8/0xe4)
[ 1.662699] [<
c0692e08>] (kernel_init) from [<
c000f040>] (ret_from_fork+0x14/0x34)
[ 1.670246] Code:
eaffffd5 c09df884 e92d40f0 e24dd01c (
e5905350)
[ 1.676408] ---[ end trace
804468492f306a6f ]---
[ 1.680948] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.680948]
[ 1.690035] CPU1: stopping
[ 1.692727] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D
4.0.0-rc1-00062-g7a7cc79-dirty #123
[ 1.702097] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 1.708192] [<
c0016c84>] (unwind_backtrace) from [<
c00129bc>] (show_stack+0x10/0x14)
[ 1.715908] [<
c00129bc>] (show_stack) from [<
c0696f58>] (dump_stack+0x78/0xc8)
[ 1.723108] [<
c0696f58>] (dump_stack) from [<
c0015020>] (handle_IPI+0x16c/0x2b4)
[ 1.730485] [<
c0015020>] (handle_IPI) from [<
c00086bc>] (gic_handle_irq+0x64/0x6c)
[ 1.738036] [<
c00086bc>] (gic_handle_irq) from [<
c00134c0>] (__irq_svc+0x40/0x74)
[ 1.745498] Exception stack(0xee8fdf98 to 0xee8fdfe0)
[ 1.750533] df80:
00000000 00000000
[ 1.758695] dfa0:
ee8fdfe8 c0021780 c09df938 00000015 10c0387d c0a3d988 4000406a c09df8d4
[ 1.766853] dfc0:
c0a27a74 c09df940 01000000 ee8fdfe0 c00101c0 c00101c4 60000113 ffffffff
[ 1.775015] [<
c00134c0>] (__irq_svc) from [<
c00101c4>] (arch_cpu_idle+0x30/0x3c)
[ 1.782397] [<
c00101c4>] (arch_cpu_idle) from [<
c005e804>] (cpu_startup_entry+0x180/0x324)
[ 1.790639] [<
c005e804>] (cpu_startup_entry) from [<
40008764>] (0x40008764)
[ 1.797579] CPU0: stopping
[ 1.800272] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D
4.0.0-rc1-00062-g7a7cc79-dirty #123
[ 1.809642] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 1.815730] [<
c0016c84>] (unwind_backtrace) from [<
c00129bc>] (show_stack+0x10/0x14)
[ 1.823450] [<
c00129bc>] (show_stack) from [<
c0696f58>] (dump_stack+0x78/0xc8)
[ 1.830653] [<
c0696f58>] (dump_stack) from [<
c0015020>] (handle_IPI+0x16c/0x2b4)
[ 1.838030] [<
c0015020>] (handle_IPI) from [<
c00086bc>] (gic_handle_irq+0x64/0x6c)
[ 1.845581] [<
c00086bc>] (gic_handle_irq) from [<
c00134c0>] (__irq_svc+0x40/0x74)
[ 1.853043] Exception stack(0xc09ddf60 to 0xc09ddfa8)
[ 1.858081] df60:
00000000 00000000 c09ddfb0 c0021780 c09df938 00000001 ffffffff c0a3d680
[ 1.866239] df80:
c09c0dec c09df8d4 c0a27a74 c09df940 01000000 c09ddfa8 c00101c0 c00101c4
[ 1.874396] dfa0:
60000113 ffffffff
[ 1.877872] [<
c00134c0>] (__irq_svc) from [<
c00101c4>] (arch_cpu_idle+0x30/0x3c)
[ 1.885251] [<
c00101c4>] (arch_cpu_idle) from [<
c005e804>] (cpu_startup_entry+0x180/0x324)
[ 1.893499] [<
c005e804>] (cpu_startup_entry) from [<
c0974bc8>] (start_kernel+0x324/0x37c)
[ 1.901655] [<
c0974bc8>] (start_kernel) from [<
40008074>] (0x40008074)
[ 1.908161] CPU3: stopping
[ 1.910855] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D
4.0.0-rc1-00062-g7a7cc79-dirty #123
[ 1.920225] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 1.926313] [<
c0016c84>] (unwind_backtrace) from [<
c00129bc>] (show_stack+0x10/0x14)
[ 1.934034] [<
c00129bc>] (show_stack) from [<
c0696f58>] (dump_stack+0x78/0xc8)
[ 1.941237] [<
c0696f58>] (dump_stack) from [<
c0015020>] (handle_IPI+0x16c/0x2b4)
[ 1.948613] [<
c0015020>] (handle_IPI) from [<
c00086bc>] (gic_handle_irq+0x64/0x6c)
[ 1.956165] [<
c00086bc>] (gic_handle_irq) from [<
c00134c0>] (__irq_svc+0x40/0x74)
[ 1.963626] Exception stack(0xee901f98 to 0xee901fe0)
[ 1.968661] 1f80:
00000000 00000000
[ 1.976823] 1fa0:
ee901fe8 c0021780 c09df938 00000015 10c0387d c0a3d988 4000406a c09df8d4
[ 1.984982] 1fc0:
c0a27a74 c09df940 01000000 ee901fe0 c00101c0 c00101c4 60000113 ffffffff
[ 1.993143] [<
c00134c0>] (__irq_svc) from [<
c00101c4>] (arch_cpu_idle+0x30/0x3c)
[ 2.000522] [<
c00101c4>] (arch_cpu_idle) from [<
c005e804>] (cpu_startup_entry+0x180/0x324)
[ 2.008765] [<
c005e804>] (cpu_startup_entry) from [<
40008764>] (0x40008764)
[ 2.015710] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Signed-off-by: Hyungwon Hwang <human.hwang@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Inki Dae [Fri, 6 Mar 2015 13:40:22 +0000 (22:40 +0900)]
drm/exynos: fix typo config name correctly.
This patch fixes DRM_EXYNOS7DECON to DRM_EXYNOS7_DECON.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Charles Keepax [Tue, 17 Feb 2015 17:14:41 +0000 (17:14 +0000)]
drm/exynos: Check for NULL dereference of crtc
The commit "drm/exynos: remove exynos_plane_dpms" (
d9ea6256) removed the
use of the enabled flag, which means that the code may attempt to call
win_enable on a NULL crtc. This results in the following oops on
Arndale:
[ 1.673479] Unable to handle kernel NULL pointer dereference at virtual address
00000368
[ 1.681500] pgd =
c0004000
[ 1.684154] [
00000368] *pgd=
00000000
[ 1.687713] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 1.693012] Modules linked in:
[ 1.696045] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
3.19.0-07545-g57485fa #1907
[ 1.703524] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
(....)
[ 2.014803] [<
c02f9cfc>] (exynos_plane_destroy) from [<
c02e61b4>] (drm_mode_config_cleanup+0x168/0x20c)
[ 2.024178] [<
c02e61b4>] (drm_mode_config_cleanup) from [<
c02f66fc>] (exynos_drm_load+0xac/0x12c)
This patch adds in a check to ensure exynos_crtc is not NULL before it
is dereferenced.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Dan Carpenter [Fri, 20 Feb 2015 10:54:43 +0000 (13:54 +0300)]
drm/exynos: IS_ERR() vs NULL bug
of_iomap() doesn't return error pointers, it returns NULL on error.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Andrzej Hajda [Wed, 18 Feb 2015 11:17:07 +0000 (12:17 +0100)]
drm/exynos: remove unused files
These files are not used anymore.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Mason [Tue, 17 Mar 2015 20:37:25 +0000 (21:37 +0100)]
ARM: 8313/1: Use read_cpuid_ext() macro instead of inline asm
Replace inline asm statement in __get_cpu_architecture() with equivalent
macro invocation, i.e. read_cpuid_ext(CPUID_EXT_MMFR0);
As an added bonus, this squashes a potential bug, described by Paul
Walmsley in commit
067e710b9a98 ("ARM: 7801/1: prevent gcc 4.5 from
reordering extended CP15 reads above is_smp() test").
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Laura Abbott [Fri, 13 Mar 2015 20:41:45 +0000 (21:41 +0100)]
ARM: 8311/1: Don't use is_module_addr in setting page attributes
The set_memory_* functions currently only support module
addresses. The addresses are validated using is_module_addr.
That function is special though and relies on internal state
in the module subsystem to work properly. At the time of
module initialization and calling set_memory_*, it's too early
for is_module_addr to work properly so it always returns
false. Rather than be subject to the whims of the module state,
just bounds check against the module virtual address range.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fabrice Gasnier [Thu, 12 Mar 2015 13:04:42 +0000 (14:04 +0100)]
ARM: 8310/1: l2c: Fix prefetch settings dt parsing
Allow prefetch settings overriding by device tree, in case
l2x0_cache_size_of_parse() returns value, prefetch tuning
properties are silently ignored. E.g. arm,double-linefill* and
arm,prefetch*.
This happens for example, when "cache-size" or "cache-sets"
properties haven't been filled in l2c dt node.
Comments from Fabrice Gasnier:
Allow device tree to override the L2C prefetch settings, even when
l2x0_cache_size_of_parse() fails to parse the cache geometry due to (eg)
missing "cache-size" or "cache-sets" properties.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Reviewed-by: Tomasz Figa <tomasz.figa@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Hans de Goede [Sun, 8 Mar 2015 21:13:57 +0000 (22:13 +0100)]
pinctrl: sun4i: GPIOs configured as irq must be set to input before reading
On sun4i-a10, when GPIOs are configured as external interrupt the value for
them in the data register does not seem to get updated, so set their mux to
input (and restore afterwards) when reading the pin.
Missed edges seem to be buffered, so this does not introduce a race
condition.
I've also tested this on sun5i-a13 and sun7i-a20 and those do not seem to
be affected, the input value representation in the data register does seem
to correctly get updated to the actual pin value while in irq mode there.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
hujianyang [Thu, 15 Jan 2015 05:20:57 +0000 (13:20 +0800)]
ovl: upper fs should not be R/O
After importing multi-lower layer support, users could mount a r/o
partition as the left most lowerdir instead of using it as upperdir.
And a r/o upperdir may cause an error like
overlayfs: failed to create directory ./workdir/work
during mount.
This patch check the *s_flags* of upper fs and return an error if
it is a r/o partition. The checking of *upper_mnt->mnt_sb->s_flags*
can be removed now.
This patch also remove
/* FIXME: workdir is not needed for a R/O mount */
from ovl_fill_super() because:
1) for upper fs r/o case
Setting a r/o partition as upper is prevented, no need to care about
workdir in this case.
2) for "mount overlay -o ro" with a r/w upper fs case
Users could remount overlayfs to r/w in this case, so workdir should
not be omitted.
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
hujianyang [Thu, 15 Jan 2015 05:19:21 +0000 (13:19 +0800)]
ovl: check lowerdir amount for non-upper mount
Recently multi-lower layer mount support allow upperdir and workdir
to be omitted, then cause overlayfs can be mount with only one
lowerdir directory. This action make no sense and have potential risk.
This patch check the total number of lower directories to prevent
mounting overlayfs with only one directory.
Also, an error message is added to indicate lower directories exceed
OVL_MAX_STACK limit.
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
hujianyang [Thu, 15 Jan 2015 05:17:36 +0000 (13:17 +0800)]
ovl: print error message for invalid mount options
Overlayfs should print an error message if an incorrect mount option
is caught like other filesystems.
After this patch, improper option input could be clearly known.
Reported-by: Fabian Sturm <fabian.sturm@aduu.de>
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Damien Lespiau [Thu, 5 Feb 2015 19:35:13 +0000 (19:35 +0000)]
drm/i915: Make sure the primary plane is enabled before reading out the fb state
We don't want to end up in a state where we track that the pipe has its
primary plane enabled when primary plane registers are programmed with
values that look possible but the plane actually disabled.
Refuse to read out the fb state when the primary plane isn't enabled.
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reported-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reference: http://mid.gmane.org/
20150203191507.GA2374@crion86
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>