John Stultz [Wed, 11 Dec 2013 01:18:18 +0000 (17:18 -0800)]
timekeeping: Avoid possible deadlock from clock_was_set_delayed
As part of normal operaions, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
hrtimer locks -> timekeeping locks
clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
between the timekeeping the hrtimer subsystem, so that we could
notify the hrtimer subsytem the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work
that would run later once we were out of the timekeeing code.
But unfortunately the lock chains are complex enoguh that in
scheduling delayed work, we end up eventually trying to grab
an hrtimer lock.
Sasha Levin noticed this in testing when the new seqlock lockdep
enablement triggered the following (somewhat abrieviated) message:
[ 251.100221] ======================================================
[ 251.100221] [ INFO: possible circular locking dependency detected ]
[ 251.100221]
3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[ 251.101967] -------------------------------------------------------
[ 251.101967] kworker/10:1/4506 is trying to acquire lock:
[ 251.101967] (timekeeper_seq){----..}, at: [<
ffffffff81160e96>] retrigger_next_event+0x56/0x70
[ 251.101967]
[ 251.101967] but task is already holding lock:
[ 251.101967] (hrtimer_bases.lock#11){-.-...}, at: [<
ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] which lock already depends on the new lock.
[ 251.101967]
[ 251.101967]
[ 251.101967] the existing dependency chain (in reverse order) is:
[ 251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[ 251.101967] [<
ffffffff81194803>] validate_chain+0x6c3/0x7b0
[ 251.101967] [<
ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[ 251.101967] [<
ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[ 251.101967] [<
ffffffff84398500>] _raw_spin_lock+0x40/0x80
[ 251.101967] [<
ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[ 251.101967] [<
ffffffff81154168>] queue_work_on+0x98/0x120
[ 251.101967] [<
ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[ 251.101967] [<
ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[ 251.101967] [<
ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[ 251.101967] [<
ffffffff843a4b49>] ia32_sysret+0x0/0x5
[ 251.101967]
-> #0 (timekeeper_seq){----..}:
[snipped]
[ 251.101967] other info that might help us debug this:
[ 251.101967]
[ 251.101967] Chain exists of:
timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
[ 251.101967] Possible unsafe locking scenario:
[ 251.101967]
[ 251.101967] CPU0 CPU1
[ 251.101967] ---- ----
[ 251.101967] lock(hrtimer_bases.lock#11);
[ 251.101967] lock(&rt_b->rt_runtime_lock);
[ 251.101967] lock(hrtimer_bases.lock#11);
[ 251.101967] lock(timekeeper_seq);
[ 251.101967]
[ 251.101967] *** DEADLOCK ***
[ 251.101967]
[ 251.101967] 3 locks held by kworker/10:1/4506:
[ 251.101967] #0: (events){.+.+.+}, at: [<
ffffffff81154960>] process_one_work+0x200/0x530
[ 251.101967] #1: (hrtimer_work){+.+...}, at: [<
ffffffff81154960>] process_one_work+0x200/0x530
[ 251.101967] #2: (hrtimer_bases.lock#11){-.-...}, at: [<
ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] stack backtrace:
[ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted
3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[ 251.101967] Workqueue: events clock_was_set_work
So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead using a flag variable to
decide if we should call clock_was_set() once we've released the locks.
This works for the case here, where the do_adjtimex() was the deadlock
trigger point. Unfortuantely, in update_wall_time() we still hold
the jiffies lock, which would deadlock with the ipi triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: stable <stable@vger.kernel.org> #3.10+
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
John Stultz [Thu, 12 Dec 2013 04:07:49 +0000 (20:07 -0800)]
timekeeping: Fix potential lost pv notification of time change
In
780427f0e11 (Indicate that clock was set in the pvclock
gtod notifier), logic was added to pass a CLOCK_WAS_SET
notification to the pvclock notifier chain.
While that patch added a action flag returned from
accumulate_nsecs_to_secs(), it only uses the returned value
in one location, and not in the logarithmic accumulation.
This means if a leap second triggered during the logarithmic
accumulation (which is most likely where it would happen),
the notification that the clock was set would not make it to
the pv notifiers.
This patch extends the logarithmic_accumulation pass down
that action flag so proper notification will occur.
This patch also changes the varialbe action -> clock_set
per Ingo's suggestion.
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: <xen-devel@lists.xen.org>
Cc: stable <stable@vger.kernel.org> #3.11+
Signed-off-by: John Stultz <john.stultz@linaro.org>
John Stultz [Thu, 12 Dec 2013 02:50:25 +0000 (18:50 -0800)]
timekeeping: Fix lost updates to tai adjustment
Since
48cdc135d4840 (Implement a shadow timekeeper), we have to
call timekeeping_update() after any adjustment to the timekeeping
structure in order to make sure that any adjustments to the structure
persist.
Unfortunately, the updates to the tai offset via adjtimex do not
trigger this update, causing adjustments to the tai offset to be
made and then over-written by the previous value at the next
update_wall_time() call.
This patch resovles the issue by calling timekeeping_update()
right after setting the tai offset.
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org> #3.10+
Signed-off-by: John Stultz <john.stultz@linaro.org>
Ingo Molnar [Tue, 10 Dec 2013 11:32:36 +0000 (12:32 +0100)]
Merge branch 'timers/posix-timers-for-tip-v2' of git://git./linux/kernel/git/frederic/linux-dynticks into timers/core
Pull posix cpu timer changes for v3.14 from Frederic Weisbecker:
* Remove dying thread/process timers caching that was complicating the code
for no significant win.
* Remove early task reference release on dying timer sample read. Again it was
not worth the code complication. The other timer's resources aren't released
until timer_delete() is called anyway (or when the whole process dies).
* Remove leftover arguments in reaped target cleanup
* Consolidate some timer sampling code
* Remove use of tasklist lock
* Robustify sighand locking against exec and exit by using the safer
lock_task_sighand() API instead of sighand raw locking.
* Convert some unnecessary BUG_ON() to WARN_ON()
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Frederic Weisbecker [Fri, 11 Oct 2013 15:58:08 +0000 (17:58 +0200)]
posix-timers: Convert abuses of BUG_ON to WARN_ON
The posix cpu timers code makes a heavy use of BUG_ON()
but none of these concern fatal issues that require
to stop the machine. So let's just warn the user when
some internal state slips out of our hands.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Fri, 11 Oct 2013 16:56:49 +0000 (18:56 +0200)]
posix-timers: Remove remaining uses of tasklist_lock
The remaining uses of tasklist_lock were mostly about synchronizing
against sighand modifications, getting coherent and safe group samples
and also thread/process wide timers list handling.
All of this is already safely synchronizable with the target's
sighand lock. Let's use it on these places instead.
Also update the comments about locking.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Fri, 11 Oct 2013 15:41:11 +0000 (17:41 +0200)]
posix-timers: Use sighand lock instead of tasklist_lock on timer deletion
Timer deletion doesn't need the tasklist lock.
We need to protect against:
* concurrent access to the lists p->cputime_expires and
p->sighand->cputime_expires
* task reaping that may also delete the timer list entry
* timer firing
We already hold the timer lock which protects us against concurrent
timer firing.
The rest only need the targets sighand to be locked.
So hold it and drop the use of tasklist_lock there.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Fri, 11 Oct 2013 15:41:11 +0000 (17:41 +0200)]
posix-timers: Use sighand lock instead of tasklist_lock for task clock sample
There is no need for the tasklist_lock just to take a process
wide clock sample.
All we need is to get a coherent sample that doesn't race with
exit() and exec():
* exit() may be concurrently reaping a task and flushing its time
* sighand is unstable under exit() and exec(), and the latter also
result in group leader that can change
To protect against these, locking the target's sighand is enough.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Fri, 11 Oct 2013 15:41:11 +0000 (17:41 +0200)]
posix-timers: Consolidate posix_cpu_clock_get()
Consolidate the clock sampling common code used for both local
and remote targets.
Note that this introduces a tiny user ABI change: if a
PID is passed to clock_gettime() along the clockid,
we used to forbid a process wide clock sample when that
PID doesn't belong to a group leader. Now after this patch
we allow process wide clock samples if that PID belongs to
the current task, even if the current task is not the
group leader.
But local process wide clock samples are allowed if PID == 0
(current task) even if the current task is not the group leader.
So in the end this should be no big deal as this actually harmonize
the behaviour when the remote sample is actually a local one.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Fri, 11 Oct 2013 14:11:43 +0000 (16:11 +0200)]
posix-timers: Remove useless clock sample on timers cleanup
a0b2062b0904ef07944c4a6e4d0f88ee44f1e9f2
("posix_timers: fix racy timer delta caching on task exit") forgot
to remove the arguments used for timer caching.
Fix this leftover.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Thu, 10 Oct 2013 22:37:39 +0000 (00:37 +0200)]
posix-timers: Remove dead task special case
Now that we've removed all the optimizations that could
result in NULL timer's targets, we can remove all the
associated special case handling.
Also add some warnings on NULL targets to spot any possible
leftover.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Thu, 10 Oct 2013 22:27:19 +0000 (00:27 +0200)]
posix-timers: Cleanup reaped target handling
When a timer's target is seen to be buried, for example on calls
to timer_gettime(), the posix cpu timers code behaves a bit
like a garbage collector and releases early the reference to the
task.
Then again, this optimization complicates the code for no much
value: it's up to the user to release the timer and its associated
ressources by calling timer_delete() after it buries the target
tasks.
Remove this to simplify the code.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Thu, 10 Oct 2013 14:55:57 +0000 (16:55 +0200)]
posix-timers: Remove dead process posix cpu timers caching
Now that we removed dead thread posix cpu timers caching,
lets remove the dead process wide version. This caching
is similar to the per thread version but it should be even
more rare:
* If the process id dead, we are not reading its timers
status from a thread belonging to its group since they
are all dead. So this caching only concern remote process
timers reads. Now posix cpu timers using itimers or timer_settime()
can't do remote process timers anyway so it's not even clear if there
is actually a user for this caching.
* Unlike per thread timers caching, this only applies to
zombies targets. Buried targets' process wide timers return
0 values. But then again, timer_gettime() can't read remote
process timers, so if the process is dead, there can't be
any reader left anyway.
Then again this caching seem to complicate the code for
corner cases that are probably not worth it. So lets get
rid of it.
Also remove the sample snapshot on dying process timer
that is now useless, as suggested by Kosaki.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Frederic Weisbecker [Thu, 10 Oct 2013 14:55:57 +0000 (16:55 +0200)]
posix-timers: Remove dead thread posix cpu timers caching
When a task is exiting or has exited, its posix cpu timers
don't tick anymore and won't elapse further. It's too late
for them to expire.
So any further call to timer_gettime() on these timers will
return the same remaining expiry time.
The current code optimize this by caching the remaining delta
and storing it where we use to save the absolute expiration time.
This way, the future calls to timer_gettime() won't need to
compute the difference between the absolute expiration time and
the current time anymore.
Now this optimization doesn't seem to bring much value. Computing
the timer remaining delta is not very costly. Fetching the timer
value OTOH can be costly in two ways:
* CPUCLOCK_SCHED read requires to lock the target's rq. But some
optimizations are on the way to make task_sched_runtime() not holding
the rq lock of a non-running target.
* CPUCLOCK_VIRT/CPUCLOCK_PROF read simply consist in fetching
current->utime/current->stime except when the system uses full
dynticks cputime accounting. The latter requires a per task lock
in order to correctly compute user and system time. But once the
target is dead, this lock shouldn't be contended anyway.
All in one this caching doesn't seem to be justified.
Given that it complicates the code significantly for
few wins, let's remove it on single thread timers.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Ingo Molnar [Wed, 4 Dec 2013 09:09:53 +0000 (10:09 +0100)]
Merge branch 'timers/core-v2' of git://git./linux/kernel/git/frederic/linux-dynticks into timers/core
Pull dynticks updates from Frederic Weisbecker:
* Fix a bug where posix cpu timers requeued due to interval got ignored on full
dynticks CPUs (not a regression though as it only impacts full dynticks and the
bug is there since we merged full dynticks).
* Optimizations and cleanups on the use of per CPU APIs to improve code readability,
performance and debuggability in the nohz subsystem;
* Optimize posix cpu timer by sparing stub workqueue queue with full dynticks off case
* Rename some functions to extend with *_this_cpu() suffix for clarity
* Refine the naming of some context tracking subsystem state accessors
* Trivial spelling fix by Paul Gortmaker
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Mon, 2 Dec 2013 20:08:01 +0000 (12:08 -0800)]
Merge branch 'leds-fixes-for-3.13' of git://git./linux/kernel/git/cooloney/linux-leds
Pull LED subsystem bugfix from Bryan Wu.
* 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: pwm: Fix for deferred probe in DT booted mode
Peter Ujfalusi [Thu, 28 Nov 2013 09:06:38 +0000 (01:06 -0800)]
leds: pwm: Fix for deferred probe in DT booted mode
We need to make sure that the error code from devm_of_pwm_get() is the one
the module returns in case of failure.
Restructure the code to make this possible for DT booted case.
With this patch the driver can ask for deferred probing when the board is
booted with DT.
Fixes for example omap4-sdp board's keyboard backlight led.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
Linus Torvalds [Mon, 2 Dec 2013 19:50:37 +0000 (11:50 -0800)]
uio: we cannot mmap unaligned page contents
In commit
7314e613d5ff ("Fix a few incorrectly checked
[io_]remap_pfn_range() calls") the uio driver started more properly
checking the passed-in user mapping arguments against the size of the
actual uio driver data.
That in turn exposed that some driver authors apparently didn't realize
that mmap can only work on a page granularity, and had tried to use it
with smaller mappings, with the new size check catching that out.
So since it's not just the user mmap() arguments that can be confused,
make the uio mmap code also verify that the uio driver has the memory
allocated at page boundaries in order for mmap to work. If the device
memory isn't properly aligned, we return
[ENODEV]
The fildes argument refers to a file whose type is not supported by mmap().
as per the open group documentation on mmap.
Reported-by: Holger Brunck <holger.brunck@keymile.com>
Acked-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Wed, 6 Nov 2013 16:18:30 +0000 (17:18 +0100)]
posix-timers: Fix full dynticks CPUs kick on timer rescheduling
A posix CPU timer can be rearmed while it is firing or after it is
notified with a signal. This can happen for example with timers that
were set with a non zero interval in timer_settime().
This rearming can happen in two places:
1) On timer firing time, which happens on the target's tick. If the timer
can't trigger a signal because it is ignored, it reschedules itself
to honour the timer interval.
2) On signal handling from the timer's notification target. This one
can be a different task than the timer's target itself. Once the
signal is notified, the notification target rearms the timer, again
to honour the timer interval.
When a timer is rearmed, we need to notify the full dynticks CPUs
such that they restart their tick in case they are running tasks that
may have a share in elapsing this timer.
Now the 1st case above handles full dynticks CPUs with a call to
posix_cpu_timer_kick_nohz() from the posix cpu timer firing code. But
the second case ignores the fact that some CPUs may run non-idle tasks
with their tick off. As a result, when a timer is resheduled after its signal
notification, the full dynticks CPUs may completely ignore it and not
tick on the timer as expected
This patch fixes this bug by handling both cases in one. All we need
is to move the kick to the rearming common code in posix_cpu_timer_schedule().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Olivier Langlois <olivier@olivierlanglois.net>
Frederic Weisbecker [Wed, 6 Nov 2013 14:42:04 +0000 (15:42 +0100)]
posix-timers: Spare workqueue if there is no full dynticks CPU to kick
After a posix cpu timer is set, a workqueue is scheduled in order to
kick the full dynticks CPUs and let them restart their tick if
necessary in case the task they are running is concerned by the
new timer.
This kick is implemented by way of IPIs, which require interrupts
to be enabled, hence the need for a workqueue to raise them because
the posix cpu timer set path has interrupts disabled.
Now if there is no full dynticks CPU on the system, the workqueue is
still scheduled but it simply won't send any IPI and return immediately.
So lets spare that worqueue when it is not needed.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Frederic Weisbecker [Wed, 6 Nov 2013 14:11:57 +0000 (15:11 +0100)]
context_tracking: Rename context_tracking_active() to context_tracking_cpu_is_enabled()
We currently have a confusing couple of API naming with the existing
context_tracking_active() and context_tracking_is_enabled().
Lets keep the latter one, context_tracking_is_enabled(), for global
context tracking state check and use context_tracking_cpu_is_enabled()
for local state check.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Frederic Weisbecker [Wed, 6 Nov 2013 13:45:57 +0000 (14:45 +0100)]
context_tracking: Wrap static key check into more intuitive function name
Use a function with a meaningful name to check the global context
tracking state. static_key_false() is a bit confusing for reviewers.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Paul Gortmaker [Thu, 24 Oct 2013 14:07:47 +0000 (10:07 -0400)]
trivial: fix spelling in CONTEXT_TRACKING_FORCE help text
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Frederic Weisbecker [Wed, 7 Aug 2013 20:28:01 +0000 (22:28 +0200)]
nohz: Convert a few places to use local per cpu accesses
A few functions use remote per CPU access APIs when they
deal with local values.
Just do the right conversion to improve performance, code
readability and debug checks.
While at it, lets extend some of these function names with *_this_cpu()
suffix in order to display their purpose more clearly.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Mon, 2 Dec 2013 18:15:39 +0000 (10:15 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- Correction of fuzzy and fragile IRQ_RETVAL macro
- IRQ related resume fix affecting only XEN
- ARM/GIC fix for chained GIC controllers
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Gic: fix boot for chained gics
irq: Enable all irqs unconditionally in irq_resume
genirq: Correct fuzzy and fragile IRQ_RETVAL() definition
Linus Torvalds [Mon, 2 Dec 2013 18:13:44 +0000 (10:13 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Various smaller fixlets, all over the place"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/doc: Fix generation of device-drivers
sched: Expose preempt_schedule_irq()
sched: Fix a trivial typo in comments
sched: Remove unused variable in 'struct sched_domain'
sched: Avoid NULL dereference on sd_busy
sched: Check sched_domain before computing group power
MAINTAINERS: Update file patterns in the lockdep and scheduler entries
Linus Torvalds [Mon, 2 Dec 2013 18:13:09 +0000 (10:13 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc kernel and tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools lib traceevent: Fix conversion of pointer to integer of different size
perf/trace: Properly use u64 to hold event_id
perf: Remove fragile swevent hlist optimization
ftrace, perf: Avoid infinite event generation loop
tools lib traceevent: Fix use of multiple options in processing field
perf header: Fix possible memory leaks in process_group_desc()
perf header: Fix bogus group name
perf tools: Tag thread comm as overriden
Linus Torvalds [Mon, 2 Dec 2013 18:12:01 +0000 (10:12 -0800)]
Merge tag 'stable/for-linus-3.13-rc2-tag' of git://git./linux/kernel/git/xen/tip
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
"Fixes to patches that went in this merge window along with a latent
bug:
- Fix lazy flushing in case m2p override fails.
- Fix module compile issues with ARM/Xen
- Add missing call to DMA map page for Xen SWIOTLB for ARM"
* tag 'stable/for-linus-3.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/gnttab: leave lazy MMU mode in the case of a m2p override failure
xen/arm: p2m_init and p2m_lock should be static
arm/xen: Export phys_to_mach to fix Xen module link errors
swiotlb-xen: add missing xen_dma_map_page call
Linus Torvalds [Mon, 2 Dec 2013 18:10:55 +0000 (10:10 -0800)]
Merge tag 'spi-v3.13-rc2' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A smattering of driver specific fixes here, including a bunch for a
long standing common pattern in the error handling paths, and a fix
for an embarrassing thinko in the new devm master registration code"
* tag 'spi-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi/pxa2xx: Restore private register bits.
spi/qspi: Fix qspi remove path.
spi/qspi: cleanup pm_runtime error check.
spi/qspi: set correct platform drvdata in ti_qspi_probe()
spi/pxa2xx: add new ACPI IDs
spi: core: invert success test in devm_spi_register_master
spi: spi-mxs: fix reference leak to master in mxs_spi_remove()
spi: bcm63xx: fix reference leak to master in bcm63xx_spi_remove()
spi: txx9: fix reference leak to master in txx9spi_remove()
spi: mpc512x: fix reference leak to master in mpc512x_psc_spi_do_remove()
spi: rspi: use platform drvdata correctly in rspi_remove()
spi: bcm2835: fix reference leak to master in bcm2835_spi_remove()
Linus Torvalds [Mon, 2 Dec 2013 18:09:07 +0000 (10:09 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking updates from David Miller:
"Here is a pile of bug fixes that accumulated while I was in Europe"
1) In fixing kernel leaks to userspace during copying of socket
addresses, we broke a case that used to work, namely the user
providing a buffer larger than the in-kernel generic socket address
structure. This broke Ruby amongst other things. Fix from Dan
Carpenter.
2) Fix regression added by byte queue limit support in 8139cp driver,
from Yang Yingliang.
3) The addition of MSG_SENDPAGE_NOTLAST buggered up a few sendpage
implementations, they should just treat it the same as MSG_MORE.
Fix from Richard Weinberger and Shawn Landden.
4) Handle icmpv4 errors received on ipv6 SIT tunnels correctly, from
Oussama Ghorbel. In particular we should send an ICMPv6 unreachable
in such situations.
5) Fix some regressions in the recent genetlink fixes, in particular
get the pmcraid driver to use the new safer interfaces correctly.
From Johannes Berg.
6) macvtap was converted to use a per-cpu set of statistics, but some
code was still bumping tx_dropped elsewhere. From Jason Wang.
7) Fix build failure of xen-netback due to missing include on some
architectures, from Andy Whitecroft.
8) macvtap double counts received packets in statistics, fix from Vlad
Yasevich.
9) Fix various cases of using *_STATS_BH() when *_STATS() is more
appropriate. From Eric Dumazet and Hannes Frederic Sowa.
10) Pktgen ipsec mode doesn't update the ipv4 header length and checksum
properly after encapsulation. Fix from Fan Du.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
net/mlx4_en: Remove selftest TX queues empty condition
{pktgen, xfrm} Update IPv4 header total len and checksum after tranformation
virtio_net: make all RX paths handle erors consistently
virtio_net: fix error handling for mergeable buffers
virtio_net: Fixed a trivial typo (fitler --> filter)
netem: fix gemodel loss generator
netem: fix loss 4 state model
netem: missing break in ge loss generator
net/hsr: Support iproute print_opt ('ip -details ...')
net/hsr: Very small fix of comment style.
MAINTAINERS: Added net/hsr/ maintainer
ipv6: fix possible seqlock deadlock in ip6_finish_output2
ixgbe: Make ixgbe_identify_qsfp_module_generic static
ixgbe: turn NETIF_F_HW_L2FW_DOFFLOAD off by default
ixgbe: ixgbe_fwd_ring_down needs to be static
e1000: fix possible reset_task running after adapter down
e1000: fix lockdep warning in e1000_reset_task
e1000: prevent oops when adapter is being closed and reset simultaneously
igb: Fixed Wake On LAN support
inet: fix possible seqlock deadlocks
...
Linus Torvalds [Mon, 2 Dec 2013 17:44:51 +0000 (09:44 -0800)]
vfs: fix subtle use-after-free of pipe_inode_info
The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:
spin_lock(&inode->i_lock);
if (!--pipe->files) {
inode->i_pipe = NULL;
kill = 1;
}
spin_unlock(&inode->i_lock);
__pipe_unlock(pipe);
if (kill)
free_pipe_info(pipe);
where the final freeing is done last.
HOWEVER. The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.
Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.
Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early. And since there were two users of this
pattern, create a helper function for it.
Introduced commit
ba5bb147330a ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").
Reported-by: Simon Kirby <sim@hostway.ca>
Reported-by: Ian Applegate <ia@cloudflare.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # v3.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eugenia Emantayev [Sun, 1 Dec 2013 12:19:34 +0000 (14:19 +0200)]
net/mlx4_en: Remove selftest TX queues empty condition
Remove waiting for TX queues to become empty during selftest.
This check is not necessary for any purpose, and might put
the driver into an infinite loop.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fan.du [Sun, 1 Dec 2013 08:28:48 +0000 (16:28 +0800)]
{pktgen, xfrm} Update IPv4 header total len and checksum after tranformation
commit
a553e4a6317b2cfc7659542c10fe43184ffe53da ("[PKTGEN]: IPSEC support")
tried to support IPsec ESP transport transformation for pktgen, but acctually
this doesn't work at all for two reasons(The orignal transformed packet has
bad IPv4 checksum value, as well as wrong auth value, reported by wireshark)
- After transpormation, IPv4 header total length needs update,
because encrypted payload's length is NOT same as that of plain text.
- After transformation, IPv4 checksum needs re-caculate because of payload
has been changed.
With this patch, armmed pktgen with below cofiguration, Wireshark is able to
decrypted ESP packet generated by pktgen without any IPv4 checksum error or
auth value error.
pgset "flag IPSEC"
pgset "flows 1"
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 28 Nov 2013 11:30:59 +0000 (13:30 +0200)]
virtio_net: make all RX paths handle erors consistently
receive mergeable now handles errors internally.
Do same for big and small packet paths, otherwise
the logic is too hard to follow.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 28 Nov 2013 11:30:55 +0000 (13:30 +0200)]
virtio_net: fix error handling for mergeable buffers
Eric Dumazet noticed that if we encounter an error
when processing a mergeable buffer, we don't
dequeue all of the buffers from this packet,
the result is almost sure to be loss of networking.
Jason Wang noticed that we also leak a page and that we don't decrement
the rq buf count, so we won't repost buffers (a resource leak).
Fix both issues.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael Dalton <mwdalton@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 1 Dec 2013 23:33:53 +0000 (15:33 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
"Fixes two regressions which got introduced this merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Build always with -mcmodel=large on 64bit
um: Rename print_stack_trace to do_stack_trace
Linus Torvalds [Sun, 1 Dec 2013 23:32:19 +0000 (15:32 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Some ARM fixes, the biggest of which is the fix for the signal return
codes; this came up due to an interaction between the V7M nommu
changes and the BE8 changes. Dave Martin spotted that the kexec
trampoline wasn't being correctly copied (in a way which allows
Thumb-2 to work).
I've also fixed a number of breakages on footbridge platforms as I've
upgraded one of my machines to v3.12... one which had a 1200 day
uptime"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation
ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel
ARM: 7895/1: signal: fix armv7-m build issue in sigreturn_codes.S
ARM: footbridge: fix EBSA285 LEDs
ARM: footbridge: fix VGA initialisation
ARM: fix booting low-vectors machines
ARM: dma-mapping: check DMA mask against available memory
Richard Weinberger [Fri, 29 Nov 2013 14:39:41 +0000 (15:39 +0100)]
um: Build always with -mcmodel=large on 64bit
On UML SUBARCH can be x86, x86_64 and i386 and if it is x86
we use uname -m to select a defconfig.
Therefore we can no longer use -mcmodel=large only if SUBARCH
is x86_64.
Reported-and-tested-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Thu, 21 Nov 2013 08:27:37 +0000 (09:27 +0100)]
um: Rename print_stack_trace to do_stack_trace
We cannot use print_stack_trace because the name conflicts
with linux/stacktrace.h.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fabio Estevam [Sat, 30 Nov 2013 14:24:42 +0000 (15:24 +0100)]
ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation
Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=
3317760)
Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop)
The original object code looks like this:
00000010 <__loop_const_udelay>:
10:
e3e01000 mvn r1, #0
14:
e51f201c ldr r2, [pc, #-28] ; 0 <__loop_udelay-0x8>
18:
e5922000 ldr r2, [r2]
1c:
e0800921 add r0, r0, r1, lsr #18
20:
e1a00720 lsr r0, r0, #14
24:
e0822b21 add r2, r2, r1, lsr #22
28:
e1a02522 lsr r2, r2, #10
2c:
e0000092 mul r0, r2, r0
30:
e0800d21 add r0, r0, r1, lsr #26
34:
e1b00320 lsrs r0, r0, #6
38:
01a0f00e moveq pc, lr
0000003c <__loop_delay>:
3c:
e2500001 subs r0, r0, #1
40:
8afffffe bhi 3c <__loop_delay>
44:
e1a0f00e mov pc, lr
After adding the 'align 3' directive to __loop_delay (align to 8 bytes):
00000010 <__loop_const_udelay>:
10:
e3e01000 mvn r1, #0
14:
e51f201c ldr r2, [pc, #-28] ; 0 <__loop_udelay-0x8>
18:
e5922000 ldr r2, [r2]
1c:
e0800921 add r0, r0, r1, lsr #18
20:
e1a00720 lsr r0, r0, #14
24:
e0822b21 add r2, r2, r1, lsr #22
28:
e1a02522 lsr r2, r2, #10
2c:
e0000092 mul r0, r2, r0
30:
e0800d21 add r0, r0, r1, lsr #26
34:
e1b00320 lsrs r0, r0, #6
38:
01a0f00e moveq pc, lr
3c:
e320f000 nop {0}
00000040 <__loop_delay>:
40:
e2500001 subs r0, r0, #1
44:
8afffffe bhi 40 <__loop_delay>
48:
e1a0f00e mov pc, lr
4c:
e320f000 nop {0}
, which now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=
4980736)
Some more test results:
On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=
1757184)
On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=
2643968)
Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same
BogoMIPS value before and after this patch.
Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Martin [Mon, 25 Nov 2013 13:54:47 +0000 (14:54 +0100)]
ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel
Copying a function with memcpy() and then trying to execute the
result isn't trivially portable to Thumb.
This patch modifies the kexec soft restart code to copy its
assembler trampoline relocate_new_kernel() using fncpy() instead,
so that relocate_new_kernel can be in the same ISA as the rest of
the kernel without problems.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reported-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Tested-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Victor Kamensky [Thu, 21 Nov 2013 06:17:03 +0000 (07:17 +0100)]
ARM: 7895/1: signal: fix armv7-m build issue in sigreturn_codes.S
After "ARM: signal: sigreturn_codes should be endian neutral to
work in BE8" commit, thumb only platforms, like armv7m, fails to
compile sigreturn_codes.S. The reason is that for such arch
values '.arm' directive and arm opcodes are not allowed.
Fix conditionally enables arm opcodes only if no CONFIG_CPU_THUMBONLY
defined and it uses .org instructions to keep sigreturn_codes
layout.
Suggested-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Fri, 29 Nov 2013 00:54:38 +0000 (00:54 +0000)]
ARM: footbridge: fix EBSA285 LEDs
- The LEDs register is write-only: it can't be read-modify-written.
- The LEDs are write-1-for-off not 0.
- The check for the platform was inverted.
Fixes:
cf6856d693dd ("ARM: mach-footbridge: retire custom LED code")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: stable@vger.kernel.org
Thomas Huth [Fri, 29 Nov 2013 09:02:19 +0000 (10:02 +0100)]
virtio_net: Fixed a trivial typo (fitler --> filter)
"MAC filter" sounds more reasonable than "MAC fitler".
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 29 Nov 2013 19:04:26 +0000 (11:04 -0800)]
netem: fix gemodel loss generator
Patch from developers of the alternative loss models, downloaded from:
http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG
"in case 2, of the switch we change the direction of the inequality to
net_random()>clg->a3, because clg->a3 is h in the GE model and when h
is 0 all packets will be lost."
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 29 Nov 2013 19:03:35 +0000 (11:03 -0800)]
netem: fix loss 4 state model
Patch from developers of the alternative loss models, downloaded from:
http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG
"In the case 1 of the switch statement in the if conditions we
need to add clg->a4 to clg->a1, according to the model."
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 29 Nov 2013 19:02:43 +0000 (11:02 -0800)]
netem: missing break in ge loss generator
There is a missing break statement in the Gilbert Elliot loss model
generator which makes state machine behave incorrectly.
Reported-by: Martin Burri <martin.burri@ch.abb.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arvid Brodin [Fri, 29 Nov 2013 22:38:16 +0000 (23:38 +0100)]
net/hsr: Support iproute print_opt ('ip -details ...')
This implements the rtnl_link_ops fill_info routine for HSR.
Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arvid Brodin [Fri, 29 Nov 2013 22:37:07 +0000 (23:37 +0100)]
net/hsr: Very small fix of comment style.
Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arvid Brodin [Fri, 29 Nov 2013 22:36:00 +0000 (23:36 +0100)]
MAINTAINERS: Added net/hsr/ maintainer
Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Fri, 29 Nov 2013 05:39:44 +0000 (06:39 +0100)]
ipv6: fix possible seqlock deadlock in ip6_finish_output2
IPv6 stats are 64 bits and thus are protected with a seqlock. By not
disabling bottom-half we could deadlock here if we don't disable bh and
a softirq reentrantly updates the same mib.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 30 Nov 2013 17:42:20 +0000 (12:42 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to igb, e1000 and ixgbe.
Akeem provides a igb fix where WOL was being reported as supported on
some ethernet devices which did not have that capability.
Yanjun provides a fix for e1000 which is similar to a previous fix
for e1000e commit
bb9e44d0d0f4 ("e1000e: prevent oops when adapter is
being closed and reset simultaneously"), where the same issue was
observed on the older e1000 cards.
Vladimir Davydov provides 2 e1000 fixes. The first fixes a lockdep
warning e1000_down() tries to synchronously cancel e1000 auxiliary
works (reset_task, watchdog_task, phy_info_task and fifo_stall_task)
which take adapter->mutex in their handlers. The second patch is to
fix a possible race condition where reset_task() would be running
after adapter down.
John provides 2 fixes for ixgbe. First turns ixgbe_fwd_ring_down
to static and the second disables NETIF_F_HW_L2FW_DOFFLOAD by default
because it allows upper layer net devices to use queues in the hardware
to directly submit and receive skbs.
Mark Rustad provides a single patch for ixgbe to make
ixgbe_identify_qsfp_module_generic static to resolve compile
warnings.
v2: Drop igb patch "igb: Update queue reinit function to call dev_close
when init of queues fails" from Carolyn, so that the solution can
be re-worked based on feedback from David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 28 Nov 2013 21:55:41 +0000 (21:55 +0000)]
ARM: footbridge: fix VGA initialisation
It's no good setting vga_base after the VGA console has been
initialised, because if we do that we get this:
Unable to handle kernel paging request at virtual address
000b8000
pgd =
c0004000
[
000b8000] *pgd=
07ffc831, *pte=
00000000, *ppte=
00000000
0Internal error: Oops: 5017 [#1] ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0+ #49
task:
c03e2974 ti:
c03d8000 task.ti:
c03d8000
PC is at vgacon_startup+0x258/0x39c
LR is at request_resource+0x10/0x1c
pc : [<
c01725d0>] lr : [<
c0022b50>] psr:
60000053
sp :
c03d9f68 ip :
000b8000 fp :
c03d9f8c
r10:
000055aa r9 :
4401a103 r8 :
ffffaa55
r7 :
c03e357c r6 :
c051b460 r5 :
000000ff r4 :
000c0000
r3 :
000b8000 r2 :
c03e0514 r1 :
00000000 r0 :
c0304971
Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel
which is an access to the 0xb8000 without the PCI offset required to
make it work.
Fixes:
cc22b4c18540 ("ARM: set vga memory base at run-time")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: <stable@vger.kernel.org>
Russell King [Thu, 28 Nov 2013 21:43:40 +0000 (21:43 +0000)]
ARM: fix booting low-vectors machines
Commit
f6f91b0d9fd9 (ARM: allow kuser helpers to be removed from the
vector page) required two pages for the vectors code. Although the
code setting up the initial page tables was updated, the code which
allocates page tables for new processes wasn't, neither was the code
which tears down the mappings. Fix this.
Fixes:
f6f91b0d9fd9 ("ARM: allow kuser helpers to be removed from the vector page")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: <stable@vger.kernel.org>
Russell King [Mon, 25 Nov 2013 21:52:25 +0000 (21:52 +0000)]
ARM: dma-mapping: check DMA mask against available memory
Some buses have negative offsets, which causes the DMA mask checks to
falsely fail. Fix this by using the actual amount of memory fitted in
the system.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Mark Rustad [Sat, 23 Nov 2013 03:19:19 +0000 (03:19 +0000)]
ixgbe: Make ixgbe_identify_qsfp_module_generic static
Correct a namespace complaint by making the function static
and moving the prototype into the .c file.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
John Fastabend [Tue, 12 Nov 2013 12:13:29 +0000 (12:13 +0000)]
ixgbe: turn NETIF_F_HW_L2FW_DOFFLOAD off by default
NETIF_F_HW_L2FW_DOFFLOAD allows upper layer net devices such
as macvlan to use queues in the hardware to directly submit and
receive skbs.
This creates a subtle change in the datapath though. One change
being the skb may no longer use the root devices qdisc.
Because users may not expect this we can't enable the feature
by default unless the hardware can offload all the software
functionality above it. So for now disable it by default and
let users opt in.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
John Fastabend [Sat, 9 Nov 2013 07:11:26 +0000 (07:11 +0000)]
ixgbe: ixgbe_fwd_ring_down needs to be static
When compiling with -Wstrict-prototypes gcc catches a static
I missed.
./ixgbe_main.c:4254: warning: no previous prototype for 'ixgbe_fwd_ring_down'
Reported-by: Phillip Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vladimir Davydov [Sat, 23 Nov 2013 07:18:01 +0000 (07:18 +0000)]
e1000: fix possible reset_task running after adapter down
On e1000_down(), we should ensure every asynchronous work is canceled
before proceeding. Since the watchdog_task can schedule other works
apart from itself, it should be stopped first, but currently it is
stopped after the reset_task. This can result in the following race
leading to the reset_task running after the module unload:
e1000_down_and_stop(): e1000_watchdog():
---------------------- -----------------
cancel_work_sync(reset_task)
schedule_work(reset_task)
cancel_delayed_work_sync(watchdog_task)
The patch moves cancel_delayed_work_sync(watchdog_task) at the beginning
of e1000_down_and_stop() thus ensuring the race is impossible.
Cc: Tushar Dave <tushar.n.dave@intel.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vladimir Davydov [Sat, 23 Nov 2013 07:17:56 +0000 (07:17 +0000)]
e1000: fix lockdep warning in e1000_reset_task
The patch fixes the following lockdep warning, which is 100%
reproducible on network restart:
======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0+ #47 Tainted: GF
-------------------------------------------------------
kworker/1:1/27 is trying to acquire lock:
((&(&adapter->watchdog_task)->work)){+.+...}, at: [<
ffffffff8108a5b0>] flush_work+0x0/0x70
but task is already holding lock:
(&adapter->mutex){+.+...}, at: [<
ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&adapter->mutex){+.+...}:
[<
ffffffff810bdb5d>] lock_acquire+0x9d/0x120
[<
ffffffff816b8cbc>] mutex_lock_nested+0x4c/0x390
[<
ffffffffa017233d>] e1000_watchdog+0x7d/0x5b0 [e1000]
[<
ffffffff8108b972>] process_one_work+0x1d2/0x510
[<
ffffffff8108ca80>] worker_thread+0x120/0x3a0
[<
ffffffff81092c1e>] kthread+0xee/0x110
[<
ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0
-> #0 ((&(&adapter->watchdog_task)->work)){+.+...}:
[<
ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810
[<
ffffffff810bdb5d>] lock_acquire+0x9d/0x120
[<
ffffffff8108a5eb>] flush_work+0x3b/0x70
[<
ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140
[<
ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20
[<
ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000]
[<
ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000]
[<
ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000]
[<
ffffffff8108b972>] process_one_work+0x1d2/0x510
[<
ffffffff8108ca80>] worker_thread+0x120/0x3a0
[<
ffffffff81092c1e>] kthread+0xee/0x110
[<
ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&adapter->mutex);
lock((&(&adapter->watchdog_task)->work));
lock(&adapter->mutex);
lock((&(&adapter->watchdog_task)->work));
*** DEADLOCK ***
3 locks held by kworker/1:1/27:
#0: (events){.+.+.+}, at: [<
ffffffff8108b906>] process_one_work+0x166/0x510
#1: ((&adapter->reset_task)){+.+...}, at: [<
ffffffff8108b906>] process_one_work+0x166/0x510
#2: (&adapter->mutex){+.+...}, at: [<
ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000]
stack backtrace:
CPU: 1 PID: 27 Comm: kworker/1:1 Tainted: GF 3.12.0+ #47
Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0501 05/31/2007
Workqueue: events e1000_reset_task [e1000]
ffffffff820f6000 ffff88007b9dba98 ffffffff816b54a2 0000000000000002
ffffffff820f5e50 ffff88007b9dbae8 ffffffff810ba936 ffff88007b9dbac8
ffff88007b9dbb48 ffff88007b9d8f00 ffff88007b9d8780 ffff88007b9d8f00
Call Trace:
[<
ffffffff816b54a2>] dump_stack+0x49/0x5f
[<
ffffffff810ba936>] print_circular_bug+0x216/0x310
[<
ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810
[<
ffffffff8108a5b0>] ? __flush_work+0x250/0x250
[<
ffffffff810bdb5d>] lock_acquire+0x9d/0x120
[<
ffffffff8108a5b0>] ? __flush_work+0x250/0x250
[<
ffffffff8108a5eb>] flush_work+0x3b/0x70
[<
ffffffff8108a5b0>] ? __flush_work+0x250/0x250
[<
ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140
[<
ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20
[<
ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000]
[<
ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000]
[<
ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000]
[<
ffffffff8108b972>] process_one_work+0x1d2/0x510
[<
ffffffff8108b906>] ? process_one_work+0x166/0x510
[<
ffffffff8108ca80>] worker_thread+0x120/0x3a0
[<
ffffffff8108c960>] ? manage_workers+0x2c0/0x2c0
[<
ffffffff81092c1e>] kthread+0xee/0x110
[<
ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70
[<
ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0
[<
ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70
== The issue background ==
The problem occurs, because e1000_down(), which is called under
adapter->mutex by e1000_reset_task(), tries to synchronously cancel
e1000 auxiliary works (reset_task, watchdog_task, phy_info_task,
fifo_stall_task), which take adapter->mutex in their handlers. So the
question is what does adapter->mutex protect there?
The adapter->mutex was introduced by commit 0ef4ee ("e1000: convert to
private mutex from rtnl") as a replacement for rtnl_lock() taken in the
asynchronous handlers. It targeted on fixing a similar lockdep warning
issued when e1000_down() was called under rtnl_lock(), and it fixed it,
but unfortunately it introduced the lockdep warning described above.
Anyway, that said the source of this bug is that the asynchronous works
were made to take rtnl_lock() some time ago, so let's look deeper and
find why it was added there.
The rtnl_lock() was added to asynchronous handlers by commit 338c15
("e1000: fix occasional panic on unload") in order to prevent
asynchronous handlers from execution after the module is unloaded
(e1000_down() is called) as it follows from the comment to the commit:
> Net drivers in general have an issue where timers fired
> by mod_timer or work threads with schedule_work are running
> outside of the rtnl_lock.
>
> With no other lock protection these routines are vulnerable
> to races with driver unload or reset paths.
>
> The longer term solution to this might be a redesign with
> safer locks being taken in the driver to guarantee no
> reentrance, but for now a safe and effective fix is
> to take the rtnl_lock in these routines.
I'm not sure if this locking scheme fixed the problem or just made it
unlikely, although I incline to the latter. Anyway, this was long time
ago when e1000 auxiliary works were implemented as timers scheduling
real work handlers in their routines. The e1000_down() function only
canceled the timers, but left the real handlers running if they were
running, which could result in work execution after module unload.
Today, the e1000 driver uses sane delayed works instead of the pair
timer+work to implement its delayed asynchronous handlers, and the
e1000_down() synchronously cancels all the works so that the problem
that commit 338c15 tried to cope with disappeared, and we don't need any
locks in the handlers any more. Moreover, any locking there can
potentially result in a deadlock.
So, this patch reverts commits 0ef4ee and 338c15.
Fixes:
0ef4eedc2e98 ("e1000: convert to private mutex from rtnl")
Fixes:
338c15e470d8 ("e1000: fix occasional panic on unload")
Cc: Tushar Dave <tushar.n.dave@intel.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
yzhu1 [Sat, 23 Nov 2013 07:07:40 +0000 (07:07 +0000)]
e1000: prevent oops when adapter is being closed and reset simultaneously
This change is based on a similar change made to e1000e support in
commit
bb9e44d0d0f4 ("e1000e: prevent oops when adapter is being closed
and reset simultaneously"). The same issue has also been observed
on the older e1000 cards.
Here, we have increased the RESET_COUNT value to 50 because there are too
many accesses to e1000 nic on stress tests to e1000 nic, it is not enough
to set RESET_COUT 25. Experimentation has shown that it is enough to set
RESET_COUNT 50.
Signed-off-by: yzhu1 <yanjun.zhu@windriver.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Akeem G Abodunrin [Fri, 8 Nov 2013 01:54:07 +0000 (01:54 +0000)]
igb: Fixed Wake On LAN support
This patch fixes Wake on LAN being reported as supported on some Ethernet
ports, in contrary to Hardware capability.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Roberto Sassu [Wed, 27 Nov 2013 13:40:41 +0000 (14:40 +0100)]
ima: store address of template_fmt_copy in a pointer before calling strsep
This patch stores the address of the 'template_fmt_copy' variable in a new
variable, called 'template_fmt_ptr', so that the latter is passed as an
argument of strsep() instead of the former. This modification is needed
in order to correctly free the memory area referenced by
'template_fmt_copy' (strsep() modifies the pointer of the passed string).
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Eric Dumazet [Thu, 28 Nov 2013 17:51:22 +0000 (09:51 -0800)]
inet: fix possible seqlock deadlocks
In commit
c9e9042994d3 ("ipv4: fix possible seqlock deadlock") I left
another places where IP_INC_STATS_BH() were improperly used.
udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from
process context, not from softirq context.
This was detected by lockdep seqlock support.
Reported-by: jongman heo <jongman.heo@samsung.com>
Fixes:
584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Fixes:
c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 28 Nov 2013 17:01:38 +0000 (18:01 +0100)]
team: fix master carrier set when user linkup is enabled
When user linkup is enabled and user sets linkup of individual port,
we need to recompute linkup (carrier) of master interface so the change
is reflected. Fix this by calling __team_carrier_check() which does the
needed work.
Please apply to all stable kernels as well. Thanks.
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shawn Landden [Mon, 25 Nov 2013 06:36:28 +0000 (22:36 -0800)]
net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST
Commit
35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag MSG_SENDPAGE_NOTLAST, similar to
MSG_MORE.
algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages()
and need to see the new flag as identical to MSG_MORE.
This fixes sendfile() on AF_ALG.
v3: also fix udp
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> # 3.4.x + 3.2.x
Reported-and-tested-by: Shawn Landden <shawnlandden@gmail.com>
Original-patch: Richard Weinberger <richard@nod.at>
Signed-off-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Thu, 28 Nov 2013 02:54:31 +0000 (18:54 -0800)]
sfc: Convert to use hwmon_device_register_with_groups
Simplify the code. Avoid race conditions caused by attributes
being created after hwmon device registration. Implicitly
(through hwmon API) add mandatory 'name' sysfs attribute.
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Thu, 28 Nov 2013 13:33:52 +0000 (14:33 +0100)]
net: smc91: fix crash regression on the versatile
After commit
e9e4ea74f06635f2ffc1dffe5ef40c854faa0a90
"net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data"
The Versatile SMSC LAN91C111 is crashing like this:
------------[ cut here ]------------
kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24
task:
c6ccfaa0 ti:
c6cd0000 task.ti:
c6cd0000
PC is at smc_hardware_send_pkt+0x198/0x22c
LR is at smc_hardware_send_pkt+0x24/0x22c
pc : [<
c01be324>] lr : [<
c01be1b0>] psr:
20000013
sp :
c6cd1d08 ip :
00000001 fp :
00000000
r10:
c02adb08 r9 :
00000000 r8 :
c6ced802
r7 :
c786fba0 r6 :
00000146 r5 :
c8800000 r4 :
c78d6000
r3 :
0000000f r2 :
00000146 r1 :
00000000 r0 :
00000031
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control:
0005317f Table:
06cf4000 DAC:
00000015
Process udhcpc (pid: 43, stack limit = 0xc6cd01c0)
Stack: (0xc6cd1d08 to 0xc6cd2000)
1d00:
00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868
1d20:
c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60
1d40:
00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000
1d60:
c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000
1d80:
c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000
1da0:
c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000
1dc0:
00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000
1de0:
00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740
1e00:
01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff
1e20:
43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc
1e40:
00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88
1e60:
c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1e80:
00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0
1ea0:
00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08
1ec0:
00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000
1ee0:
00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000
1f00:
00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002
1f20:
06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c
1f40:
00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0
1f60:
00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0
1f80:
00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8
1fa0:
c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000
1fc0:
be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000
1fe0:
00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000
[<
c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<
c01be868>] (smc_hard_start_xmit+0xc4/0x1e8)
[<
c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<
c0208554>] (dev_hard_start_xmit+0x460/0x4cc)
[<
c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<
c021d1d8>] (sch_direct_xmit+0x94/0x18c)
[<
c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<
c02087f8>] (dev_queue_xmit+0x238/0x42c)
[<
c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<
c027ba74>] (packet_sendmsg+0xbe8/0xd28)
[<
c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<
c01f2870>] (sock_sendmsg+0x84/0xa8)
[<
c01f2870>] (sock_sendmsg+0x84/0xa8) from [<
c01f4628>] (SyS_sendto+0xb8/0xdc)
[<
c01f4628>] (SyS_sendto+0xb8/0xdc) from [<
c0013840>] (ret_fast_syscall+0x0/0x2c)
Code:
e3130002 1a000001 e3130001 0affffcd (
e7f001f2)
---[ end trace
81104fe70e8da7fe ]---
Kernel panic - not syncing: Fatal exception in interrupt
This is because the macro operations in smc91x.h defined
for Versatile are missing SMC_outsw() as used in this
commit.
The Versatile needs and uses the same accessors as the other
platforms in the first if(...) clause, just switch it to using
that and we have one problem less to worry about.
This includes a hunk of a patch from Will Deacon fixin
the other 32bit platforms as well: Innokom, Ramses, PXA,
PCM027.
Checkpatch complains about spacing, but I have opted to
follow the style of this .h-file.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 29 Nov 2013 21:23:14 +0000 (16:23 -0500)]
Revert "net: smc91: fix crash regression on the versatile"
This reverts commit
b268daffdcb9762ad9aa3898096570a9dd92aa9b.
I applied the wrong version of this patch, the proper version
is coming up next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Wed, 27 Nov 2013 12:33:21 +0000 (13:33 +0100)]
net: smc91: fix crash regression on the versatile
After commit
e9e4ea74f06635f2ffc1dffe5ef40c854faa0a90
"net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data"
The Versatile SMSC LAN91C111 is crashing like this:
------------[ cut here ]------------
kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24
task:
c6ccfaa0 ti:
c6cd0000 task.ti:
c6cd0000
PC is at smc_hardware_send_pkt+0x198/0x22c
LR is at smc_hardware_send_pkt+0x24/0x22c
pc : [<
c01be324>] lr : [<
c01be1b0>] psr:
20000013
sp :
c6cd1d08 ip :
00000001 fp :
00000000
r10:
c02adb08 r9 :
00000000 r8 :
c6ced802
r7 :
c786fba0 r6 :
00000146 r5 :
c8800000 r4 :
c78d6000
r3 :
0000000f r2 :
00000146 r1 :
00000000 r0 :
00000031
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control:
0005317f Table:
06cf4000 DAC:
00000015
Process udhcpc (pid: 43, stack limit = 0xc6cd01c0)
Stack: (0xc6cd1d08 to 0xc6cd2000)
1d00:
00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868
1d20:
c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60
1d40:
00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000
1d60:
c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000
1d80:
c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000
1da0:
c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000
1dc0:
00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000
1de0:
00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740
1e00:
01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff
1e20:
43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc
1e40:
00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88
1e60:
c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1e80:
00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0
1ea0:
00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08
1ec0:
00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000
1ee0:
00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000
1f00:
00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002
1f20:
06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c
1f40:
00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0
1f60:
00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0
1f80:
00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8
1fa0:
c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000
1fc0:
be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000
1fe0:
00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000
[<
c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<
c01be868>] (smc_hard_start_xmit+0xc4/0x1e8)
[<
c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<
c0208554>] (dev_hard_start_xmit+0x460/0x4cc)
[<
c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<
c021d1d8>] (sch_direct_xmit+0x94/0x18c)
[<
c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<
c02087f8>] (dev_queue_xmit+0x238/0x42c)
[<
c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<
c027ba74>] (packet_sendmsg+0xbe8/0xd28)
[<
c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<
c01f2870>] (sock_sendmsg+0x84/0xa8)
[<
c01f2870>] (sock_sendmsg+0x84/0xa8) from [<
c01f4628>] (SyS_sendto+0xb8/0xdc)
[<
c01f4628>] (SyS_sendto+0xb8/0xdc) from [<
c0013840>] (ret_fast_syscall+0x0/0x2c)
Code:
e3130002 1a000001 e3130001 0affffcd (
e7f001f2)
---[ end trace
81104fe70e8da7fe ]---
Kernel panic - not syncing: Fatal exception in interrupt
This is because the macro operations in smc91x.h defined
for Versatile are missing SMC_outsw() as used in this
commit.
The Versatile needs and uses the same accessors as the other
platforms in the first if(...) clause, just switch it to using
that and we have one problem less to worry about.
Checkpatch complains about spacing, but I have opted to
follow the style of this .h-file.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Wed, 27 Nov 2013 06:32:52 +0000 (14:32 +0800)]
net: 8139cp: fix a BUG_ON triggered by wrong bytes_compl
Using iperf to send packets(GSO mode is on), a bug is triggered:
[ 212.672781] kernel BUG at lib/dynamic_queue_limits.c:26!
[ 212.673396] invalid opcode: 0000 [#1] SMP
[ 212.673882] Modules linked in: 8139cp(O) nls_utf8 edd fuse loop dm_mod ipv6 i2c_piix4 8139too i2c_core intel_agp joydev pcspkr hid_generic intel_gtt floppy sr_mod mii button sg cdrom ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata scsi_mod [last unloaded: 8139cp]
[ 212.676084] CPU: 0 PID: 4124 Comm: iperf Tainted: G O 3.12.0-0.7-default+ #16
[ 212.676084] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 212.676084] task:
ffff8800d83966c0 ti:
ffff8800db4c8000 task.ti:
ffff8800db4c8000
[ 212.676084] RIP: 0010:[<
ffffffff8122e23f>] [<
ffffffff8122e23f>] dql_completed+0x17f/0x190
[ 212.676084] RSP: 0018:
ffff880116e03e30 EFLAGS:
00010083
[ 212.676084] RAX:
00000000000005ea RBX:
0000000000000f7c RCX:
0000000000000002
[ 212.676084] RDX:
ffff880111dd0dc0 RSI:
0000000000000bd4 RDI:
ffff8800db6ffcc0
[ 212.676084] RBP:
ffff880116e03e48 R08:
0000000000000992 R09:
0000000000000000
[ 212.676084] R10:
ffffffff8181e400 R11:
0000000000000004 R12:
000000000000000f
[ 212.676084] R13:
ffff8800d94ec840 R14:
ffff8800db440c80 R15:
000000000000000e
[ 212.676084] FS:
00007f6685a3c700(0000) GS:
ffff880116e00000(0000) knlGS:
0000000000000000
[ 212.676084] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 212.676084] CR2:
00007f6685ad6460 CR3:
00000000db714000 CR4:
00000000000006f0
[ 212.676084] Stack:
[ 212.676084]
ffff8800db6ffc00 000000000000000f ffff8800d94ec840 ffff880116e03eb8
[ 212.676084]
ffffffffa041509f ffff880116e03e88 0000000f16e03e88 ffff8800d94ec000
[ 212.676084]
00000bd400059858 000000050000000f ffffffff81094c36 ffff880116e03eb8
[ 212.676084] Call Trace:
[ 212.676084] <IRQ>
[ 212.676084] [<
ffffffffa041509f>] cp_interrupt+0x4ef/0x590 [8139cp]
[ 212.676084] [<
ffffffff81094c36>] ? ktime_get+0x56/0xd0
[ 212.676084] [<
ffffffff8108cf73>] handle_irq_event_percpu+0x53/0x170
[ 212.676084] [<
ffffffff8108d0cc>] handle_irq_event+0x3c/0x60
[ 212.676084] [<
ffffffff8108fdb5>] handle_fasteoi_irq+0x55/0xf0
[ 212.676084] [<
ffffffff810045df>] handle_irq+0x1f/0x30
[ 212.676084] [<
ffffffff81003c8b>] do_IRQ+0x5b/0xe0
[ 212.676084] [<
ffffffff8142beaa>] common_interrupt+0x6a/0x6a
[ 212.676084] <EOI>
[ 212.676084] [<
ffffffffa0416a21>] ? cp_start_xmit+0x621/0x97c [8139cp]
[ 212.676084] [<
ffffffffa0416a09>] ? cp_start_xmit+0x609/0x97c [8139cp]
[ 212.676084] [<
ffffffff81378ed9>] dev_hard_start_xmit+0x2c9/0x550
[ 212.676084] [<
ffffffff813960a9>] sch_direct_xmit+0x179/0x1d0
[ 212.676084] [<
ffffffff813793f3>] dev_queue_xmit+0x293/0x440
[ 212.676084] [<
ffffffff813b0e46>] ip_finish_output+0x236/0x450
[ 212.676084] [<
ffffffff810e59e7>] ? __alloc_pages_nodemask+0x187/0xb10
[ 212.676084] [<
ffffffff813b10e8>] ip_output+0x88/0x90
[ 212.676084] [<
ffffffff813afa64>] ip_local_out+0x24/0x30
[ 212.676084] [<
ffffffff813aff0d>] ip_queue_xmit+0x14d/0x3e0
[ 212.676084] [<
ffffffff813c6fd1>] tcp_transmit_skb+0x501/0x840
[ 212.676084] [<
ffffffff813c8323>] tcp_write_xmit+0x1e3/0xb20
[ 212.676084] [<
ffffffff81363237>] ? skb_page_frag_refill+0x87/0xd0
[ 212.676084] [<
ffffffff813c8c8b>] tcp_push_one+0x2b/0x40
[ 212.676084] [<
ffffffff813bb7e6>] tcp_sendmsg+0x926/0xc90
[ 212.676084] [<
ffffffff813e1d21>] inet_sendmsg+0x61/0xc0
[ 212.676084] [<
ffffffff8135e861>] sock_aio_write+0x101/0x120
[ 212.676084] [<
ffffffff81107cf1>] ? vma_adjust+0x2e1/0x5d0
[ 212.676084] [<
ffffffff812163e0>] ? timerqueue_add+0x60/0xb0
[ 212.676084] [<
ffffffff81130b60>] do_sync_write+0x60/0x90
[ 212.676084] [<
ffffffff81130d44>] ? rw_verify_area+0x54/0xf0
[ 212.676084] [<
ffffffff81130f66>] vfs_write+0x186/0x190
[ 212.676084] [<
ffffffff811317fd>] SyS_write+0x5d/0xa0
[ 212.676084] [<
ffffffff814321e2>] system_call_fastpath+0x16/0x1b
[ 212.676084] Code: ca 41 89 dc 41 29 cc 45 31 db 29 c2 41 89 c5 89 d0 45 29 c5 f7 d0 c1 e8 1f e9 43 ff ff ff 66 0f 1f 44 00 00 31 c0 e9 7b ff ff ff <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 c7 47 40 00
[ 212.676084] RIP [<
ffffffff8122e23f>] dql_completed+0x17f/0x190
------------[ cut here ]------------
When a skb has frags, bytes_compl plus skb->len nr_frags times in cp_tx().
It's not the correct value(actually, it should plus skb->len once) and it
will trigger the BUG_ON(bytes_compl > num_queued - dql->num_completed).
So only increase bytes_compl when finish sending all frags. pkts_compl also
has a wrong value, fix it too.
It's introduced by commit
871f0d4c ("8139cp: enable bql").
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Chang [Wed, 27 Nov 2013 07:48:36 +0000 (15:48 +0800)]
r8169: check ALDPS bit and disable it if enabled for the 8168g
Windows driver will enable ALDPS function, but linux driver and firmware
do not have any configuration related to ALDPS function for 8168g.
So restart system to linux and remove the NIC cable, LAN enter ALDPS,
then LAN RX will be disabled.
This issue can be easily reproduced on dual boot windows and linux
system with RTL_GIGA_MAC_VER_40 chip.
Realtek said, ALDPS function can be disabled by configuring to PHY,
switch to page 0x0A43, reg0x10 bit2=0.
Signed-off-by: David Chang <dchang@suse.com>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 27 Nov 2013 12:40:21 +0000 (15:40 +0300)]
net: clamp ->msg_namelen instead of returning an error
If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
original code that would lead to memory corruption in the kernel if you
had audit configured. If you didn't have audit configured it was
harmless.
There are some programs such as beta versions of Ruby which use too
large of a buffer and returning an error code breaks them. We should
clamp the ->msg_namelen value instead.
Fixes:
1661bf364ae9 ("net: heap overflow in __audit_sockaddr()")
Reported-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Eric Wong <normalperson@yhbt.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Fri, 29 Nov 2013 08:53:23 +0000 (09:53 +0100)]
af_packet: block BH in prb_shutdown_retire_blk_timer()
Currently we're using plain spin_lock() in prb_shutdown_retire_blk_timer(),
however the timer might fire right in the middle and thus try to re-aquire
the same spinlock, leaving us in a endless loop.
To fix that, use the spin_lock_bh() to block it.
Fixes:
f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
CC: "David S. Miller" <davem@davemloft.net>
CC: Daniel Borkmann <dborkman@redhat.com>
CC: Willem de Bruijn <willemb@google.com>
CC: Phil Sutter <phil@nwl.cc>
CC: Eric Dumazet <edumazet@google.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 26 Nov 2013 17:37:12 +0000 (12:37 -0500)]
macvtap: Do not double-count received packets
Currently macvlan will count received packets after calling each
vlans receive handler. Macvtap attempts to count the packet
yet again when the user reads the packet from the tap socket.
This code doesn't do this consistently either. Remove the
counting from macvtap and let only macvlan count received
packets.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 29 Nov 2013 20:57:14 +0000 (12:57 -0800)]
Linux 3.13-rc2
Linus Torvalds [Fri, 29 Nov 2013 17:57:13 +0000 (09:57 -0800)]
Merge tag 'arm64-stable' of git://git./linux/kernel/git/cmarinas/linux-aarch64
Pull ARM64 fixes from Catalin Marinas:
- Remove preempt_count modifications in the arm64 IRQ handling code
since that's already dealt with in generic irq_enter/irq_exit
- PTE_PROT_NONE bit moved higher up to avoid overlapping with the
hardware bits (for PROT_NONE mappings which are pte_present)
- Big-endian fixes for ptrace support
- Asynchronous aborts unmasking while in the kernel
- pgprot_writecombine() change to create Normal NonCacheable memory
rather than Device GRE
* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: Move PTE_PROT_NONE higher up
arm64: Use Normal NonCacheable memory for writecombine
arm64: debug: make aarch32 bkpt checking endian clean
arm64: ptrace: fix compat registes get/set to be endian clean
arm64: Unmask asynchronous aborts when in kernel mode
arm64: dts: Reserve the memory used for secondary CPU release address
arm64: let the core code deal with preempt_count
Linus Torvalds [Fri, 29 Nov 2013 17:56:15 +0000 (09:56 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
"One performance improvement and a few bug fixes. Two of the fixes
deal with the clock related problems we have seen on recent kernels"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: handle asce-type exceptions as normal page fault
s390,time: revert direct ktime path for s390 clockevent device
s390/time,vdso: convert to the new update_vsyscall interface
s390/uaccess: add missing page table walk range check
s390/mm: optimize copy_page
s390/dasd: validate request size before building CCW/TCW request
s390/signal: always restore saved runtime instrumentation psw bit
Linus Torvalds [Fri, 29 Nov 2013 17:55:13 +0000 (09:55 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some easy but needed fixes for i2c drivers since rc1"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: bcm2835: Linking platform nodes to adapter nodes
i2c: omap: raw read and write endian fix
i2c: i2c-bcm-kona: Fix module build
i2c: i2c-diolan-u2c: different usb endpoints for DLN-2-U2C
i2c: bcm-kona: remove duplicated include
i2c: davinci: raw read and write endian fix
Linus Torvalds [Fri, 29 Nov 2013 17:49:08 +0000 (09:49 -0800)]
Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"This contains one important fix. The NUMA support added a while back
broke ordering guarantees on ordered workqueues. It was enforced by
having single frontend interface with @max_active == 1 but the NUMA
support puts multiple interfaces on unbound workqueues on NUMA
machines thus breaking the ordered guarantee. This is fixed by
disabling NUMA support on ordered workqueues.
The above and a couple other patches were sitting in for-3.12-fixes
but I forgot to push that out, so they ended up waiting a bit too
long. My aplogies.
Other fixes are minor"
* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues
workqueue: fix comment typo for __queue_work()
workqueue: fix ordered workqueues in NUMA setups
workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
Linus Torvalds [Fri, 29 Nov 2013 17:48:25 +0000 (09:48 -0800)]
Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"libata device removal path was removing parent device node before its
child, which is mostly harmless but triggers warning after recent
sysfs changes. Rafael's patch fixes the order.
Other than that, minor controller-specific fixes and device ID
additions"
* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ATA: Fix port removal ordering
ahci: add Marvell 9230 to the AHCI PCI device list
ata: fix acpi_bus_get_device() return value check
pata_arasan_cf: add missing clk_disable_unprepare() on error path
ahci: add support for IBM Akebono platform device
Linus Torvalds [Fri, 29 Nov 2013 17:47:06 +0000 (09:47 -0800)]
Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Fixes for three issues.
- cgroup destruction path could swamp system_wq possibly leading to
deadlock. This actually seems to happen in the wild with memcg
because memcg destruction path adds nested dependency on system_wq.
Resolved by isolating cgroup destruction work items on its
dedicated workqueue.
- Possible locking context deadlock through seqcount reported by
lockdep
- Memory leak under certain conditions"
* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix cgroup_subsys_state leak for seq_files
cpuset: Fix memory allocator deadlock
cgroup: use a dedicated workqueue for cgroup destruction
Linus Torvalds [Fri, 29 Nov 2013 17:36:42 +0000 (09:36 -0800)]
Merge tag 'sound-3.13-rc2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Quite a few HD-Audio fixes, a WUSB audio fix and a fix for FireWire
audio. The HD-audio part contains a couple of fixes for the generic
parser, and these are the only intrusive fixes. The rest are mostly
device-specific fixes"
* tag 'sound-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add LFE chmap to ASUS ET2700
ALSA: hda - Initialize missing bass speaker pin for ASUS AIO ET2700
ALSA: hda - limit mic boost on Asus UX31[A,E]
ALSA: hda - Check leaf nodes to find aamix amps
ALSA: hda - Fix hp-mic mode without VREF bits
ALSA: hda - Create Headhpone Mic Jack Mode when really needed
ALSA: usb: use multiple packets per urb for Wireless USB inbound audio
ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Conexant codec
ALSA: hda - Drop bus->avoid_link_reset flag
ALSA: hda/realtek - Set pcbeep amp for ALC668
ALSA: hda/realtek - Add support of ALC231 codec
ALSA: firewire-lib: fix wrong value for FDF field as an empty packet
Linus Torvalds [Fri, 29 Nov 2013 17:27:19 +0000 (09:27 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs dentry reference count fix from Al Viro.
This fixes a possible inode_permission NULL pointer dereference (and
other problems) that were due to the root dentry count being decremented
too much. In commit
48a066e72d97 ("RCU'd vfsmounts") the placement of
clearing the LOOKUP_RCU bit changed, and we then returned failure of
incrementing the lockref on the parent dentry with LOOKUP_RCU cleared.
But that meant we needed to go through the same cleanup routines that
the later failures did wrt LOOKUP_ROOT and nd->root.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix bogus path_put() of nd->root after some unlazy_walk() failures
Linus Torvalds [Fri, 29 Nov 2013 17:26:42 +0000 (09:26 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm qxl leak fix from Dave Airlie:
"As usual 5 mins after I send a trivial pull fix I find a real bug!
This fixes a memory leak and I'd like to get it into stable queue
asap"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/qxl: fix memory leak in release list handling
Catalin Marinas [Wed, 27 Nov 2013 16:59:27 +0000 (16:59 +0000)]
arm64: Move PTE_PROT_NONE higher up
PTE_PROT_NONE means that a pte is present but does not have any
read/write attributes. However, setting the memory type like
pgprot_writecombine() is allowed and such bits overlap with
PTE_PROT_NONE. This causes mmap/munmap issues in drivers that change the
vma->vm_pg_prot on PROT_NONE mappings.
This patch reverts the PTE_FILE/PTE_PROT_NONE shift in commit
59911ca4325d (ARM64: mm: Move PTE_PROT_NONE bit) and moves PTE_PROT_NONE
together with the other software bits.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org> # 3.11+
Catalin Marinas [Fri, 29 Nov 2013 10:56:14 +0000 (10:56 +0000)]
arm64: Use Normal NonCacheable memory for writecombine
This provides better performance compared to Device GRE and also allows
unaligned accesses. Such memory is intended to be used with standard RAM
(e.g. framebuffers) and not I/O.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Al Viro [Fri, 29 Nov 2013 06:48:32 +0000 (01:48 -0500)]
fix bogus path_put() of nd->root after some unlazy_walk() failures
Failure to grab reference to parent dentry should go through the
same cleanup as nd->seq mismatch. As it is, we might end up with
caller thinking it needs to path_put() nd->root, with obvious
nasty results once we'd hit that bug enough times to drive the
refcount of root dentry all the way to zero...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Ivan Vecera [Wed, 27 Nov 2013 07:59:32 +0000 (08:59 +0100)]
be2net: call napi_disable() for all event queues
The recent be2net commit
6384a4d (adds a support for busy polling)
introduces a regression that results in kernel crash. It incorrectly
modified be_close() so napi_disable() is called only for the first queue.
This breaks a correct pairing of napi_enable/_disable for the rest
of event queues and causes a crash in subsequent be_open() call.
v2: Applied suggestions from Sathya
Fixes:
6384a4d ("be2net: add support for ndo_busy_poll")
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Nov 2013 23:53:36 +0000 (18:53 -0500)]
Revert "be2net: call napi_disable() for all event queues"
This reverts commit
55485e7b417b640870b14eceec4cfbcb2b3e7a92.
I applied the wrong version of this patch, the right one is coming up
next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Tue, 26 Nov 2013 16:54:41 +0000 (17:54 +0100)]
be2net: call napi_disable() for all event queues
The recent be2net commit
6384a4d (adds a support for busy polling)
introduces a regression that results in kernel crash. It incorrectly
modified be_close() so napi_disable() is called only for the first queue.
This breaks a correct pairing of napi_enable/_disable for the rest
of event queues and causes a crash in subsequent be_open() call.
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Baker Zhang [Tue, 26 Nov 2013 01:29:15 +0000 (09:29 +0800)]
net: remove outdated comment for ipv4 and ipv6 protocol handler
since
f9242b6b28d61295f2bf7e8adfb1060b382e5381
inet: Sanitize inet{,6} protocol demux.
there are not pretended hash tables for ipv4 or
ipv6 protocol handler.
Signed-off-by: Baker Zhang <Baker.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
françois romieu [Mon, 25 Nov 2013 23:40:58 +0000 (00:40 +0100)]
via-velocity: fix netif_receive_skb use in irq disabled section.
2fdac010bdcf10a30711b6924612dfc40daf19b8 ("via-velocity.c: update napi
implementation") overlooked an irq disabling spinlock when the Rx part
of the NAPI poll handler was converted from netif_rx to netif_receive_skb.
NAPI Rx processing can be taken out of the locked section with a pair of
napi_{disable / enable} since it only races with the MTU change function.
An heavier rework of the NAPI locking would be able to perform NAPI Tx
before Rx where I simply removed one of velocity_tx_srv calls.
References: https://bugzilla.redhat.com/show_bug.cgi?id=
1022733
Fixes:
2fdac010bdcf (via-velocity.c: update napi implementation)
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Alex A. Schmidt <aaschmidt1@gmail.com>
Cc: Jamie Heilman <jamie@audible.transient.net>
Cc: Michele Baldessari <michele@acksyn.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Nov 2013 23:40:10 +0000 (18:40 -0500)]
Merge branch 'fixes-for-3.13-
20131127' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:
====================
here's a pull request for v3.13, i.e. net/master. It consists of a patch by
Oliver Hartkopp which fixes some corner cases in the interrupt handler of the
sja1000 driver. Then there are two patches for the c_can dirver. One by me,
which fixes a runtime pm related "scheduling while atomic" error and patch by
Holger Bechtold that fixes the calculation of the transmitted bytes.
The fourth patch is by me, it corrects the clock usage in the flexcan
driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Whitcroft [Mon, 25 Nov 2013 16:52:34 +0000 (16:52 +0000)]
xen-netback: include definition of csum_ipv6_magic
We are now using csum_ipv6_magic, include the appropriate header.
Avoids the following error:
drivers/net/xen-netback/netback.c:1313:4: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Mon, 25 Nov 2013 09:21:24 +0000 (17:21 +0800)]
sit: use kfree_skb to replace dev_kfree_skb
In failure case, we should use kfree_skb not
dev_kfree_skb to free skbuff, dev_kfree_skb
is defined as consume_skb.
Trace takes advantage of this point.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 25 Nov 2013 09:19:04 +0000 (17:19 +0800)]
macvtap: fix tx_dropped counting error
After commit
8ffab51b3dfc54876f145f15b351c41f3f703195
(macvlan: lockless tx path), tx stat counter were converted to percpu stat
structure. So we need use to this also for tx_dropped in macvtap. Otherwise, the
management won't notice the dropping packet in macvtap tx path.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaohui Xie [Mon, 25 Nov 2013 04:40:49 +0000 (12:40 +0800)]
phy: Add Vitesse 8514 phy ID
Phy is compatible with Vitesse 82xx
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xufeng Zhang [Mon, 25 Nov 2013 03:26:57 +0000 (11:26 +0800)]
sctp: Restore 'resent' bit to avoid retransmitted chunks for RTT measurements
Currently retransmitted DATA chunks could also be used for
RTT measurements since there are no flag to identify whether
the transmitted DATA chunk is a new one or a retransmitted one.
This problem is introduced by commit
ae19c5486 ("sctp: remove
'resent' bit from the chunk") which inappropriately removed the
'resent' bit completely, instead of doing this, we should set
the resent bit only for the retransmitted DATA chunks.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Sun, 24 Nov 2013 20:09:26 +0000 (21:09 +0100)]
genetlink/pmcraid: use proper genetlink multicast API
The pmcraid driver is abusing the genetlink API and is using its
family ID as the multicast group ID, which is invalid and may
belong to somebody else (and likely will.)
Make it use the correct API, but since this may already be used
as-is by userspace, reserve a family ID for this code and also
reserve that group ID to not break userspace assumptions.
My previous patch broke event delivery in the driver as I missed
that it wasn't using the right API and forgot to update it later
in my series.
While changing this, I noticed that the genetlink code could use
the static group ID instead of a strcmp(), so also do that for
the VFS_DQUOT family.
Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>