NeilBrown [Fri, 14 Aug 2015 02:47:33 +0000 (12:47 +1000)]
md/raid5: ensure device failure recorded before write request returns.
When a write to one of the devices of a RAID5/6 fails, the failure is
recorded in the metadata of the other devices so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again (maybe a cable was unplugged).
Similarly when we record a bad-block in response to a write failure,
we must not let the write complete until the bad-block update is safe.
Currently there is no interlock between the write request completing
and the metadata update. So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.
This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.
So:
- set MD_CHANGE_PENDING when requesting a metadata update for a
failed device, so we can know with certainty when it completes
- queue requests that completed when MD_CHANGE_PENDING is set to
only be processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 14 Aug 2015 02:07:57 +0000 (12:07 +1000)]
md/raid5: use bio_list for the list of bios to return.
This will make it easier to splice two lists together which will
be needed in future patch.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 14 Aug 2015 01:26:17 +0000 (11:26 +1000)]
md/raid10: ensure device failure recorded before write request returns.
When a write to one of the legs of a RAID10 fails, the failure is
recorded in the metadata of the other legs so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again (maybe a cable was unplugged).
Currently there is no interlock between the write request completing
and the metadata update. So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.
This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.
So:
- set MD_CHANGE_PENDING when requesting a metadata update for a
failed device, so we can know with certainty when it completes
- queue requests that experienced an error on a new queue which
is only processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 14 Aug 2015 01:11:10 +0000 (11:11 +1000)]
md/raid1: ensure device failure recorded before write request returns.
When a write to one of the legs of a RAID1 fails, the failure is
recorded in the metadata of the other leg(s) so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again (maybe a cable was unplugged).
Similarly when we record a bad-block in response to a write failure,
we must not let the write complete until the bad-block update is safe.
Currently there is no interlock between the write request completing
and the metadata update. So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.
This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.
So:
- set MD_CHANGE_PENDING when requesting a metadata update for a
failed device, so we can know with certainty when it completes
- queue requests that experienced an error on a new queue which
is only processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 14 Aug 2015 00:22:00 +0000 (10:22 +1000)]
md-cluster: remove inappropriate try_module_get from join()
md_setup_cluster already calls try_module_get(), so this
try_module_get isn't needed.
Also, there is no matching module_put (except in error patch),
so this leaves an unbalanced module count.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Thu, 13 Aug 2015 02:32:55 +0000 (12:32 +1000)]
md: extend spinlock protection in register_md_cluster_operations
This code looks racy.
The only possible race is if two modules try to register at the same
time and that won't happen. But make the code look safe anyway.
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:22 +0000 (17:01 +0800)]
md-cluster: Read the disk bitmap sb and check if it needs recovery
In gather_all_resync_info, we need to read the disk bitmap sb and
check if it needs recovery.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:21 +0000 (17:01 +0800)]
md-cluster: only call complete(&cinfo->completion) when node join cluster
Introduce MD_CLUSTER_BEGIN_JOIN_CLUSTER flag to make sure
complete(&cinfo->completion) is only be invoked when node
join cluster. Otherwise node failure could also call the
complete, and it doesn't make sense to do it.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:20 +0000 (17:01 +0800)]
md-cluster: add missed lockres_free
We also need to free the lock resource before goto out.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:19 +0000 (17:01 +0800)]
md-cluster: remove the unused sb_lock
The sb_lock is not used anywhere, so let's remove it.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:18 +0000 (17:01 +0800)]
md-cluster: init suspend_list and suspend_lock early in join
If the node just join the cluster, and receive the msg from other nodes
before init suspend_list, it will cause kernel crash due to NULL pointer
dereference, so move the initializations early to fix the bug.
md-cluster: Joined cluster
3578507b-e0cb-6d4f-6322-
696cd7b1b10c slot 3
BUG: unable to handle kernel NULL pointer dereference at (null)
... ... ...
Call Trace:
[<
ffffffffa0444924>] process_recvd_msg+0x2e4/0x330 [md_cluster]
[<
ffffffffa0444a06>] recv_daemon+0x96/0x170 [md_cluster]
[<
ffffffffa045189d>] md_thread+0x11d/0x170 [md_mod]
[<
ffffffff810768c4>] kthread+0xb4/0xc0
[<
ffffffff8151927c>] ret_from_fork+0x7c/0xb0
... ... ...
RIP [<
ffffffffa0443581>] __remove_suspend_info+0x11/0xa0 [md_cluster]
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:17 +0000 (17:01 +0800)]
md-cluster: add the error check if failed to get dlm lock
In complicated cluster environment, it is possible that the
dlm lock couldn't be get/convert on purpose, the related err
info is added for better debug potential issue.
For lockres_free, if the lock is blocking by a lock request or
conversion request, then dlm_unlock just put it back to grant
queue, so need to ensure the lock is free finally.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:16 +0000 (17:01 +0800)]
md-cluster: init completion within lockres_init
We should init completion within lockres_init, otherwise
completion could be initialized more than one time during
it's life cycle.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 09:01:15 +0000 (17:01 +0800)]
md-cluster: fix deadlock issue on message lock
There is problem with previous communication mechanism, and we got below
deadlock scenario with cluster which has 3 nodes.
Sender Receiver Receiver
token(EX)
message(EX)
writes message
downconverts message(CR)
requests ack(EX)
get message(CR) gets message(CR)
reads message reads message
requests EX on message requests EX on message
To fix this problem, we do the following changes:
1. the sender downconverts MESSAGE to CW rather than CR.
2. and the receiver request PR lock not EX lock on message.
And in case we failed to down-convert EX to CW on message, it is better to
unlock message otherthan still hold the lock.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Lidong Zhong <ldzhong@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 08:54:04 +0000 (16:54 +0800)]
md-cluster: transfer the resync ownership to another node
When node A stops an array while the array is doing a resync, we need
to let another node B take over the resync task.
To achieve the goal, we need the A send an explicit BITMAP_NEEDS_SYNC
message to the cluster. And the node B which received that message will
invoke __recover_slot to do resync.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 08:54:03 +0000 (16:54 +0800)]
md-cluster: split recover_slot for future code reuse
Make recover_slot as a wraper to __recover_slot, since the
logic of __recover_slot can be reused for the condition
when other nodes need to take over the resync job.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Guoqing Jiang [Fri, 10 Jul 2015 08:54:02 +0000 (16:54 +0800)]
md-cluster: use %pU to print UUIDs
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Sasha Levin [Fri, 24 Jul 2015 22:19:58 +0000 (18:19 -0400)]
md: setup safemode_timer before it's being used
We used to set up the safemode_timer timer in md_run. If md_run
would fail before the timer was set up we'd end up trying to modify
a timer that doesn't have a callback function when we access safe_delay_store,
which would trigger a BUG.
neilb: delete init_timer() call as setup_timer() does that.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 24 Jul 2015 03:30:32 +0000 (13:30 +1000)]
md/raid5: handle possible race as reshape completes.
It is possible (though unlikely) for a reshape to be
interrupted between the time that end_reshape is called
and the time when raid5_finish_reshape is called.
This can leave conf->reshape_progress set to MaxSector,
but mddev->reshape_position not.
This combination confused reshape_request() when ->reshape_backwards.
As conf->reshape_progress is so high, it seems the reshape hasn't
really begun. But assuming MaxSector is a valid address only
leads to sorrow.
So ensure reshape_position and reshape_progress both agree,
and add an extra check in reshape_request() just in case they don't.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 24 Jul 2015 03:27:08 +0000 (13:27 +1000)]
md: sync sync_completed has correct value as recovery finishes.
There can be a small window between the moment that recovery
actually writes the last block and the time when various sysfs
and /proc/mdstat attributes report that it has finished.
During this time, 'sync_completed' can have the wrong value.
This can confuse monitoring software.
So:
- don't set curr_resync_completed beyond the end of the devices,
- set it correctly when resync/recovery has completed.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 17 Jul 2015 02:06:02 +0000 (12:06 +1000)]
md: be careful when testing resync_max against curr_resync_completed.
While it generally shouldn't happen, it is not impossible for
curr_resync_completed to exceed resync_max.
This can particularly happen when reshaping RAID5 - the current
status isn't copied to curr_resync_completed promptly, so when it
is, it can exceed resync_max.
This happens when the reshape is 'frozen', resync_max is set low,
and reshape is re-enabled.
Taking a difference between two unsigned numbers is always dangerous
anyway, so add a test to behave correctly if
curr_resync_completed > resync_max
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 17 Jul 2015 01:57:30 +0000 (11:57 +1000)]
md: set MD_RECOVERY_RECOVER when starting a degraded array.
This ensures that 'sync_action' will show 'recover' immediately the
array is started. If there is no spare the status will change to
'idle' once that is detected.
Clear MD_RECOVERY_RECOVER for a read-only array to ensure this change
happens.
This allows scripts which monitor status not to get confused -
particularly my test scripts.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Wed, 15 Jul 2015 07:54:15 +0000 (17:54 +1000)]
md/raid5: remove incorrect "min_t()" when calculating writepos.
This code is calculating:
writepos, which is the furthest along address (device-space) that we
*will* be writing to
readpos, which is the earliest address that we *could* possible read
from, and
safepos, which is the earliest address in the 'old' section that we
might read from after a crash when the reshape position is
recovered from metadata.
The first is a precise calculation, so clipping at zero doesn't
make sense. As the reshape position is now guaranteed to always be
a multiple of reshape_sectors and as we already BUG_ON when
reshape_progress is zero, there is no point in this min_t() call.
The readpos and safepos are worst case - actual value depends on
precise geometry. That worst case could be negative, which is only
a problem because we are storing the value in an unsigned.
So leave the min_t() for those.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Wed, 15 Jul 2015 07:36:21 +0000 (17:36 +1000)]
md/raid5: strengthen check on reshape_position at run.
When reshaping, we work in units of the largest chunk size.
If changing from a larger to a smaller chunk size, that means we
reshape more than one stripe at a time. So the required alignment
of reshape_position needs to take into account both the old
and new chunk size.
This means that both 'here_new' and 'here_old' are calculated with
respect to the same (maximum) chunk size, so testing if they are the
same when delta_disks is zero becomes pointless.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Wed, 15 Jul 2015 07:24:17 +0000 (17:24 +1000)]
md/raid5: switch to use conf->chunk_sectors in place of mddev->chunk_sectors where possible
The chunk_sectors and new_chunk_sectors fields of mddev can be changed
any time (via sysfs) that the reconfig mutex can be taken. So raid5
keeps internal copies in 'conf' which are stable except for a short
locked moment when reshape stops/starts.
So any access that does not hold reconfig_mutex should use the 'conf'
values, not the 'mddev' values.
Several don't.
This could result in corruption if new values were written at awkward
times.
Also use min() or max() rather than open-coding.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Fri, 17 Jul 2015 02:17:50 +0000 (12:17 +1000)]
md/raid5: always set conf->prev_chunk_sectors and ->prev_algo
These aren't really needed when no reshape is happening,
but it is safer to have them always set to a meaningful value.
The next patch will use ->prev_chunk_sectors without checking
if a reshape is happening (because that makes the code simpler),
and this patch makes that safe.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Mon, 6 Jul 2015 06:33:47 +0000 (16:33 +1000)]
md/raid10: fix a few typos in comments
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Mon, 6 Jul 2015 02:28:45 +0000 (12:28 +1000)]
md/raid5: consider updating reshape_position at start of reshape.
md/raid5 only updates ->reshape_position (which is stored in
metadata and is authoritative) occasionally, but particularly
when getting closed to ->resync_max as it must be correct
when ->resync_max is reached.
When mdadm tries to stop an array which is reshaping it will:
- freeze the reshape,
- set resync_max to where the reshape has reached.
- unfreeze the reshape.
When this happens, the reshape is aborted and then restarted.
The restart doesn't check that resync_max is close, and so doesn't
update ->reshape_position like it should.
This results in the reshape stopping, but ->reshape_position being
incorrect.
So on that first call to reshape_request, make sure ->reshape_position
is updated if needed.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Mon, 6 Jul 2015 02:26:57 +0000 (12:26 +1000)]
md: close some races between setting and checking sync_action.
When checking sync_action in a script, we want to be sure it is
as accurate as possible.
As resync/reshape etc doesn't always start immediately (a separate
thread is scheduled to do it), it is best if 'action_show'
checks if MD_RECOVER_NEEDED is set (which it does) and in that
case reports what is likely to start soon (which it only sometimes
does).
So:
- report 'reshape' if reshape_position suggests one might start.
- set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very
likely to happen next.
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Thu, 2 Jul 2015 07:12:58 +0000 (17:12 +1000)]
md: Keep /proc/mdstat reporting recovery until fully DONE.
Currently when a recovery completes, mdstat shows that it has finished
before the new device is marked as a full member. Because of this it
can appear to a script that the recovery finished but the array isn't
in sync.
So while MD_RECOVERY_DONE is still set, keep mdstat reporting "recovery".
Once md_reap_sync_thread() completes, the spare will be active and then
MD_RECOVERY_DONE will be cleared.
To ensure this is race-free, set MD_RECOVERY_DONE before clearning
curr_resync.
Signed-off-by: NeilBrown <neilb@suse.com>
Ard Biesheuvel [Wed, 1 Jul 2015 02:19:56 +0000 (12:19 +1000)]
md/raid6: delta syndrome for ARM NEON
This implements XOR syndrome calculation using NEON intrinsics.
As before, the module can be built for ARM and arm64 from the
same source.
Relative performance on a Cortex-A57 based system:
raid6: int64x1 gen() 905 MB/s
raid6: int64x1 xor() 881 MB/s
raid6: int64x2 gen() 1343 MB/s
raid6: int64x2 xor() 1286 MB/s
raid6: int64x4 gen() 1896 MB/s
raid6: int64x4 xor() 1321 MB/s
raid6: int64x8 gen() 1773 MB/s
raid6: int64x8 xor() 1165 MB/s
raid6: neonx1 gen() 1834 MB/s
raid6: neonx1 xor() 1278 MB/s
raid6: neonx2 gen() 2528 MB/s
raid6: neonx2 xor() 1942 MB/s
raid6: neonx4 gen() 2888 MB/s
raid6: neonx4 xor() 2334 MB/s
raid6: neonx8 gen() 2957 MB/s
raid6: neonx8 xor() 2232 MB/s
raid6: using algorithm neonx8 gen() 2957 MB/s
raid6: .... xor() 2232 MB/s, rmw enabled
Cc: Markus Stockhausen <stockhausen@collogia.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Mon, 3 Aug 2015 03:11:47 +0000 (13:11 +1000)]
md/raid0: update queue parameter in a safer location.
When a (e.g.) RAID5 array is reshaped to RAID0, the updating
of queue parameters (e.g. max number of sectors per bio) is
done in the wrong place.
It should be part of ->run, but it is actually part of ->takeover.
This means it happens before level_store() calls:
blk_set_stacking_limits(&mddev->queue->limits);
and so it ineffective. This can lead to errors from underlying
devices.
So move all the relevant settings out of create_stripe_zones()
and into raid0_run().
As this can lead to a bug-on it is suitable for any -stable
kernel which supports reshape to RAID0. So 2.6.35 or later.
As the bug has been present for five years there is no urgency,
so no need to rush into -stable.
Fixes:
9af204cf720c ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover")
Cc: stable@vger.kernel.org (v2.6.35+ - please delay until after -final release).
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Benjamin Randazzo [Sat, 25 Jul 2015 14:36:50 +0000 (16:36 +0200)]
md: simplify get_bitmap_file now that "file" is zeroed.
There is no point assigning '\0' to file->pathname[0] as
file is now zeroed out, so remove that branch and
simplify the code.
[Original patch combined this with the change to use
kzalloc. I split the two so that the change to kzalloc
is easier to backport. - neilb]
Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Mon, 3 Aug 2015 07:09:57 +0000 (17:09 +1000)]
md/raid5: don't let shrink_slab shrink too far.
I have a report of drop_one_stripe() called from
raid5_cache_scan() apparently finding ->max_nr_stripes == 0.
This should not be allowed.
So add a test to keep max_nr_stripes above min_nr_stripes.
Also use a 'mask' rather than a 'mod' in drop_one_stripe
to ensure 'hash' is valid even if max_nr_stripes does reach zero.
Fixes:
edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.")
Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569b665)
Reported-by: Tomas Papan <tomas.papan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Benjamin Randazzo [Sat, 25 Jul 2015 14:36:50 +0000 (16:36 +0200)]
md: use kzalloc() when bitmap is disabled
In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".
5769 file = kmalloc(sizeof(*file), GFP_NOIO);
5770 if (!file)
5771 return -ENOMEM;
This structure is copied to user space at the end of the function.
5786 if (err == 0 &&
5787 copy_to_user(arg, file, sizeof(*file)))
5788 err = -EFAULT
But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.
5775 /* bitmap disabled, zero the first byte and copy out */
5776 if (!mddev->bitmap_info.file)
5777 file->pathname[0] = '\0';
Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown [Mon, 27 Jul 2015 01:48:52 +0000 (11:48 +1000)]
md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies
raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.
This should probably be part of
Commit:
34cab6f42003 ("md/raid1: fix test for 'was read error from
last working device'.")
as it addresses the same issue. It fixes the same bug and should go
to -stable for same reasons.
Fixes:
76073054c95b ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
Linus Torvalds [Mon, 3 Aug 2015 01:34:55 +0000 (18:34 -0700)]
Linux 4.2-rc5
Linus Torvalds [Mon, 3 Aug 2015 01:07:36 +0000 (18:07 -0700)]
Merge tag 'powerpc-4.2-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- TCE table memory calculation fix from Alexey
- Build fix for ans-lcd from Luis
- Unbalanced IRQ warning fix from Alistair
* tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/eeh-powernv: Fix unbalanced IRQ warning
macintosh/ans-lcd: fix build failure after module_init/exit relocation
powerpc/powernv/ioda2: Fix calculation for memory allocated for TCE table
Linus Torvalds [Thu, 30 Jul 2015 05:18:16 +0000 (22:18 -0700)]
i915: temporary fix for DP MST docking station NULL pointer dereference
Ted Ts'o reports that his Lenovo T540p ThinkPad crashes at boot if
attached to the docking station. This is a regression that he was able
to bisect to commit
8c7b5ccb7298: "drm/i915: Use atomic helpers for
computing changed flags:"
The reason seems to be the new call to drm_atomic_helper_check_modeset()
added to intel_modeset_compute_config(), which in turn calls
update_connector_routing(), and somehow ends up picking a NULL crtc for
the connector state, causing the subsequent drm_crtc_index() to OOPS.
Daniel Vetter says that the fundamental issue seems to be confusion in
the encoder selection, and this isn't the right fix, but while he chases
down the proper fix, this at least avoids the NULL pointer dereference
and makes Ted's docking station work again.
Reported-bisected-and-tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Mani Nikula <jani.nikula@linux.intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 2 Aug 2015 16:36:21 +0000 (09:36 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of three fixes for the ipr driver and one fairly major one for
memory leaks in the mq path of SCSI"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fix memory leak with scsi-mq
ipr: Fix invalid array indexing for HRRQ
ipr: Fix incorrect trace indexing
ipr: Fix locking for unit attention handling
Linus Torvalds [Sun, 2 Aug 2015 16:12:46 +0000 (09:12 -0700)]
Merge tag 'armsoc-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Things are calming down nicely here w.r.t. fixes. This batch
includes two week's worth since I missed to send before -rc4.
Nothing particularly scary to point out, smaller fixes here and there.
Shortlog describes it pretty well"
* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: keystone: fix dt bindings to use post div register for mainpll
ARM: nomadik: disable UART0 on Nomadik boards
ARM: dts: i.MX35: Fix can support.
ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc
ARM: dts: add CPU OPP and regulator supply property for exynos4210
ARM: dts: Update video-phy node with syscon phandle for exynos3250
ARM: DRA7: hwmod: fix gpmc hwmod
Linus Torvalds [Sun, 2 Aug 2015 00:42:14 +0000 (17:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull VFS fix from Al Viro:
"Spurious ENOTDIR fix"
This should fix the problems reported by Dominique Martinet and Hugh
Dickins.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
link_path_walk(): be careful when failing with ENOTDIR
Al Viro [Sat, 1 Aug 2015 23:59:28 +0000 (19:59 -0400)]
link_path_walk(): be careful when failing with ENOTDIR
In RCU mode we might end up with dentry evicted just we check
that it's a directory. In such case we should return ECHILD
rather than ENOTDIR, so that pathwalk would be retries in non-RCU
mode.
Breakage had been introduced in commit
b18825a - prior to that
we were looking at nd->inode, which had been fetched before
verifying that ->d_seq was still valid. That form of check
would only be satisfied if at some point the pathname prefix
would indeed have resolved to a non-directory. The fix consists
of checking ->d_seq after we'd run into a non-directory dentry,
and failing with ECHILD in case of mismatch.
Note that all branches since 3.12 have that problem...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 1 Aug 2015 19:47:04 +0000 (12:47 -0700)]
Merge tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"We had a regression due to reuse of descriptor so we have reverted
that.
The rest are driver fixes:
- at_hdmac and at_xdmac for residue, trannfer width, and channel config
- pl330 final fix for dma fails and overflow issue
- xgene resouce map fix
- mv_xor big endian op fix"
* tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
Revert "dmaengine: virt-dma: don't always free descriptor upon completion"
dmaengine: mv_xor: fix big endian operation in register mode
dmaengine: xgene-dma: Fix the resource map to handle overlapping
dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()
dmaengine: at_hdmac: fix residue computation
dmaengine: at_xdmac: fix bug about channel configuration
dmaengine: pl330: Really fix choppy sound because of wrong residue calculation
dmaengine: pl330: Fix overflow when reporting residue in memcpy
Linus Torvalds [Sat, 1 Aug 2015 16:47:11 +0000 (09:47 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixlets from Thomas Gleixner:
"Just two updates to the maintainers file"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Appoint Jiang and Marc as irqdomain maintainers
MAINTAINERS: Appoint Marc Zyngier as irqchips co-maintainer
Linus Torvalds [Sat, 1 Aug 2015 16:16:33 +0000 (09:16 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fallout from the recent NMI fixes: make x86 LDT handling more robust.
Also some EFI fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ldt: Make modify_ldt synchronous
x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
efi: Check for NULL efi kernel parameters
x86/efi: Use all 64 bit of efi_memmap in setup_e820()
Linus Torvalds [Sat, 1 Aug 2015 00:10:56 +0000 (17:10 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Must teardown SR-IOV before unregistering netdev in igb driver, from
Alex Williamson.
2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.
3) Default route selection in ipv4 should take the prefix length, table
ID, and TOS into account, from Julian Anastasov.
4) sch_plug must have a reset method in order to purge all buffered
packets when the qdisc is reset, likewise for sch_choke, from WANG
Cong.
5) Fix deadlock and races in slave_changelink/br_setport in bridging.
From Nikolay Aleksandrov.
6) mlx4 bug fixes (wrong index in port even propagation to VFs,
overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
Morgenstein, and Or Gerlitz.
7) Turn off klog message about SCTP userspace interface compat that
makes no sense at all, from Daniel Borkmann.
8) Fix unbounded restarts of inet frag eviction process, causing NMI
watchdog soft lockup messages, from Florian Westphal.
9) Suspend/resume fixes for r8152 from Hayes Wang.
10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
Sabrina Dubroca.
11) Fix performance regression when removing a lot of routes from the
ipv4 routing tables, from Alexander Duyck.
12) Fix device leak in AF_PACKET, from Lars Westerhoff.
13) AF_PACKET also has a header length comparison bug due to signedness,
from Alexander Drozdov.
14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.
15) Memory leaks, TSO stats, watchdog timeout and other fixes to
thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.
16) act_bpf can leak memory when replacing programs, from Daniel
Borkmann.
17) WOL packet fixes in gianfar driver, from Claudiu Manoil.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
stmmac: fix missing MODULE_LICENSE in stmmac_platform
gianfar: Enable device wakeup when appropriate
gianfar: Fix suspend/resume for wol magic packet
gianfar: Fix warning when CONFIG_PM off
act_pedit: check binding before calling tcf_hash_release()
net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
net: sched: fix refcount imbalance in actions
r8152: reset device when tx timeout
r8152: add pre_reset and post_reset
qlcnic: Fix corruption while copying
act_bpf: fix memory leaks when replacing bpf programs
net: thunderx: Fix for crash while BGX teardown
net: thunderx: Add PCI driver shutdown routine
net: thunderx: Fix crash when changing rss with mutliple traffic flows
net: thunderx: Set watchdog timeout value
net: thunderx: Wakeup TXQ only if CQE_TX are processed
net: thunderx: Suppress alloc_pages() failure warnings
net: thunderx: Fix TSO packet statistic
net: thunderx: Fix memory leak when changing queue count
net: thunderx: Fix RQ_DROP miscalculation
...
Linus Torvalds [Sat, 1 Aug 2015 00:05:37 +0000 (17:05 -0700)]
Merge branch 'for-linus-4.2' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Filipe fixed up a hard to trigger ENOSPC regression from our merge
window pull, and we have a few other smaller fixes"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix quick exhaustion of the system array in the superblock
btrfs: its btrfs_err() instead of btrfs_error()
btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()
Linus Torvalds [Sat, 1 Aug 2015 00:00:25 +0000 (17:00 -0700)]
Merge tag 'sound-4.2-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became a relative big update as it includes the collected ASoC
fixes. There are a few fixes in ASoC core side, mostly for DAPM and
the new topology API. The rest are various ASoC driver-specific
fixes, as well as the usual HD-audio and USB-audio quirks"
* tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
ALSA: hda - Fix MacBook Pro 5,2 quirk
ALSA: hda - Fix race between PM ops and HDA init/probe
ALSA: usb-audio: add dB range mapping for some devices
ALSA: hda - Apply a fixup to Dell Vostro 5480
ALSA: hda - Add pin quirk for the headset mic jack detection on Dell laptop
ALSA: hda - Apply fixup for another Toshiba Satellite S50D
ALSA: fireworks: add support for AudioFire2 quirk
ALSA: hda - Fix the headset mic that will not work on Dell desktop machine
ALSA: hda - fix cs4210_spdif_automute()
ASoC: pcm1681: Fix setting de-emphasis sampling rate selection
ASoC: ssm4567: Keep TDM_BCLKS in ssm4567_set_dai_fmt
ASoC: sgtl5000: Fix up define for SGTL5000_SMALL_POP
ASoC: dapm: Don't add prefix to widget stream name
ASoC: rt5645: Check if codec is initialized in workqueue handler
ASoC: Intel: Get correct usage_count value to load firmware
ASoC: topology: Fix to add dapm mixer info
ASoC: zx: spdif: Fix devm_ioremap_resource return value check
ASoC: zx: i2s: Fix devm_ioremap_resource return value check
ASoC: mediatek: Use platform_of_node for machine drivers
ASoC: Free card DAPM context on snd_soc_instantiate_card() error path
...
Joachim Eastwood [Fri, 31 Jul 2015 17:13:22 +0000 (19:13 +0200)]
stmmac: fix missing MODULE_LICENSE in stmmac_platform
Commit
50649ab14982 ("stmmac: drop driver from stmmac platform code")
was a bit overzealous in removing code and dropped the MODULE_*
macro's that are still needed since stmmac_platform can be a module.
Fix this by putting the macro's remvoed in
50649ab14982 back.
This fixes the following errors when used as a module:
stmmac_platform: module license 'unspecified' taints kernel.
Disabling lock debugging due to kernel taint
stmmac_platform: Unknown symbol devm_kmalloc (err 0)
stmmac_platform: Unknown symbol stmmac_suspend (err 0)
stmmac_platform: Unknown symbol platform_get_irq_byname (err 0)
stmmac_platform: Unknown symbol stmmac_dvr_remove (err 0)
stmmac_platform: Unknown symbol platform_get_resource (err 0)
stmmac_platform: Unknown symbol of_get_phy_mode (err 0)
stmmac_platform: Unknown symbol of_property_read_u32_array (err 0)
stmmac_platform: Unknown symbol of_alias_get_id (err 0)
stmmac_platform: Unknown symbol stmmac_resume (err 0)
stmmac_platform: Unknown symbol stmmac_dvr_probe (err 0)
Fixes:
50649ab14982 ("stmmac: drop driver from stmmac platform code")
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 31 Jul 2015 22:41:50 +0000 (15:41 -0700)]
Merge branch 'gianfar-wol-fixes'
Claudiu Manoil says:
====================
gianfar: wol magic packet fixes
These changes were already validated as part of FSL SDK.
Patch 2 fixes occasional wake-on magic packet failures during
traffic, probably due to incorrect traffic stop/ device halt
sequence and incorrect usage of txlock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 31 Jul 2015 15:38:33 +0000 (18:38 +0300)]
gianfar: Enable device wakeup when appropriate
The wol_en flag is 0 by default anyway, and we have the
following inconsistency: a MAGIC packet wol capable eth
interface is registered as a wake-up source but unable
to wake-up the system as wol_en is 0 (wake-on flag set to 'd').
Calling set_wakeup_enable() at netdev open is just redundant
because wol_en is 0 by default.
Let only ethtool call set_wakeup_enable() for now.
The bflock is obviously obsoleted, its utility has been corroded
over time. The bitfield flags used today in gianfar are accessed
only on the init/ config path, with no real possibility of
concurrency - nothing that would justify smth. like bflock.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 31 Jul 2015 15:38:32 +0000 (18:38 +0300)]
gianfar: Fix suspend/resume for wol magic packet
If we disable NAPI in the first place we can mask the device's
interrupts (and halt it) without fearing that imask may be
concurrently accessed from interrupt context, so there's
no need to do local_irq_save() around gfar_halt_nodisable().
lock_rx_qs()/unlock_tx_qs() are just obsoleted and potentially
buggy routines. The txlock is currently used in the driver only
to manage TX congestion, it has nothing to do with halting the
device. With these changes, the TX processing is stopped before
gfar_halt().
Compact gfar_halt() is used instead of gfar_halt_nodisable(),
as it disables Rx/TX DMA h/w blocks and the Rx/TX h/w queues.
gfar_start() re-enables all these blocks on resume. Enabling
the magic-packet mode remains the same, note that the RX block
is re-enabled just before entering sleep mode.
Add IRQF_NO_SUSPEND flag for the error interrupt line, to signal
that the interrupt line must remain active during sleep in order
to wake the system by magic packet (MAG) reception interrupt.
(On some systems the MAG interrupt did trigger w/o this flag
as well, but on others it didn't.)
Without these fixes, when suspended during fair Tx traffic the
interface occasionally failed to be woken up by magic packet.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 31 Jul 2015 15:38:31 +0000 (18:38 +0300)]
gianfar: Fix warning when CONFIG_PM off
CC drivers/net/ethernet/freescale/gianfar.o
drivers/net/ethernet/freescale/gianfar.c:568:13: warning: 'lock_tx_qs'
defined but not used [-Wunused-function]
static void lock_tx_qs(struct gfar_private *priv)
^
drivers/net/ethernet/freescale/gianfar.c:576:13: warning: 'unlock_tx_qs'
defined but not used [-Wunused-function]
static void unlock_tx_qs(struct gfar_private *priv)
^
Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 31 Jul 2015 00:12:21 +0000 (17:12 -0700)]
act_pedit: check binding before calling tcf_hash_release()
When we share an action within a filter, the bind refcnt
should increase, therefore we should not call tcf_hash_release().
Fixes:
1a29321ed045 ("net_sched: act: Dont increment refcnt on replace")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Murali Karicheri [Fri, 29 May 2015 16:04:13 +0000 (12:04 -0400)]
ARM: dts: keystone: fix dt bindings to use post div register for mainpll
All of the keystone devices have a separate register to hold post
divider value for main pll clock. Currently the fixed-postdiv
value used for k2hk/l/e SoCs works by sheer luck as u-boot happens to
use a value of 2 for this. Now that we have fixed this in the pll
clock driver change the dt bindings for the same.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Fri, 31 Jul 2015 19:34:10 +0000 (12:34 -0700)]
Merge tag 'iommu-fixes-v4.2-rc4' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"These fixes are all for the AMD IOMMU driver:
- A regression with HSA caused by the conversion of the driver to
default domains. The fixes make sure that an HSA device can still
be attached to an IOMMUv2 domain and that these domains also allow
non-IOMMUv2 capable devices.
- Fix iommu=pt mode which did not work because the dma_ops where set
to nommu_ops, which breaks devices that can only do 32bit DMA.
- Fix an issue with non-PCI devices not working, because there are no
dma_ops for them. This issue was discovered recently as new AMD
x86 platforms have non-PCI devices too"
* tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Allow non-ATS devices in IOMMUv2 domains
iommu/amd: Set global dma_ops if swiotlb is disabled
iommu/amd: Use swiotlb in passthrough mode
iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
iommu/amd: Use iommu core for passthrough mode
iommu/amd: Use iommu_attach_group()
Linus Torvalds [Fri, 31 Jul 2015 19:11:01 +0000 (12:11 -0700)]
Merge tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel
Pull drm intel fixes from Daniel Vetter:
"I delayed my -fixes pull a bit hoping that I could include a fix for
the dp mst stuff but looks a bit more nasty than that. So just 3
other regression fixes, one 4.2 other two cc: stable"
* tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Declare the swizzling unknown for L-shaped configurations
drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
drm/i915: Replace WARN inside I915_READ64_2x32 with retry loop
Linus Torvalds [Fri, 31 Jul 2015 19:05:02 +0000 (12:05 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This has a bunch of nouveau fixes, as Ben has been hibernating and has
lots of small fixes for lots of bugs across nouveau.
Radeon has one major fix for hdmi/dp audio regression that is larger
than Alex would like, but seems to fix up a fair few bugs, along with
some misc fixes.
And a few msm fixes, one of which is also a bit large.
But nothing in here seems insane or crazy for this stage, just more
than I'd like"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
drm/msm/mdp5: release SMB (shared memory blocks) in various cases
drm/msm: change to uninterruptible wait in atomic commit
drm/msm: mdp4: Fix drm_framebuffer dereference crash
drm/msm: fix msm_gem_prime_get_sg_table()
drm/amdgpu: add new parameter to seperate map and unmap
drm/amdgpu: hdp_flush is not needed for inside IB
drm/amdgpu: different emit_ib for gfx and compute
drm/amdgpu: information leak in amdgpu_info_ioctl()
drm/amdgpu: clean up init sequence for failures
drm/radeon/combios: add some validation of lvds values
drm/radeon: rework audio modeset to handle non-audio hdmi features
drm/radeon: rework audio detect (v4)
drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h
drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h
drm/nouveau/nouveau/ttm: fix tiled system memory with Maxwell
drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads
drm/nouveau/fbcon/g80: reduce PUSH_SPACE alloc, fire ring on accel init
drm/nouveau/fbcon/gf100-: reduce RING_SPACE allocation
drm/nouveau/fbcon/nv11-: correctly account for ring space usage
drm/nouveau/bios: add proper support for opcode 0x59
...
Jun Nie [Fri, 10 Jul 2015 12:02:49 +0000 (20:02 +0800)]
Revert "dmaengine: virt-dma: don't always free descriptor upon completion"
This reverts commit
b9855f03d560d351e95301b9de0bc3cad3b31fe9.
The patch break existing DMA usage case. For example, audio SOC
dmaengine never release channel and cause virt-dma to cache too
much memory in descriptor to exhaust system memory.
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Thomas Petazzoni [Wed, 8 Jul 2015 14:28:14 +0000 (16:28 +0200)]
dmaengine: mv_xor: fix big endian operation in register mode
Commit
6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command
in descriptor mode") introduced the support for a feature that
appeared in Armada 38x: specifying the operation to be performed in a
per-descriptor basis rather than globally per channel.
However, when doing so, it changed the function mv_chan_set_mode() to
use:
if (IS_ENABLED(__BIG_ENDIAN))
instead of:
#if defined(__BIG_ENDIAN)
While IS_ENABLED() is perfectly fine for CONFIG_* symbols, it is not
for other symbols such as __BIG_ENDIAN that is provided directly by
the compiler. Consequently, the commit broke support for big-endian,
as the XOR_DESCRIPTOR_SWAP flag was not set in the XOR channel
configuration register.
The primarily visible effect was some nasty warnings and failures
appearing during the self-test of the XOR unit:
[ 1.197368] mv_xor
d0060900.xor: error on chan 0. intr cause 0x00000082
[ 1.197393] mv_xor
d0060900.xor: config 0x00008440
[ 1.197410] mv_xor
d0060900.xor: activation 0x00000000
[ 1.197427] mv_xor
d0060900.xor: intr cause 0x00000082
[ 1.197443] mv_xor
d0060900.xor: intr mask 0x000003f7
[ 1.197460] mv_xor
d0060900.xor: error cause 0x00000000
[ 1.197477] mv_xor
d0060900.xor: error addr 0x00000000
[ 1.197491] ------------[ cut here ]------------
[ 1.197513] WARNING: CPU: 0 PID: 1 at ../drivers/dma/mv_xor.c:664 mv_xor_interrupt_handler+0x14c/0x170()
See also:
http://storage.kernelci.org/next/next-
20150617/arm-mvebu_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y/lab-khilman/boot-armada-xp-openblocks-ax3-4.txt
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes:
6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command in descriptor mode")
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Rameshwar Prasad Sahu [Tue, 7 Jul 2015 10:04:25 +0000 (15:34 +0530)]
dmaengine: xgene-dma: Fix the resource map to handle overlapping
There is an overlap in dma ring cmd csr region due to sharing of ethernet
ring cmd csr region. This patch fix the resource overlapping by mapping
the entire dma ring cmd csr region.
Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Cyrille Pitchen [Tue, 30 Jun 2015 12:36:57 +0000 (14:36 +0200)]
dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()
This patch adds the missing update of the transfer data width in
at_xdmac_prep_slave_sg().
Indeed, for each item in the scatter-gather list, we check whether the
transfer length is aligned with the data width provided by
dmaengine_slave_config(). If so, we directly use this data width for the
current part of the transfer we are preparing. Otherwise, the data width
is reduced to 8 bits (1 byte). Of course, the actual number of register
accesses must also be updated to match the new data width.
So one chunk was missing in the original patch (see Fixes tag below): the
number of register accesses was correctly set to (len >> fixed_dwidth) in
mbr_ubc but the real data width was not updated in mbr_cfg. Since mbr_cfg
may change for each part of the scatter-gather transfer this also explains
why the original patch used the Descriptor View 2 instead of the
Descriptor View 1.
Let's take the example of a DMA transfer to write 8bit data into an Atmel
USART with FIFOs. When FIFOs are enabled in the USART, its Transmit
Holding Register (THR) works in multidata mode, that is to say that up to
4 8bit data can be written into the THR in a single 32bit access and it is
still possible to write only one data with a 8bit access. To take
advantage of this new feature, the DMA driver was modified to allow
multiple dwidths when doing slave transfers.
For instance, when the total length is 22 bytes, the USART driver splits
the transfer into 2 parts:
First part: 20 bytes transferred through 5 32bit writes into THR
Second part: 2 bytes transferred though 2 8bit writes into THR
For the second part, the data width was first set to 4_BYTES by the USART
driver thanks to dmaengine_slave_config() then at_xdmac_prep_slave_sg()
reduces this data width to 1_BYTE because the 2 byte length is not aligned
with the original 4_BYTES data width. Since the data width is modified,
the actual number of writes into THR must be set accordingly.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Fixes:
6d3a7d9e3ada ("dmaengine: at_xdmac: allow muliple dwidths when doing slave transfers")
Cc: stable@vger.kernel.org #4.0 and later
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Cyrille Pitchen [Thu, 18 Jun 2015 11:25:41 +0000 (13:25 +0200)]
dmaengine: at_hdmac: fix residue computation
As claimed by the programmer datasheet and confirmed by the IP designer,
the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
transfers.
Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx
register to compute the DMA residue. So the 'tx_width' field is useless
and can be removed from the struct at_desc.
Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
correctly initialized according to the SRC_WIDTH but 'tx_width' was always
set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
bad DMA residue when 'tx_width' != SRC_WIDTH.
Also the 'tx_width' field was mostly set only in the first and last
descriptors. Depending on the kind of DMA transfer, this field remained
uninitialized for intermediate descriptors. The accurate DMA residue was
computed only when the currently processed descriptor was the first or the
last of the chain. This algorithm was a little bit odd. An accurate DMA
residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
in the CTRLAx register.
Finally, the test to check whether the currently processed descriptor is
the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
is NOT equal to zero, since set_desc_eol() is never called, but logically
equal to first_desc->txd.phys. This bug has a side effect on the
drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
to receive data. Since the DMA residue was wrong each time the DMA
transfer reaches the second (and last) period of the transfer, no more
data were received by the USART driver till the cyclic DMA transfer loops
back to the first period.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Acked-by: Torsten Fleischer <torfl6749@gmail.com>
Tested-by: Jirà Prchal <jiri.prchal@aksignal.cz>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Ludovic Desroches [Wed, 17 Jun 2015 14:22:26 +0000 (16:22 +0200)]
dmaengine: at_xdmac: fix bug about channel configuration
When using descriptor view 2 or higher, we don't write the configuration
into AT_XDMAC_CC register because this configuration will be fetch from
the descriptor. Unfortunately, the PROT bit is not updated with this
method, we have to do it manually before enabling the channel.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Joerg Roedel [Thu, 30 Jul 2015 09:24:45 +0000 (11:24 +0200)]
iommu/amd: Allow non-ATS devices in IOMMUv2 domains
With the grouping of multi-function devices a non-ATS
capable device might also end up in the same domain as an
IOMMUv2 capable device.
So handle this situation gracefully and don't consider it a
bug anymore.
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Andy Lutomirski [Thu, 30 Jul 2015 21:31:32 +0000 (14:31 -0700)]
x86/ldt: Make modify_ldt synchronous
modify_ldt() has questionable locking and does not synchronize
threads. Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.
This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.
This fixes some fallout from the CVE-2015-5157 fixes.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Thu, 30 Jul 2015 21:31:31 +0000 (14:31 -0700)]
x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables. Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.
While we're at it, add comments to help explain what's going on.
This isn't a great long-term fix. This code should probably be
changed to use something like set_memory_ro.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <dvrabel@cantab.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Fri, 31 Jul 2015 07:55:26 +0000 (09:55 +0200)]
Merge tag 'efi-urgent' of git://git./linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Fix an EFI boot issue preventing a Parallels virtual machine from
booting because the upper 32-bits of the EFI memmap pointer were
being discarded in setup_e820(). (Dmitry Skorodumov)
* Validate that the "efi" kernel parameter gets used with an argument,
otherwise we will oops. (Ricardo Neri)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Fri, 31 Jul 2015 03:36:49 +0000 (20:36 -0700)]
Merge tag 'xfs-for-linus-4.2-rc4' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"There are a couple of recently found, long standing remote attribute
corruption fixes caused by log recovery getting confused after a
crash, and the new DAX code in XFS (merged in 4.2-rc1) needs to
actually use the DAX fault path on read faults.
Summary:
- remote attribute log recovery corruption fixes
- DAX page faults need to use direct mappings, not a page cache
mapping"
* tag 'xfs-for-linus-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: remote attributes need to be considered data
xfs: remote attribute headers contain an invalid LSN
xfs: call dax_fault on read page faults for DAX
Sowmini Varadhan [Thu, 30 Jul 2015 13:50:36 +0000 (15:50 +0200)]
net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).
E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.
Fixes:
26abe14379f8 ("net: Modify sk_alloc to not reference count the
netns of kernel sockets.")
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 29 Jul 2015 21:35:25 +0000 (23:35 +0200)]
net: sched: fix refcount imbalance in actions
Since commit
55334a5db5cd ("net_sched: act: refuse to remove bound action
outside"), we end up with a wrong reference count for a tc action.
Test case 1:
FOO="1,6 0 0
4294967295,"
BAR="1,6 0 0
4294967294,"
tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
action bpf bytecode "$FOO"
tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0
4294967295' default-action pipe
index 1 ref 1 bind 1
tc actions replace action bpf bytecode "$BAR" index 1
tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0
4294967294' default-action pipe
index 1 ref 2 bind 1
tc actions replace action bpf bytecode "$FOO" index 1
tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0
4294967295' default-action pipe
index 1 ref 3 bind 1
Test case 2:
FOO="1,6 0 0
4294967295,"
tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
tc actions show action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
tc actions add action drop index 1
RTNETLINK answers: File exists [...]
tc actions show action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 2 bind 1
tc actions add action drop index 1
RTNETLINK answers: File exists [...]
tc actions show action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 3 bind 1
What happens is that in tcf_hash_check(), we check tcf_common for a given
index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
found an existing action. Now there are the following cases:
1) We do a late binding of an action. In that case, we leave the
tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
handler. This is correctly handeled.
2) We replace the given action, or we try to add one without replacing
and find out that the action at a specific index already exists
(thus, we go out with error in that case).
In case of 2), we have to undo the reference count increase from
tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
do so because of the 'tcfc_bindcnt > 0' check which bails out early with
an -EPERM error.
Now, while commit
55334a5db5cd prevents 'tc actions del action ...' on an
already classifier-bound action to drop the reference count (which could
then become negative, wrap around etc), this restriction only accounts for
invocations outside a specific action's ->init() handler.
One possible solution would be to add a flag thus we possibly trigger
the -EPERM ony in situations where it is indeed relevant.
After the patch, above test cases have correct reference count again.
Fixes:
55334a5db5cd ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 30 Jul 2015 21:03:46 +0000 (14:03 -0700)]
Merge branch 'r8152-fixes'
Hayes Wang says:
====================
r8152: device reset
v3:
For patch #2, remove cancel_delayed_work().
v2:
For patch #1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and
the checking of intf->condition.
For patch #2, replace the original method with usb_queue_reset_device() to reset
the device.
v1:
Although the driver works normally, we find the device may get all 0xff data when
transmitting packets on certain platforms. It would break the device and no packet
could be transmitted. The reset is necessary to recover the hw for this situation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Wed, 29 Jul 2015 12:39:09 +0000 (20:39 +0800)]
r8152: reset device when tx timeout
The device reset is necessary if the hw becomes abnormal and stops
transmitting packets.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Wed, 29 Jul 2015 12:39:08 +0000 (20:39 +0800)]
r8152: add pre_reset and post_reset
Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when
calling usb_reset_device(). The two functions could reduce the time
of reset when calling usb_reset_device() after probe().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Thu, 30 Jul 2015 20:30:29 +0000 (22:30 +0200)]
ALSA: hda - Fix MacBook Pro 5,2 quirk
MacBook Pro 5,2 with ALC889 codec had already a fixup entry, but this
seems not working correctly, a fix for pin NID 0x15 is needed in
addition. It's equivalent with the fixup for MacBook Air 1,1, so use
this instead.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102131
Reported-and-tested-by: Jeffery Miller <jefferym@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Thomas Gleixner [Thu, 30 Jul 2015 10:40:55 +0000 (12:40 +0200)]
MAINTAINERS: Appoint Jiang and Marc as irqdomain maintainers
Ben was pretty surprised that he is still listed as the maintainer and
he has no objections against transferring the duty to those who
rumaged in and revamped that code in the recent past.
Add kernel/irq/msi.c to the affected files as it's part of the shiny
new hierarchical irqdomain machinery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Grant Likely <grant.likely@linaro.org>
Thomas Gleixner [Thu, 30 Jul 2015 10:38:06 +0000 (12:38 +0200)]
MAINTAINERS: Appoint Marc Zyngier as irqchips co-maintainer
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Jiang Liu [Thu, 30 Jul 2015 07:51:32 +0000 (15:51 +0800)]
x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
Commit
d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical
irqdomain interfaces") introduced a regression which causes
malfunction of interrupt lines.
The reason is that the conversion of mp_check_pin_attr() missed to
update the polarity selection of the interrupt pin with the caller
provided setting and instead uses a stale attribute value. That in
turn results in chosing the wrong interrupt flow handler.
Use the caller supplied setting to configure the pin correctly which
also choses the correct interrupt flow handler.
This restores the original behaviour and on the affected
machine/driver (Surface Pro 3, i2c controller) all IOAPIC IRQ
configuration are identical to v4.1.
Fixes:
d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-and-tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-and-tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1438242695-23531-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Thu, 30 Jul 2015 18:03:04 +0000 (11:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"The main change is support for keyboards and touchpads found in 2015
editions of Macbooks"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: zforce - don't overwrite the stack"
Input: bcm5974 - add support for the 2015 Macbook Pro
HID: apple: Add support for the 2015 Macbook Pro
Input: bcm5974 - prepare for a new trackpad generation
Input: synaptics - dump ext10 capabilities as well
Tony Battersby [Thu, 16 Jul 2015 15:40:41 +0000 (11:40 -0400)]
scsi: fix memory leak with scsi-mq
Fix a memory leak with scsi-mq triggered by commands with large data
transfer length.
__sg_alloc_table() sets both table->nents and table->orig_nents to the
same value. When the scatterlist is DMA-mapped, table->nents is
overwritten with the (possibly smaller) size of the DMA-mapped
scatterlist, while table->orig_nents retains the original size of the
allocated scatterlist. scsi_free_sgtable() should therefore check
orig_nents instead of nents, and all code that initializes sdb->table
without calling __sg_alloc_table() should set both nents and orig_nents.
Fixes:
d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Brian King [Tue, 14 Jul 2015 16:41:33 +0000 (11:41 -0500)]
ipr: Fix invalid array indexing for HRRQ
Fixes another signed / unsigned array indexing bug in the ipr driver.
Currently, when hrrq_index wraps, it becomes a negative number. We
do the modulo, but still have a negative number, so we end up indexing
backwards in the array. Given where the hrrq array is located in memory,
we probably won't actually reference memory we don't own, but nonetheless
ipr is still looking at data within struct ipr_ioa_cfg and interpreting it as
struct ipr_hrr_queue data, so bad things could certainly happen.
Each ipr adapter has anywhere from 1 to 16 HRRQs. By default, we use 2 on new
adapters. Let's take an example:
Assume ioa_cfg->hrrq_index=0x7fffffffe and ioa_cfg->hrrq_num=4:
The atomic_add_return will then return -1. We mod this with 3 and get -2, add
one and get -1 for an array index.
On adapters which support more than a single HRRQ, we dedicate HRRQ to adapter
initialization and error interrupts so that we can optimize the other queues
for fast path I/O. So all normal I/O uses HRRQ 1-15. So we want to spread the
I/O requests across those HRRQs.
With the default module parameter settings, this bug won't hit, only when
someone sets the ipr.number_of_msix parameter to a value larger than 3 is when
bad things start to happen.
Cc: <stable@vger.kernel.org>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Brian King [Tue, 14 Jul 2015 16:41:31 +0000 (11:41 -0500)]
ipr: Fix incorrect trace indexing
When ipr's internal driver trace was changed to an atomic, a signed/unsigned
bug slipped in which results in us indexing backwards in our memory buffer
writing on memory that does not belong to us. This patch fixes this by removing
the modulo and instead just mask off the low bits.
Cc: <stable@vger.kernel.org>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Brian King [Tue, 14 Jul 2015 16:41:29 +0000 (11:41 -0500)]
ipr: Fix locking for unit attention handling
Make sure we have the host lock held when calling scsi_report_bus_reset. Fixes
a crash seen as the __devices list in the scsi host was changing as we were
iterating through it.
Cc: <stable@vger.kernel.org>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Ricardo Neri [Thu, 16 Jul 2015 02:36:03 +0000 (19:36 -0700)]
efi: Check for NULL efi kernel parameters
Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:
PANIC: early exception 0e rip 10:
ffffffff812fb361 error 0 cr2 0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
[ 0.000000]
ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
[ 0.000000]
0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
[ 0.000000]
0000000000000069 000000000000005f 0000000000000000 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000] [<
ffffffff8184bb0f>] dump_stack+0x45/0x57
[ 0.000000] [<
ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
[ 0.000000] [<
ffffffff812fb361>] ? parse_option_str+0x11/0x90
[ 0.000000] [<
ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
[ 0.000000] [<
ffffffff81f376e1>] do_early_param+0x50/0x8a
[ 0.000000] [<
ffffffff8106b1b3>] parse_args+0x1e3/0x400
[ 0.000000] [<
ffffffff81f37a43>] parse_early_options+0x24/0x28
[ 0.000000] [<
ffffffff81f37691>] ? loglevel+0x31/0x31
[ 0.000000] [<
ffffffff81f37a78>] parse_early_param+0x31/0x3d
[ 0.000000] [<
ffffffff81f3ae98>] setup_arch+0x2de/0xc08
[ 0.000000] [<
ffffffff8109629a>] ? vprintk_default+0x1a/0x20
[ 0.000000] [<
ffffffff81f37b20>] start_kernel+0x90/0x423
[ 0.000000] [<
ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000] [<
ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
[ 0.000000] RIP 0xffffffff81ba2efc
This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.
Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Dmitry Skorodumov [Tue, 28 Jul 2015 14:38:32 +0000 (18:38 +0400)]
x86/efi: Use all 64 bit of efi_memmap in setup_e820()
The efi_info structure stores low 32 bits of memory map
in efi_memmap and high 32 bits in efi_memmap_hi.
While constructing pointer in the setup_e820(), need
to take into account all 64 bit of the pointer.
It is because on 64bit machine the function
efi_get_memory_map() may return full 64bit pointer and before
the patch that pointer was truncated.
The issue is triggered on Parallles virtual machine and
fixed with this patch.
Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com>
Cc: Denis V. Lunev <den@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Linus Torvalds [Thu, 30 Jul 2015 15:04:19 +0000 (08:04 -0700)]
Merge tag 'hwmon-for-linus-v4.2-rc5' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Two patches headed for -stable.
nct7802: Fix integer overflow seen when writing voltage limits
nct7904: Rename pwm attributes to match hwmon ABI"
* tag 'hwmon-for-linus-v4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct7802) Fix integer overflow seen when writing voltage limits
hwmon: (nct7904) Rename pwm attributes to match hwmon ABI
Chris Wilson [Sun, 28 Jun 2015 08:19:26 +0000 (09:19 +0100)]
drm/i915: Declare the swizzling unknown for L-shaped configurations
The old style of memory interleaving swizzled upto the end of the
first even bank of memory, and then used the remainder as unswizzled on
the unpaired bank - i.e. swizzling is not constant for all memory. This
causes problems when we try to migrate memory and so the kernel prevents
migration at all when we detect L-shaped inconsistent swizzling.
However, this issue also extends to userspace who try to manually detile
into memory as the swizzling for an individual page is unknown (it
depends on its physical address only known to the kernel), userspace
cannot correctly swizzle.
Note that this is a new attempt for the previously merged one,
reverted in
commit
d82c0ba6e306f079407f07003e53c262d683397b
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jul 14 12:29:27 2015 +0200
Revert "drm/i915: Declare the swizzling unknown for L-shaped configurations"
This is cc: stable since we need it to fix up troubles with wc cpu
mmaps that userspace recently started to use widely.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91105
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
[danvet: Add note about previous (failed attempt).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Wed, 29 Jul 2015 19:02:48 +0000 (20:02 +0100)]
drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
If the device does not support the aliasing ppgtt, we must translate
user bind requests (PIN_USER) from LOCAL_BIND to a GLOBAL_BIND. However,
since this is device specific we cannot do this conveniently in the
upper layers and so must manage the vma->bound flags in the backend.
Partial revert of commit
75d04a3773ecee617847de963ae4195d6aa74c28 [4.2-rc1]
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Apr 28 17:56:17 2015 +0300
drm/i915/gtt: Allocate va range only if vma is not bound
Note this was spotted by Daniel originally, but we dropped the ball in
getting the fix in before the bug going wild. Sorry all.
Reported-by: Vincent Legoll vincent.legoll@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91133
References: https://bugs.freedesktop.org/show_bug.cgi?id=90224
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Alistair Popple [Thu, 30 Jul 2015 06:53:54 +0000 (16:53 +1000)]
powerpc/eeh-powernv: Fix unbalanced IRQ warning
pnv_eeh_next_error() re-enables the eeh opal event interrupt but it
gets called from a loop if there are more outstanding events to
process, resulting in a warning due to enabling an already enabled
interrupt. Instead the interrupt should only be re-enabled once the
last outstanding event has been processed.
Tested-by: Daniel Axtens <dja@axtens.net>
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Joerg Roedel [Tue, 28 Jul 2015 14:58:51 +0000 (16:58 +0200)]
iommu/amd: Set global dma_ops if swiotlb is disabled
Some AMD systems also have non-PCI devices which can do DMA.
Those can't be handled by the AMD IOMMU, as the hardware can
only handle PCI. These devices would end up with no dma_ops,
as neither the per-device nor the global dma_ops will get
set. SWIOTLB provides global dma_ops when it is active, so
make sure there are global dma_ops too when swiotlb is
disabled.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Tue, 28 Jul 2015 14:58:50 +0000 (16:58 +0200)]
iommu/amd: Use swiotlb in passthrough mode
In passthrough mode (iommu=pt) all devices are identity
mapped. If a device does not support 64bit DMA it might
still need remapping. Make sure swiotlb is initialized to
provide this remapping.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Tue, 28 Jul 2015 14:58:49 +0000 (16:58 +0200)]
iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
Since devices with IOMMUv2 functionality might be in the
same group as devices without it, allow those devices in
IOMMUv2 domains too.
Otherwise attaching the group with the IOMMUv2 device to the
domain will fail.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Tue, 28 Jul 2015 14:58:48 +0000 (16:58 +0200)]
iommu/amd: Use iommu core for passthrough mode
Remove the AMD IOMMU driver implementation for passthrough
mode and rely on the new iommu core features for that.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Tue, 28 Jul 2015 14:58:47 +0000 (16:58 +0200)]
iommu/amd: Use iommu_attach_group()
Since the conversion to default domains the
iommu_attach_device function only works for devices with
their own group. But this isn't always true for current
IOMMUv2 capable devices, so use iommu_attach_group instead.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Shahed Shaikh [Wed, 29 Jul 2015 11:55:35 +0000 (07:55 -0400)]
qlcnic: Fix corruption while copying
Use proper typecasting while performing byte-by-byte copy
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 29 Jul 2015 16:40:56 +0000 (18:40 +0200)]
act_bpf: fix memory leaks when replacing bpf programs
We currently trigger multiple memory leaks when replacing bpf
actions, besides others:
comm "tc", pid 1909, jiffies
4294851310 (age 1602.796s)
hex dump (first 32 bytes):
01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00 ...m............
backtrace:
[<
ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0
[<
ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0
[<
ffffffff8120a37a>] __vmalloc+0x4a/0x50
[<
ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0
[<
ffffffff816c0684>] bpf_prog_create+0x44/0xa0
[<
ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf]
[<
ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0
[<
ffffffff816d70a2>] tcf_action_init+0x82/0xf0
[<
ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0
[<
ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf]
[<
ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf]
[<
ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910
[<
ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240
[<
ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0
[<
ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40
[<
ffffffff816deaaf>] netlink_unicast+0xef/0x1b0
Issue is that the old content from tcf_bpf is allocated and needs
to be released when we replace it. We seem to do that since the
beginning of act_bpf on the filter and insns, later on the name as
well.
Example test case, after patch:
# FOO="1,6 0 0
4294967295,"
# BAR="1,6 0 0
4294967294,"
# tc actions add action bpf bytecode "$FOO" index 2
# tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0
4294967295' default-action pipe
index 2 ref 1 bind 0
# tc actions replace action bpf bytecode "$BAR" index 2
# tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0
4294967294' default-action pipe
index 2 ref 1 bind 0
# tc actions replace action bpf bytecode "$FOO" index 2
# tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0
4294967295' default-action pipe
index 2 ref 1 bind 0
# tc actions del action bpf index 2
[...]
# echo "scan" > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l
0
Fixes:
d23b8ad8ab23 ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 30 Jul 2015 06:52:32 +0000 (23:52 -0700)]
Merge branch 'thunderx-fixes'
Aleksey Makarov says:
====================
net: thunderx: Misc fixes
Miscellaneous fixes for the ThunderX VNIC driver
All the patches can be applied individually.
It's ok to drop some if the maintainer feels uncomfortable
with applying for 4.2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanneeru Srinivasulu [Wed, 29 Jul 2015 13:49:46 +0000 (16:49 +0300)]
net: thunderx: Fix for crash while BGX teardown
Cortina phy does not have kernel driver and we don't attach
device with phy layer for intefaces like XFI, XLAUI etc,
Hence check for interface type before calling disconnect.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Wed, 29 Jul 2015 13:49:45 +0000 (16:49 +0300)]
net: thunderx: Add PCI driver shutdown routine
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>