md/raid5: Use conf->device_lock protect changing of multi-thread resources.
When we change group_thread_cnt from sysfs entry, it can OOPS.
The kernel messages are:
[ 135.299021] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 135.299073] IP: [<
ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.299107] PGD 0
[ 135.299122] Oops: 0000 [#1] SMP
[ 135.299144] Modules linked in: netconsole e1000e ptp pps_core
[ 135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24
[ 135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 135.299255] task:
ffff8800b9638f80 ti:
ffff8800b77a4000 task.ti:
ffff8800b77a4000
[ 135.299283] RIP: 0010:[<
ffffffff815188ab>] [<
ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.299323] RSP: 0018:
ffff8800b77a5c48 EFLAGS:
00010002
[ 135.299344] RAX:
ffff880037bb5c70 RBX:
0000000000000000 RCX:
0000000000000008
[ 135.299371] RDX:
ffff880037bb5cb8 RSI:
0000000000000001 RDI:
ffff880037bb5c00
[ 135.299398] RBP:
ffff8800b77a5d08 R08:
0000000000000001 R09:
0000000000000000
[ 135.299425] R10:
ffff8800b77a5c98 R11:
00000000ffffffff R12:
ffff880037bb5c00
[ 135.299452] R13:
0000000000000000 R14:
0000000000000000 R15:
ffff880037bb5c70
[ 135.299479] FS:
0000000000000000(0000) GS:
ffff88013fd80000(0000) knlGS:
0000000000000000
[ 135.299510] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
[ 135.299532] CR2:
0000000000000000 CR3:
0000000001c0b000 CR4:
00000000000407e0
[ 135.299559] Stack:
[ 135.299570]
ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300
[ 135.299611]
000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8
[ 135.299654]
000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98
[ 135.299696] Call Trace:
[ 135.299711] [<
ffffffff8107383e>] ? __wake_up+0x4e/0x70
[ 135.299733] [<
ffffffff81518f88>] raid5d+0x4c8/0x680
[ 135.299756] [<
ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0
[ 135.299781] [<
ffffffff81524c9f>] md_thread+0x11f/0x170
[ 135.299804] [<
ffffffff81069cd0>] ? wake_up_bit+0x40/0x40
[ 135.299826] [<
ffffffff81524b80>] ? md_rdev_init+0x110/0x110
[ 135.299850] [<
ffffffff81069656>] kthread+0xc6/0xd0
[ 135.299871] [<
ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 135.299899] [<
ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[ 135.299923] [<
ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90
[ 135.300005] RIP [<
ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.300005] RSP <
ffff8800b77a5c48>
[ 135.300005] CR2:
0000000000000000
[ 135.300005] ---[ end trace
504854e5bb7562ed ]---
[ 135.300005] Kernel panic - not syncing: Fatal exception
This is because raid5d() can be running when the multi-thread
resources are changed via system. We see need to provide locking.
mddev->device_lock is suitable, but we cannot simple call
alloc_thread_groups under this lock as we cannot allocate memory
while holding a spinlock.
So change alloc_thread_groups() to allocate and return the data
structures, then raid5_store_group_thread_cnt() can take the lock
while updating the pointers to the data structures.
This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x
stable series.
Fixes:
b721420e8719131896b009b11edbbd27
Cc: stable@vger.kernel.org (3.12)
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>