Commit 80e96c5484be ("dm thin: do not allow thin device activation
while pool is suspended") delayed the initialization of a new thin
device's refcount and completion until after the new thin had been
added to the pool's active_thins list and the pool lock released. This
opens a race with the worker thread, which walks the list and calls
thin_get()/thin_put(): it can see the refcount drop to 0 and call
complete() on the still-uninitialized completion, freezing the system
with the oops below:

 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
 kernel: IP: [<ffffffff810d360b>] __wake_up_common+0x2b/0x90
 kernel: Call Trace:
 kernel: [<ffffffff810d3683>] __wake_up_locked+0x13/0x20
 kernel: [<ffffffff810d3dc7>] complete+0x37/0x50
 kernel: [<ffffffffa0595c50>] thin_put+0x20/0x30 [dm_thin_pool]
 kernel: [<ffffffffa059aab7>] do_worker+0x667/0x870 [dm_thin_pool]
 kernel: [<ffffffff816a8a4c>] ? __schedule+0x3ac/0x9a0
 kernel: [<ffffffff810b1aef>] process_one_work+0x14f/0x400
 kernel: [<ffffffff810b206b>] worker_thread+0x6b/0x490
 kernel: [<ffffffff810b2000>] ? rescuer_thread+0x260/0x260
 kernel: [<ffffffff810b6a7b>] kthread+0xdb/0x100
 kernel: [<ffffffff810b69a0>] ? kthread_create_on_node+0x170/0x170
 kernel: [<ffffffff816ad7ec>] ret_from_fork+0x7c/0xb0
 kernel: [<ffffffff810b69a0>] ? kthread_create_on_node+0x170/0x170

Set the thin device's initial refcount and initialize the completion
before adding it to the pool's active_thins list in thin_ctr().
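
The ordering rule behind this fix (initialize an object fully, then
publish it) can be illustrated with a minimal userspace sketch. This is
not the dm-thin code: struct thin, thin_publish(), worker_pass() and the
boolean flags standing in for the completion are hypothetical names
chosen for illustration, and locking/RCU are deliberately left out, so
the example is single-threaded only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the thin device context. */
struct thin {
	atomic_int refcount;
	bool can_destroy_ready;	/* stands in for init_completion() having run */
	bool completed;		/* stands in for complete() having run */
	struct thin *next;
};

/* Stand-in for the pool's active_thins list (no locking in this sketch). */
static struct thin *active_thins;

/* Fixed ordering: set up refcount and completion first, publish last. */
static void thin_publish(struct thin *tc)
{
	atomic_init(&tc->refcount, 1);
	tc->can_destroy_ready = true;
	tc->completed = false;
	tc->next = active_thins;
	active_thins = tc;	/* only now can a list walker reach tc */
}

static void thin_get(struct thin *tc)
{
	atomic_fetch_add(&tc->refcount, 1);
}

static void thin_put(struct thin *tc)
{
	/* atomic_fetch_sub returns the old value: 1 means we dropped to 0. */
	if (atomic_fetch_sub(&tc->refcount, 1) == 1) {
		/* Completing an uninitialized completion is the bug above. */
		assert(tc->can_destroy_ready);
		tc->completed = true;
	}
}

/* Models one worker pass: take and drop a reference on every thin. */
static int worker_pass(void)
{
	int visited = 0;

	for (struct thin *tc = active_thins; tc; tc = tc->next) {
		thin_get(tc);
		thin_put(tc);
		visited++;
	}
	return visited;
}
```

Because the initial reference is in place before the object is linked
into the list, a worker pass leaves the refcount at 1 and never signals
the completion; only the final thin_put() from the owner does.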
Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
 		r = -EINVAL;
 		goto bad;
 	}
+	atomic_set(&tc->refcount, 1);
+	init_completion(&tc->can_destroy);
 	list_add_tail_rcu(&tc->list, &tc->pool->active_thins);
 	spin_unlock_irqrestore(&tc->pool->lock, flags);
 	/*
 	dm_put(pool_md);
-	atomic_set(&tc->refcount, 1);
-	init_completion(&tc->can_destroy);
-
 	return 0;
 bad: