blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt
author Alexander Gordeev <agordeev@redhat.com>
Thu, 12 Jun 2014 15:05:37 +0000 (17:05 +0200)
committer Jens Axboe <axboe@fb.com>
Wed, 18 Jun 2014 05:13:05 +0000 (22:13 -0700)
commit 2971c35f35886b87af54675313a2afef937c1b0c
tree 3a812e2286298ca05f764e93c777a80adc310d40
parent 8537b12034cf1fd3fab3da2c859d71f76846fae9

This piece of code in the bt_clear_tag() function is racy:

    bs = bt_wake_ptr(bt);
    if (bs && atomic_dec_and_test(&bs->wait_cnt)) {
        atomic_set(&bs->wait_cnt, bt->wake_cnt);
        wake_up(&bs->wait);
    }

Since nothing prevents bt_wake_ptr() from returning the very
same 'bs' address on multiple CPUs, the following scenario is
possible:

    CPU1                                CPU2
    ----                                ----

0.  bs = bt_wake_ptr(bt);               bs = bt_wake_ptr(bt);
1.  atomic_dec_and_test(&bs->wait_cnt)
2.                                      atomic_dec_and_test(&bs->wait_cnt)
3.  atomic_set(&bs->wait_cnt, bt->wake_cnt);

If the decrement in [1] yields zero, then the decrement in [2] drives
wait_cnt to a negative (underflowed) value, which the code does not
expect, and the counter stays that way until [3] runs. The follow-up
assignment in [3] overwrites the invalid value with the batch value,
which likely keeps the issue from being severe. The result is still
incorrect, though: the decrement in [2] is lost, so the counter ends up
at the batch value when it should be lower.
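The interleaving can be replayed step by step in userspace. The sketch
below is only an illustration under stated assumptions: it uses C11
<stdatomic.h> rather than the kernel's atomic_t API, 'struct wait_state'
and both replay functions are hypothetical names, and the add-based reset
is one possible way to keep the concurrent decrement, not necessarily the
change this patch makes.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the wait state; only wait_cnt is modeled. */
struct wait_state {
    atomic_int wait_cnt;
};

/*
 * Replays steps [1]-[3] in order on one thread, resetting the counter
 * with a plain store as the racy code does. Returns the final wait_cnt.
 */
int replay_with_store(int wake_cnt)
{
    struct wait_state bs;
    int cpu1, cpu2;

    atomic_init(&bs.wait_cnt, 1);   /* one tag release away from a wakeup */

    cpu1 = atomic_fetch_sub(&bs.wait_cnt, 1) - 1;   /* [1]: 1 -> 0  */
    cpu2 = atomic_fetch_sub(&bs.wait_cnt, 1) - 1;   /* [2]: 0 -> -1 */
    assert(cpu1 == 0 && cpu2 == -1);

    /* [3]: the store throws away the decrement made in [2]. */
    atomic_store(&bs.wait_cnt, wake_cnt);
    return atomic_load(&bs.wait_cnt);
}

/*
 * Same steps, but the reset adds the batch value instead of storing it,
 * so the decrement made in [2] still counts (assumed alternative).
 */
int replay_with_add(int wake_cnt)
{
    struct wait_state bs;
    int cpu1, cpu2;

    atomic_init(&bs.wait_cnt, 1);

    cpu1 = atomic_fetch_sub(&bs.wait_cnt, 1) - 1;   /* [1]: 1 -> 0  */
    cpu2 = atomic_fetch_sub(&bs.wait_cnt, 1) - 1;   /* [2]: 0 -> -1 */
    assert(cpu1 == 0 && cpu2 == -1);

    /* [3]: -1 + wake_cnt keeps the concurrent decrement. */
    atomic_fetch_add(&bs.wait_cnt, wake_cnt);
    return atomic_load(&bs.wait_cnt);
}
```

With a batch value of 8, replay_with_store(8) leaves the counter at 8,
while replay_with_add(8) leaves it at 7, the lesser value the scenario
calls for.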

Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
block/blk-mq-tag.c