When I test fio script with big I/O depth, I found the total throughput drops
compared to some relative small I/O depth. The reason is the thread accumulates
big requests in its plug list and causes some delays (surely this depends
on CPU speed).
I thought we'd better have a threshold for requests. When a threshold reaches,
this means there is no request merge and queue lock contention isn't severe
when pushing per-task requests to queue, so the main advantages of blk plug
don't exist. We can force a plug list flush in this case.
With this, my test throughput actually increases and almost equals to small
I/O depth. Another side effect is irq off time decreases in blk_flush_plug_list()
for big I/O depth.
The BLK_MAX_REQUEST_COUNT is choosen arbitarily, but 16 is efficiently to
reduce lock contention to me. But I'm open here, 32 is ok in my test too.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
plug->should_sort = 1;
}
list_add_tail(&req->queuelist, &plug->list);
+ plug->count++;
drive_stat_acct(req, 1);
+ if (plug->count >= BLK_MAX_REQUEST_COUNT)
+ blk_flush_plug_list(plug, false);
} else {
spin_lock_irq(q->queue_lock);
add_acct_request(q, req, where);
INIT_LIST_HEAD(&plug->list);
INIT_LIST_HEAD(&plug->cb_list);
plug->should_sort = 0;
+ plug->count = 0;
/*
* If this is a nested plug, don't actually assign it. It will be
return;
list_splice_init(&plug->list, &list);
+ plug->count = 0;
if (plug->should_sort) {
list_sort(NULL, &list, plug_rq_cmp);
struct list_head list;
struct list_head cb_list;
unsigned int should_sort;
+ unsigned int count;
};
+#define BLK_MAX_REQUEST_COUNT 16
+
struct blk_plug_cb {
struct list_head list;
void (*callback)(struct blk_plug_cb *);