rbd: implement REQ_OP_WRITE_ZEROES
authorIlya Dryomov <idryomov@gmail.com>
Mon, 22 May 2017 17:59:24 +0000 (19:59 +0200)
committerIlya Dryomov <idryomov@gmail.com>
Mon, 29 May 2017 09:18:01 +0000 (11:18 +0200)
Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.

rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
release either some or all blocks depending on whether the zeroing
request is rbd_obj_bytes() aligned.  This is how we currently implement
discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
now.  Caveats:

- REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
  current implementations - nvme and loop

- there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
  is hence less helpful than blk_bio_discard_split(), but this can (and
  should) be fixed on the rbd side

In the future we will split these into two code paths to respect
REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
released on discard.

Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
drivers/block/rbd.c

index 454bf9c34882f33d673ccbaf0c8afa4f3ee18ad4..c16f74547804ccb957275f6d59b705b0ba35eb6b 100644 (file)
@@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
 
        switch (req_op(rq)) {
        case REQ_OP_DISCARD:
+       case REQ_OP_WRITE_ZEROES:
                op_type = OBJ_OP_DISCARD;
                break;
        case REQ_OP_WRITE:
@@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
        q->limits.discard_granularity = segment_size;
        q->limits.discard_alignment = segment_size;
        blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
+       blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
 
        if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
                q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;