There's a race between adding a block group to the list of the unused
block groups and removing an unused block group (cleaner kthread) that
leads to freeing extents that are in use or a crash during transaction
commmit. Basically the cleaner kthread, when executing
btrfs_delete_unused_bgs(), might catch the newly added block group to
the list fs_info->unused_bgs and clear the range representing the whole
group from fs_info->freed_extents[] before the task that added the block
group to the list (running update_block_group()) marked the last freed
extent as dirty in fs_info->freed_extents (pinned_extents).
That is:
CPU 1 CPU 2
btrfs_delete_unused_bgs()
update_block_group()
add block group to
fs_info->unused_bgs
got block group from the list
clear_extent_bits for the whole
block group range in freed_extents[]
set_extent_dirty for the
range covering the freed
extent in freed_extents[]
(fs_info->pinned_extents)
block group deleted, and a new block
group with the same logical address is
created
reserve space from the new block group
for new data or metadata - the reserved
space overlaps the range specified by
CPU 1 for set_extent_dirty()
commit transaction
find all ranges marked as dirty in
fs_info->pinned_extents, clear them
and add them to the free space cache
Alternatively, if CPU 2 doesn't create a new block group with the same
logical address, we get a crash/BUG_ON at transaction commit when unpining
extent ranges because we can't find a block group for the range marked as
dirty by CPU 1. Sample trace:
[ 2163.426462] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2163.426640] Modules linked in: btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32c_generic libcrc32c dm_mod nfsd auth_rpc
gss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse parport_pc parport i2c_piix4 processor thermal_sys i2ccore evdev button pcspkr microcode serio_raw ext4 crc16 jbd2 mbcache
sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata e1000 scsi_mod virtio_pci virtio_ring virtio
[ 2163.428209] CPU: 0 PID: 11858 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1
[ 2163.428519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2163.428875] task:
ffff88009f2c0650 ti:
ffff8801356bc000 task.ti:
ffff8801356bc000
[ 2163.429157] RIP: 0010:[<
ffffffffa037728e>] [<
ffffffffa037728e>] unpin_extent_range.isra.58+0x62/0x192 [btrfs]
[ 2163.429562] RSP: 0018:
ffff8801356bfda8 EFLAGS:
00010246
[ 2163.429802] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000000
[ 2163.429990] RDX:
0000000041bfffff RSI:
0000000001c00000 RDI:
ffff880024307080
[ 2163.430042] RBP:
ffff8801356bfde8 R08:
0000000000000068 R09:
ffff88003734f118
[ 2163.430042] R10:
ffff8801356bfcb8 R11:
fffffffffffffb69 R12:
ffff8800243070d0
[ 2163.430042] R13:
0000000083c04000 R14:
ffff8800751b0f00 R15:
ffff880024307000
[ 2163.430042] FS:
0000000000000000(0000) GS:
ffff88013f400000(0000) knlGS:
0000000000000000
[ 2163.430042] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
[ 2163.430042] CR2:
00007ff10eb43fc0 CR3:
0000000004cb8000 CR4:
00000000000006f0
[ 2163.430042] Stack:
[ 2163.430042]
ffff8800243070d0 0000000083c08000 0000000083c07fff ffff88012d6bc800
[ 2163.430042]
ffff8800243070d0 ffff8800751b0f18 ffff8800751b0f00 0000000000000000
[ 2163.430042]
ffff8801356bfe18 ffffffffa037a481 0000000083c04000 0000000083c07fff
[ 2163.430042] Call Trace:
[ 2163.430042] [<
ffffffffa037a481>] btrfs_finish_extent_commit+0xac/0xbf [btrfs]
[ 2163.430042] [<
ffffffffa038c06d>] btrfs_commit_transaction+0x6ee/0x882 [btrfs]
[ 2163.430042] [<
ffffffffa03881f1>] transaction_kthread+0xf2/0x1a4 [btrfs]
[ 2163.430042] [<
ffffffffa03880ff>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
[ 2163.430042] [<
ffffffff8105966b>] kthread+0xb7/0xbf
[ 2163.430042] [<
ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
[ 2163.430042] [<
ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
[ 2163.430042] [<
ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
So fix this by making update_block_group() first set the range as dirty
in pinned_extents before adding the block group to the unused_bgs list.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
spin_unlock(&cache->space_info->lock);
} else {
old_val -= num_bytes;
+ btrfs_set_block_group_used(&cache->item, old_val);
+ cache->pinned += num_bytes;
+ cache->space_info->bytes_pinned += num_bytes;
+ cache->space_info->bytes_used -= num_bytes;
+ cache->space_info->disk_used -= num_bytes * factor;
+ spin_unlock(&cache->lock);
+ spin_unlock(&cache->space_info->lock);
+ set_extent_dirty(info->pinned_extents,
+ bytenr, bytenr + num_bytes - 1,
+ GFP_NOFS | __GFP_NOFAIL);
/*
* No longer have used bytes in this block group, queue
* it for deletion.
}
spin_unlock(&info->unused_bgs_lock);
}
- btrfs_set_block_group_used(&cache->item, old_val);
- cache->pinned += num_bytes;
- cache->space_info->bytes_pinned += num_bytes;
- cache->space_info->bytes_used -= num_bytes;
- cache->space_info->disk_used -= num_bytes * factor;
- spin_unlock(&cache->lock);
- spin_unlock(&cache->space_info->lock);
-
- set_extent_dirty(info->pinned_extents,
- bytenr, bytenr + num_bytes - 1,
- GFP_NOFS | __GFP_NOFAIL);
}
btrfs_put_block_group(cache);
total -= num_bytes;