git.stricted.de - GitHub/LineageOS/G12/android_kernel_amlogic

author	Alex Williamson <alex.williamson@redhat.com>
	Fri, 6 Feb 2015 17:58:56 +0000 (10:58 -0700)
committer	Alex Williamson <alex.williamson@redhat.com>
	Fri, 6 Feb 2015 17:58:56 +0000 (10:58 -0700)
commit	6fe1010d6d9c02cf3556ab076585104551a6ee7e
tree	a4067ec65d2adef950cd233db2998c725b0a6905
parent	e36f014edff70fc02b3d3d79cead1d58f289332e

vfio/type1: DMA unmap chunking

When unmapping DMA entries we try to rely on the IOMMU API behavior
that allows the IOMMU to unmap a larger area than requested, up to
the size of the original mapping.  This works great when the IOMMU
supports superpages *and* they're in use.  Otherwise, each PAGE_SIZE
increment is unmapped separately, resulting in poor performance.

Instead we can use the IOVA-to-physical-address translation provided
by the IOMMU API and unmap using the largest contiguous physical
memory chunk available, which is also how vfio/type1 would have
mapped the region.  For a synthetic 1TB guest VM mapping and shutdown
test on Intel VT-d (2M IOMMU pagesize support), this achieves about
a 30% overall improvement mapping standard 4K pages, regardless of
IOMMU superpage enabling, and about a 40% improvement mapping 2M
hugetlbfs pages when IOMMU superpages are not available.  Hugetlbfs
with IOMMU superpages enabled is effectively unchanged.
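
As an illustration only, the chunking walk described above can be
modelled in userspace.  This is a sketch under stated assumptions, not
the kernel code: struct toy_domain, toy_iova_to_phys(), toy_unmap() and
toy_unmap_chunked() are hypothetical stand-ins for an IOMMU domain and
the IOMMU API's translate and unmap operations, and the page size is
fixed at 4K.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical toy domain: IOVA page i translates to phys[i]; 0 means unmapped. */
struct toy_domain {
	uint64_t *phys;
	size_t npages;
	unsigned long unmap_calls;
};

/* Stand-in for an IOVA-to-physical lookup; returns 0 when nothing is mapped. */
static uint64_t toy_iova_to_phys(const struct toy_domain *d, uint64_t iova)
{
	size_t idx = (size_t)(iova / TOY_PAGE_SIZE);

	return idx < d->npages ? d->phys[idx] : 0;
}

/* Stand-in for an unmap call: clears exactly len bytes and counts the call. */
static size_t toy_unmap(struct toy_domain *d, uint64_t iova, size_t len)
{
	size_t off;

	for (off = 0; off < len; off += TOY_PAGE_SIZE)
		d->phys[(iova + off) / TOY_PAGE_SIZE] = 0;
	d->unmap_calls++;
	return len;
}

/*
 * Unmap [iova, end) using the largest physically contiguous chunk found
 * at each step.  Returns the number of unmap calls that were issued.
 */
static unsigned long toy_unmap_chunked(struct toy_domain *d,
				       uint64_t iova, uint64_t end)
{
	d->unmap_calls = 0;

	while (iova < end) {
		uint64_t phys = toy_iova_to_phys(d, iova);
		size_t len = TOY_PAGE_SIZE;

		if (!phys) {		/* hole in the mapping: skip one page */
			iova += TOY_PAGE_SIZE;
			continue;
		}

		/* Grow the chunk while the next page is physically adjacent. */
		while (iova + len < end &&
		       toy_iova_to_phys(d, iova + len) == phys + len)
			len += TOY_PAGE_SIZE;

		iova += toy_unmap(d, iova, len);
	}

	return d->unmap_calls;
}
```

In this model, eight IOVA pages backed by three physically contiguous
runs need three unmap calls rather than eight, which is the effect the
optimization relies on when superpages are not in use.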

Unfortunately the same algorithm does not work well on IOMMUs with
fine-grained superpages, like AMD-Vi, costing about 25% extra since
the IOMMU will automatically unmap any power-of-two contiguous
mapping we've provided it.  We add a routine and a domain flag to
detect this feature, leaving AMD-Vi unaffected by this unmap
optimization.
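
One plausible way to detect fine-grained superpages is to map two
contiguous pages, ask for a single page to be unmapped, and see how
much the IOMMU reports as unmapped.  The userspace sketch below models
that idea under stated assumptions; it is not the routine added by this
commit.  struct toy_iommu, toy_map(), toy_unmap_head() and
toy_probe_fgsp() are hypothetical, and the merges_pow2 field simply
stands in for the hardware trait being probed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical toy IOMMU holding at most one mapping, starting at IOVA 0. */
struct toy_iommu {
	bool merges_pow2;	/* trait: stores power-of-two mappings as one entry */
	size_t entry_size;	/* granularity of the live mapping's entries */
	size_t mapped;		/* bytes currently mapped */
};

/* Map len bytes at IOVA 0; a merging IOMMU keeps a power-of-two len as one entry. */
static void toy_map(struct toy_iommu *m, size_t len)
{
	bool pow2 = len && (len & (len - 1)) == 0;

	m->entry_size = (m->merges_pow2 && pow2) ? len : TOY_PAGE_SIZE;
	m->mapped = len;
}

/*
 * Unmap from the head of the mapping.  As with the behavior described
 * above, the IOMMU may unmap more than requested: a whole entry goes at once.
 */
static size_t toy_unmap_head(struct toy_iommu *m, size_t len)
{
	size_t done = len < m->entry_size ? m->entry_size : len;

	if (done > m->mapped)
		done = m->mapped;
	m->mapped -= done;
	return done;
}

/* Probe: map two pages, unmap one, and check whether both went away. */
static bool toy_probe_fgsp(struct toy_iommu *m)
{
	size_t unmapped;
	bool fgsp;

	toy_map(m, 2 * TOY_PAGE_SIZE);
	unmapped = toy_unmap_head(m, TOY_PAGE_SIZE);
	fgsp = unmapped != TOY_PAGE_SIZE;

	if (!fgsp)	/* only one page went away: remove the other one too */
		toy_unmap_head(m, TOY_PAGE_SIZE);

	return fgsp;
}
```

A caller would run such a probe once per domain and cache the result
in a flag, skipping the contiguous-chunk search whenever the flag is
set, since the IOMMU already unmaps the larger region on its own.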

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>