vfs: decrapify dput(), fix cache behavior under normal load
We do not want to dirty the dentry->d_flags cacheline in dput() just to
set the DCACHE_REFERENCED flag when it is already set in the common case
anyway. This way the first cacheline of the dentry (which contains the
RCU lookup information etc) can stay shared among multiple CPU's.
This finishes off some of the details of all the scalability patches
merged during the merge window.
Also don't mark dentry_kill() for inlining, since it's the uncommon path
and inlining it just makes the common path slower due to extra function
entry/exit overhead.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>