cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy()
Because ->pre_destroy() could fail and can't be called under
cgroup_mutex, cgroup destruction did something very ugly.
1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.
2. Release cgroup_mutex and call ->pre_destroy().
3. Re-grab cgroup_mutex and verify it can still be destroyed; fail
otherwise.
4. Continue destroying.
In addition to being ugly, it has been always broken in various ways.
For example, memcg ->pre_destroy() expects the cgroup to be inactive
after it's done but tasks can be attached and detached between #2 and
#3 and the conditions that memcg verified in ->pre_destroy() might no
longer hold by the time control reaches #3.
Now that ->pre_destroy() is no longer allowed to fail. We can switch
to the following.
1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.
2. Deactivate CSS's and mark the cgroup removed thus preventing any
further operations which can invalidate the verification from #1.
3. Release cgroup_mutex and call ->pre_destroy().
4. Re-grab cgroup_mutex and continue destroying.
After this change, controllers can safely assume that ->pre_destroy()
will only be called only once for a given cgroup and, once
->pre_destroy() is called, the cgroup will stay dormant till it's
destroyed.
This removes the only reason ->pre_destroy() can fail - new task being
attached or child cgroup being created inbetween. Error out path is
removed and ->pre_destroy() invocation is open coded in
cgroup_rmdir().
v2: cgroup_call_pre_destroy() removal moved to this patch per Michal.
Commit message updated per Glauber.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>