| 1 | |
| 2 | The Resource Counter |
| 3 | |
| 4 | The resource counter, declared in include/linux/res_counter.h, |
| 5 | is intended to simplify resource management in controllers |
| 6 | by providing common infrastructure for accounting. |
| 7 | |
| 8 | This infrastructure includes the res_counter structure and the |
| 9 | routines that work with it. |
| 10 | |
| 11 | |
| 12 | |
| 13 | 1. Crucial parts of the res_counter structure |
| 14 | |
| 15 | a. unsigned long long usage |
| 16 | |
| 17 | The usage value shows the amount of a resource that is consumed |
| 18 | by a group at a given time. The units of measurement should be |
| 19 | determined by the controller that uses this counter. E.g. it can |
| 20 | be bytes, items or any other unit the controller operates on. |
| 21 | |
| 22 | b. unsigned long long max_usage |
| 23 | |
| 24 | The maximum value that usage has reached over time. |
| 25 | |
| 26 | This value is useful when gathering statistical information about |
| 27 | a particular group, as it shows the actual resource requirements |
| 28 | of that group, not just a usage snapshot. |
| 29 | |
| 30 | c. unsigned long long limit |
| 31 | |
| 32 | The maximum amount of the resource that the group is allowed to |
| 33 | consume. If the group requests more resources, so that the usage |
| 34 | value would exceed the limit, the resource allocation is rejected |
| 35 | (see the next section). |
| 36 | |
| 37 | d. unsigned long long failcnt |
| 38 | |
| 39 | The failcnt stands for "failures counter". This is the number of |
| 40 | resource allocation attempts that failed. |
| 41 | |
| 42 | e. spinlock_t lock |
| 43 | |
| 44 | Protects changes of the above values. |
| 45 | |
| 46 | |
| 47 | |
| 48 | 2. Basic accounting routines |
| 49 | |
| 50 | a. void res_counter_init(struct res_counter *rc, |
| 51 | struct res_counter *rc_parent) |
| 52 | |
| 53 | Initializes the resource counter. As usual, it should be the first |
| 54 | routine called for a new counter. |
| 55 | |
| 56 | The rc_parent parameter can be used to define a hierarchical |
| 57 | child -> parent relationship directly in the res_counter structure; |
| 58 | pass NULL to define no relationship. |
| 59 | |
| 60 | b. int res_counter_charge(struct res_counter *rc, unsigned long val, |
| 61 | struct res_counter **limit_fail_at) |
| 62 | |
| 63 | When a resource is about to be allocated it has to be accounted |
| 64 | with the appropriate resource counter (the controller should determine |
| 65 | which one to use on its own). This operation is called "charging". |
| 66 | |
| 67 | It is not very important which operation - resource allocation |
| 68 | or charging - is performed first, but |
| 69 | * if the allocation is performed first, this may create a |
| 70 | temporary resource over-usage by the time the resource counter is |
| 71 | charged; |
| 72 | * if the charging is performed first, then it should be uncharged |
| 73 | on the error path (if one is taken). |
| 74 | |
| 75 | If the charging fails and a hierarchical dependency exists, the |
| 76 | limit_fail_at parameter is set to the particular res_counter element |
| 77 | where the charging failed. |
| 78 | |
| 79 | c. int res_counter_charge_locked |
| 80 | (struct res_counter *rc, unsigned long val, bool force) |
| 81 | |
| 82 | The same as res_counter_charge(), but it does not acquire/release the |
| 83 | res_counter->lock internally (it must be called with res_counter->lock |
| 84 | held). The force parameter indicates whether the limit may be bypassed. |
| 85 | |
| 86 | d. u64 res_counter_uncharge[_locked] |
| 87 | (struct res_counter *rc, unsigned long val) |
| 88 | |
| 89 | When a resource is released (freed) it should be de-accounted |
| 90 | from the resource counter it was accounted to. This is called |
| 91 | "uncharging". The return value of this function indicates the amount |
| 92 | of charges still present in the counter. |
| 93 | |
| 94 | The _locked variant must be called with the res_counter->lock held. |
| 95 | |
| 96 | e. u64 res_counter_uncharge_until |
| 97 | (struct res_counter *rc, struct res_counter *top, |
| 98 | unsigned long val) |
| 99 | |
| 100 | Almost the same as res_counter_uncharge(), but propagation of the uncharge |
| 101 | stops when rc == top. This is useful when killing a res_counter in a |
| 102 | child cgroup. |
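
The hierarchical charge and uncharge behaviour described above can be
illustrated with a small userspace model. This is only a sketch, not the
kernel implementation: struct model_counter and the model_*() helpers are
hypothetical names invented for this example, locking is left out, and the
"unlimited" default limit is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified userspace model of a resource counter (no locking). */
struct model_counter {
	unsigned long long usage;
	unsigned long long max_usage;
	unsigned long long limit;
	unsigned long long failcnt;
	struct model_counter *parent;	/* NULL means no parent */
};

static void model_init(struct model_counter *c, struct model_counter *parent)
{
	c->usage = 0;
	c->max_usage = 0;
	c->limit = ~0ULL;	/* assumed default: effectively unlimited */
	c->failcnt = 0;
	c->parent = parent;
}

/*
 * Uncharge val from c and its ancestors, stopping before top
 * (top == NULL walks up to the root). Returns the usage left in c.
 */
static unsigned long long model_uncharge_until(struct model_counter *c,
					       struct model_counter *top,
					       unsigned long long val)
{
	unsigned long long remaining = c->usage;
	struct model_counter *it;

	for (it = c; it != NULL && it != top; it = it->parent) {
		it->usage = it->usage > val ? it->usage - val : 0;
		if (it == c)
			remaining = it->usage;
	}
	return remaining;
}

/*
 * Charge val to c and every ancestor. On failure, record the failing
 * level in *limit_fail_at and roll back the levels already charged.
 */
static int model_charge(struct model_counter *c, unsigned long long val,
			struct model_counter **limit_fail_at)
{
	struct model_counter *it;

	*limit_fail_at = NULL;
	for (it = c; it != NULL; it = it->parent) {
		if (it->usage > it->limit || val > it->limit - it->usage) {
			it->failcnt++;
			*limit_fail_at = it;
			model_uncharge_until(c, it, val);
			return -ENOMEM;
		}
		it->usage += val;
		if (it->usage > it->max_usage)
			it->max_usage = it->usage;
	}
	return 0;
}
```

In this model a charge that fits the child but exceeds the parent's limit
fails at the parent: the parent's failcnt is incremented, limit_fail_at
points to the parent, and the child's usage is restored to its previous
value.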
| 103 | |
| 104 | 2.1 Other accounting routines |
| 105 | |
| 106 | There are more routines that may help you with common needs, like |
| 107 | checking whether the limit is reached or resetting the max_usage |
| 108 | value. They are all declared in include/linux/res_counter.h. |
| 109 | |
| 110 | |
| 111 | |
| 112 | 3. Analyzing the resource counter registrations |
| 113 | |
| 114 | a. If the failcnt value constantly grows, this means that the counter's |
| 115 | limit is too tight. Either the group is misbehaving and consumes too |
| 116 | many resources, or the configuration is not suitable for the group |
| 117 | and the limit should be increased. |
| 118 | |
| 119 | b. The max_usage value can be used to quickly tune the group. One may |
| 120 | set the limits to maximal values and either load the container with |
| 121 | a common pattern or leave it running for a while. After this the |
| 122 | max_usage value shows the amount of resource the container would |
| 123 | require during its common activity. |
| 124 | |
| 125 | Setting the limit a bit above this value gives a pretty good |
| 126 | configuration that works in most of the cases. |
| 127 | |
| 128 | c. If the max_usage is much less than the limit, but the failcnt value |
| 129 | is growing, then the group tries to allocate a big chunk of resource |
| 130 | at once. |
| 131 | |
| 132 | d. If the max_usage is much less than the limit, and the failcnt value |
| 133 | is 0, then the group has been given a higher limit than it requires. |
| 134 | It is better to lower the limit a bit, leaving more resource |
| 135 | for other groups. |
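
The rules above can be turned into a rough diagnostic helper. The following
is a sketch under stated assumptions: classify() and enum hint are
hypothetical names, failcnt is sampled twice to detect growth, and "much
less than the limit" is arbitrarily taken to mean less than half of it.

```c
/* Hypothetical tuning hints derived from the counter values. */
enum hint {
	HINT_OK,
	HINT_LIMIT_TOO_TIGHT,		/* rule a */
	HINT_LARGE_ALLOCATIONS,		/* rule c */
	HINT_LIMIT_TOO_HIGH,		/* rule d */
};

static enum hint classify(unsigned long long limit,
			  unsigned long long max_usage,
			  unsigned long long failcnt_before,
			  unsigned long long failcnt_after)
{
	/* Assumed threshold: "much less" means below half of the limit. */
	int far_below = max_usage < limit / 2;
	int failing = failcnt_after > failcnt_before;

	if (failing)
		return far_below ? HINT_LARGE_ALLOCATIONS
				 : HINT_LIMIT_TOO_TIGHT;
	if (far_below && failcnt_after == 0)
		return HINT_LIMIT_TOO_HIGH;
	return HINT_OK;
}
```

The threshold is a judgment call; a real tool would probably make it
configurable and sample failcnt over a longer period.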
| 136 | |
| 137 | |
| 138 | |
| 139 | 4. Communication with the control groups subsystem (cgroups) |
| 140 | |
| 141 | All the resource controllers that are using cgroups and resource counters |
| 142 | should provide files (in the cgroup filesystem) to work with the resource |
| 143 | counter fields. They are recommended to adhere to the following rules: |
| 144 | |
| 145 | a. File names |
| 146 | |
| 147 | Field name      File name |
| 148 | --------------------------------------------------- |
| 149 | usage           usage_in_<unit_of_measurement> |
| 150 | max_usage       max_usage_in_<unit_of_measurement> |
| 151 | limit           limit_in_<unit_of_measurement> |
| 152 | failcnt         failcnt |
| 153 | lock            no file :) |
| 154 | |
| 155 | b. Reading from file should show the corresponding field value in the |
| 156 | appropriate format. |
| 157 | |
| 158 | c. Writing to file |
| 159 | |
| 160 | Field           Expected behavior |
| 161 | ---------------------------------- |
| 162 | usage           prohibited |
| 163 | max_usage       reset to usage |
| 164 | limit           set the limit |
| 165 | failcnt         reset to zero |
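
The write rules in the table above can be sketched as a single handler.
This is a userspace illustration only: struct model_counter, enum
model_field and model_write() are hypothetical names, and returning -EINVAL
for a prohibited write is an assumption, not something the rules specify.

```c
#include <errno.h>

/* Simplified stand-in for the counter fields exposed through files. */
struct model_counter {
	unsigned long long usage;
	unsigned long long max_usage;
	unsigned long long limit;
	unsigned long long failcnt;
};

enum model_field { F_USAGE, F_MAX_USAGE, F_LIMIT, F_FAILCNT };

/* Apply a write to the file that corresponds to the given field. */
static int model_write(struct model_counter *c, enum model_field field,
		       unsigned long long val)
{
	switch (field) {
	case F_USAGE:
		return -EINVAL;			/* prohibited */
	case F_MAX_USAGE:
		c->max_usage = c->usage;	/* reset to usage, val ignored */
		return 0;
	case F_LIMIT:
		c->limit = val;			/* set the limit */
		return 0;
	case F_FAILCNT:
		c->failcnt = 0;			/* reset to zero, val ignored */
		return 0;
	}
	return -EINVAL;
}
```

Note that only the limit write uses the written value; for max_usage and
failcnt any write acts as a reset.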
| 166 | |
| 167 | |
| 168 | |
| 169 | 5. Usage example |
| 170 | |
| 171 | a. Declare a task group (take a look at the cgroups subsystem for this) and |
| 172 | embed a res_counter in it |
| 173 | |
| 174 | struct my_group { |
| 175 | 	struct res_counter res; |
| 176 | |
| 177 | 	<other fields> |
| 178 | }; |
| 179 | |
| 180 | b. Put hooks in resource allocation/release paths |
| 181 | |
| 182 | int alloc_something(...) |
| 183 | { |
| 184 | 	struct res_counter *fail_at;	/* set to the counter that hit its limit */ |
| 185 | |
| 186 | 	if (res_counter_charge(res_counter_ptr, amount, &fail_at) < 0) |
| 187 | 		return -ENOMEM; |
| 188 | 	<allocate the resource and return to the caller> |
| 189 | } |
| 190 | |
| 191 | void release_something(...) |
| 192 | { |
| 193 | 	res_counter_uncharge(res_counter_ptr, amount); |
| 194 | 	<release the resource> |
| 195 | } |
| 196 | |
| 197 | In order to keep the usage value self-consistent, both the |
| 198 | "res_counter_ptr" and the "amount" in release_something() should be |
| 199 | the same as they were in alloc_something() when the resource being |
| 200 | released was allocated. |
| 201 | |
| 202 | c. Provide a way to read and set the res_counter values (the cgroups |
| 203 | subsystem can help with this). |
| 204 | |
| 205 | d. Compile and run :) |