Documentation for /proc/sys/vm/*	kernel version 2.2.10
	(c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>

For general info and legal blurb, please look in README.

==============================================================

This file contains the documentation for the sysctl files in
/proc/sys/vm and is valid for Linux kernel version 2.2.

The files in this directory can be used to tune the operation
of the virtual memory (VM) subsystem of the Linux kernel and
the writeout of dirty data to disk.

Default values and initialization routines for most of these
files can be found in mm/swap.c.

Currently, these files are in /proc/sys/vm:
- overcommit_memory
- overcommit_ratio
- page-cluster
- dirty_ratio
- dirty_background_ratio
- dirty_expire_centisecs
- dirty_writeback_centisecs
- highmem_is_dirtyable   (only if CONFIG_HIGHMEM set)
- max_map_count
- min_free_kbytes
- percpu_pagelist_fraction
- laptop_mode
- block_dump
- drop-caches
- zone_reclaim_mode
- min_unmapped_ratio
- min_slab_ratio
- panic_on_oom
- oom_dump_tasks
- oom_kill_allocating_task
- mmap_min_addr
- numa_zonelist_order
- nr_hugepages
- nr_overcommit_hugepages
- nr_trim_pages          (only if CONFIG_MMU=n)

==============================================================

dirty_bytes, dirty_ratio, dirty_background_bytes,
dirty_background_ratio, dirty_expire_centisecs,
dirty_writeback_centisecs, highmem_is_dirtyable,
vfs_cache_pressure, laptop_mode, block_dump, swap_token_timeout,
drop-caches, hugepages_treat_as_movable:

See Documentation/filesystems/proc.txt

==============================================================

overcommit_memory:

This value contains a flag that enables memory overcommitment.

When this flag is 0, the kernel attempts to estimate the amount
of free memory left when userspace requests more memory.

When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.

When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.

This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it.

The default value is 0.

See Documentation/vm/overcommit-accounting and
security/commoncap.c::cap_vm_enough_memory() for more information.

==============================================================

overcommit_ratio:

When overcommit_memory is set to 2, the committed address
space is not permitted to exceed swap plus this percentage
of physical RAM. See above.

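A minimal sketch of the mode-2 commit-limit arithmetic. The RAM and
swap figures below are invented illustration values, not anything read
from a real system:

```shell
# CommitLimit = swap + (overcommit_ratio / 100) * physical RAM
ram_mb=8192           # hypothetical machine: 8 GB RAM
swap_mb=2048          # and 2 GB swap
overcommit_ratio=50   # the default value of this sysctl

commit_limit_mb=$(( swap_mb + ram_mb * overcommit_ratio / 100 ))
echo "$commit_limit_mb"   # prints 6144
```

With the defaults, raising overcommit_ratio is the knob to turn if
mode 2 rejects allocations on a machine with little swap.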
==============================================================

page-cluster:

The Linux VM subsystem avoids excessive disk seeks by reading
multiple pages on a page fault. The number of pages it reads
is dependent on the amount of memory in your machine.

The number of pages the kernel reads in at once is equal to
2 ^ page-cluster. Values above 2 ^ 5 don't make much sense
for swap because we only cluster swap data in 32-page groups.

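The power-of-two relationship above can be checked with shell
arithmetic; the value 3 here is only an example setting, not anything
read from the running system:

```shell
# Pages read in at once = 2 ^ page-cluster
page_cluster=3                   # example setting
pages=$(( 1 << page_cluster ))   # shift left = multiply by powers of 2
echo "$pages"                    # prints 8
```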
==============================================================

max_map_count:

This file contains the maximum number of memory map areas a process
may have. Memory map areas are used as a side-effect of calling
malloc, directly by mmap and mprotect, and also when loading shared
libraries.

While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation.

The default value is 65536.

==============================================================

min_free_kbytes:

This is used to force the Linux VM to keep a minimum number
of kilobytes free. The VM uses this number to compute a pages_min
value for each lowmem zone in the system. Each lowmem zone gets
a number of reserved free pages based proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.

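A rough sketch of the proportional split described above. The zone
sizes are invented illustration values, and the kernel's actual
calculation in mm/page_alloc.c is more involved than this:

```shell
# Distribute min_free_kbytes across lowmem zones in proportion to size.
min_free_kbytes=1024
zone_dma_kb=16384        # hypothetical 16 MB DMA zone
zone_normal_kb=884736    # hypothetical 864 MB normal zone
total_kb=$(( zone_dma_kb + zone_normal_kb ))

dma_reserve=$(( min_free_kbytes * zone_dma_kb / total_kb ))
normal_reserve=$(( min_free_kbytes * zone_normal_kb / total_kb ))
echo "DMA: ${dma_reserve} kB, Normal: ${normal_reserve} kB"
```

The larger zone absorbs almost all of the reserve, which is why a tiny
DMA zone keeps only a handful of pages free.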
==============================================================

percpu_pagelist_fraction

This is the fraction of pages in each zone, at most, that may be
allocated to any single per-cpu page list (it sets the high mark,
pcp->high). The minimum value for this is 8, which means that we
don't allow more than 1/8th of the pages in each zone to be
allocated to any single per-cpu pagelist. This entry only changes
the value of hot per-cpu pagelists. A user can specify a number
like 100 to allocate 1/100th of each zone to each per-cpu page list.

The batch value of each per-cpu pagelist is also updated as a
result. It is set to pcp->high/4. The upper limit of batch is
(PAGE_SHIFT * 8).

The initial value is zero. The kernel does not use this value at
boot time to set the high water marks for each per-cpu page list.

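The watermark arithmetic above, sketched with an invented zone size
(the kernel's additional clamp of batch to PAGE_SHIFT * 8 is omitted
here for simplicity):

```shell
# pcp->high = zone pages / fraction; batch = high / 4.
zone_pages=262144    # hypothetical zone: 1 GB of 4 kB pages
fraction=8           # the minimum allowed value

high=$(( zone_pages / fraction ))
batch=$(( high / 4 ))
echo "high=$high batch=$batch"   # prints high=32768 batch=8192
```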

===============================================================

zone_reclaim_mode:

Zone_reclaim_mode allows an administrator to set more or less
aggressive approaches to reclaiming memory when a zone runs out of
memory. If it is set to zero then no zone reclaim occurs.
Allocations will be satisfied from other zones / nodes in the system.

This is a bitmask, formed by ORing together:

1	= Zone reclaim on
2	= Zone reclaim writes dirty pages out
4	= Zone reclaim swaps pages

zone_reclaim_mode is set during bootup to 1 if it is determined that
pages from remote zones will cause a measurable performance reduction.
The page allocator will then reclaim easily reusable pages (those page
cache pages that are currently not used) before allocating off node
pages.

It may be beneficial to switch off zone reclaim if the system is
used for a file server and all of memory should be used for caching
files from disk. In that case the caching effect is more important
than data locality.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up, and so
effectively throttles the process. This may decrease the performance
of a single process, since it can no longer use all of system memory
to buffer its outgoing writes, but it preserves the memory on other
nodes so that the performance of other processes running on those
nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.

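The bitmask above composes with ordinary OR arithmetic; for example,
enabling zone reclaim together with dirty-page writeout:

```shell
# Symbolic names for the three bits documented above.
ZONE_RECLAIM_ON=1
ZONE_RECLAIM_WRITE=2
ZONE_RECLAIM_SWAP=4

mode=$(( ZONE_RECLAIM_ON | ZONE_RECLAIM_WRITE ))
echo "$mode"   # prints 3
# To apply it (requires root):
#   echo $mode > /proc/sys/vm/zone_reclaim_mode
```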

=============================================================

min_unmapped_ratio:

This is available only on NUMA kernels.

A percentage of the total pages in each zone. Zone reclaim will only
occur if more than this percentage of pages are file backed and
unmapped. This is to ensure that a minimal amount of local pages is
still available for file I/O even if the node is overallocated.

The default is 1 percent.

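A quick sketch of the threshold this percentage implies, using an
invented zone size:

```shell
# Zone reclaim triggers only if unmapped file-backed pages exceed
# min_unmapped_ratio percent of the zone's pages.
zone_pages=262144        # hypothetical zone size
min_unmapped_ratio=1     # the default

threshold=$(( zone_pages * min_unmapped_ratio / 100 ))
echo "$threshold"        # prints 2621
```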
=============================================================

min_slab_ratio:

This is available only on NUMA kernels.

A percentage of the total pages in each zone. During zone reclaim
(i.e., when allocation falls back from the local zone), slabs will be
reclaimed if more than this percentage of pages in a zone are
reclaimable slab pages. This ensures that slab growth stays under
control even on NUMA systems that rarely perform global reclaim.

The default is 5 percent.

Note that slab reclaim is triggered in a per zone / node fashion.
The process of reclaiming slab memory is currently not node specific
and may not be fast.

=============================================================

panic_on_oom

This enables or disables the panic-on-out-of-memory feature.

If this is set to 0, the kernel will invoke the OOM killer, which
kills some rogue process; usually the system survives.

If this is set to 1, the kernel panics when out-of-memory happens.
However, if a process limits its allocations to certain nodes using
mempolicies or cpusets, and those nodes reach memory exhaustion, one
process may still be killed by the OOM killer and no panic occurs:
memory on other nodes may be free, so the system as a whole may not
yet be in a fatal state.

If this is set to 2, the kernel always panics, even in the
above-mentioned case.

The default value is 0.
Values 1 and 2 are intended for failover in clustered setups; select
one according to your failover policy.

=============================================================

oom_dump_tasks

Enables a system-wide task dump (excluding kernel threads) to be
produced when the kernel performs an OOM-killing and includes such
information as pid, uid, tgid, vm size, rss, cpu, oom_adj score, and
name. This is helpful to determine why the OOM killer was invoked
and to identify the rogue task that caused it.

If this is set to zero, this information is suppressed. On very
large systems with thousands of tasks it may not be feasible to dump
the memory state information for each one. Such systems should not
be forced to incur a performance penalty in OOM conditions when the
information may not be desired.

If this is set to non-zero, this information is shown whenever the
OOM killer actually kills a memory-hogging task.

The default value is 0.

=============================================================

oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in
out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire
tasklist and select a task based on heuristics to kill. This normally
selects a rogue memory-hogging task that frees up a large amount of
memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that
triggered the out-of-memory condition. This avoids the expensive
tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value
is used in oom_kill_allocating_task.

The default value is 0.

==============================================================

mmap_min_addr

This file indicates the amount of address space which a user process
will be restricted from mmapping. Since kernel null dereference bugs
could accidentally operate based on the information in the first
couple of pages of memory, userspace processes should not be allowed
to write to them. By default this value is set to 0 and no
protections will be enforced by the security module. Setting this
value to something like 64k will allow the vast majority of
applications to work correctly and provide defense in depth against
future potential kernel bugs.

==============================================================

numa_zonelist_order

This sysctl is only for NUMA.
'Where the memory is allocated from' is controlled by zonelists.
(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for a simple
explanation; you may be able to read ZONE_DMA as ZONE_DMA32.)

In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows:
ZONE_NORMAL -> ZONE_DMA
This means that a memory allocation request for GFP_KERNEL will
get memory from ZONE_DMA only when ZONE_NORMAL is not available.

In the NUMA case, you can think of the following two types of order.
Assume a 2-node NUMA system; below is the zonelist of Node(0)'s
GFP_KERNEL:

(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA

Type (A) offers the best locality for processes on Node(0), but
ZONE_DMA will be used before Node(1)'s ZONE_NORMAL. This increases
the possibility of out-of-memory (OOM) in ZONE_DMA, because ZONE_DMA
tends to be small.

Type (B) cannot offer the best locality but is more robust against
OOM of the DMA zone.

Type (A) is called "Node" order. Type (B) is "Zone" order.

"Node order" orders the zonelists by node, then by zone within each
node. Specify "[Nn]ode" for node order.

"Zone order" orders the zonelists by zone type, then by node within
each zone. Specify "[Zz]one" for zone order.

Specify "[Dd]efault" to request automatic configuration.
Autoconfiguration will select "node" order in the following cases:
(1) if the DMA zone does not exist, or
(2) if the DMA zone comprises greater than 50% of the available
    memory, or
(3) if any node's DMA zone comprises greater than 60% of its local
    memory and the amount of local memory is big enough.

Otherwise, "zone" order will be selected. Default order is recommended
unless this is causing problems for your system/application.

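The autoconfiguration rules above can be sketched as a small decision
function. The memory figures are invented sample values, and the
"local memory is big enough" check of rule (3) is left out; the real
logic lives in mm/page_alloc.c:

```shell
# Decide "node" vs "zone" order per the three rules above.
total_kb=4194304           # hypothetical: 4 GB total memory
dma_kb=65536               # hypothetical: 64 MB DMA zone
max_node_dma_percent=10    # largest per-node DMA share of local memory

choose_order() {
    if [ "$dma_kb" -eq 0 ]; then
        echo node; return                    # rule (1): no DMA zone
    fi
    if [ $(( dma_kb * 100 / total_kb )) -gt 50 ]; then
        echo node; return                    # rule (2): DMA > 50% of memory
    fi
    if [ "$max_node_dma_percent" -gt 60 ]; then
        echo node; return                    # rule (3), simplified
    fi
    echo zone
}

order=$(choose_order)
echo "$order"    # prints "zone" for these sample figures
```

With a small DMA zone relative to total memory, none of the node-order
conditions fire, which matches the common case on large machines.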

==============================================================

nr_hugepages

Change the minimum size of the hugepage pool.

See Documentation/vm/hugetlbpage.txt

==============================================================

nr_overcommit_hugepages

Change the maximum size of the hugepage pool. The maximum is
nr_hugepages + nr_overcommit_hugepages.

See Documentation/vm/hugetlbpage.txt

==============================================================

nr_trim_pages

This is available only on NOMMU kernels.

This value adjusts the excess page trimming behaviour of power-of-2
aligned NOMMU mmap allocations.

A value of 0 disables trimming of allocations entirely, while a value
of 1 trims excess pages aggressively. Any value >= 1 acts as the
watermark where trimming of allocations is initiated.

The default value is 1.

See Documentation/nommu-mmap.txt for more information.