GitHub/LineageOS/android_kernel_samsung_universal7580.git
18 years ago[PATCH] uml: add locking to xtime accesses
Jeff Dike [Fri, 30 Jun 2006 08:55:56 +0000 (01:55 -0700)]
[PATCH] uml: add locking to xtime accesses

do_timer must be called with xtime_lock held.  I'm not sure boot_timer_handler
needs this, however I don't think it hurts: it simply disables irq and takes a
spinlock.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: unregister useless console when it's not needed
Jeff Dike [Fri, 30 Jun 2006 08:55:55 +0000 (01:55 -0700)]
[PATCH] uml: unregister useless console when it's not needed

-mm in combination with an FC5 init started dying with 'stderr=1' because init
didn't like the lack of /dev/console and exited.  The problem was that the
stderr console, which is intended to dump printk output to the terminal before
the regular console is initialized, isn't a tty, and so can't make
/dev/console operational.

However, since it is registered first, the normal console, when it is
registered, doesn't become the preferred console, and isn't attached to
/dev/console.  Thus, /dev/console is never operational.

This patch makes the stderr console unregister itself in an initcall, which is
late enough that the normal console is registered.  When that happens, the
normal console will become the preferred console and will be able to run
/dev/console.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: fix off-by-one bug in VM file creation
Jeff Dike [Fri, 30 Jun 2006 08:55:55 +0000 (01:55 -0700)]
[PATCH] uml: fix off-by-one bug in VM file creation

Fix an off-by-one bug in temp file creation.  Seeking to the desired length
and writing a byte resulted in the file being one byte longer than expected.
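
The off-by-one can be illustrated with a small user-space sketch.  This is not
the UML code: extend_file() and temp_file_size() are hypothetical helpers, and
the /tmp path is an assumption.  Writing the single byte at offset len - 1
rather than at offset len gives a file of exactly len bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the UML code): extend an open, empty file to
 * exactly `len` bytes by writing one zero byte at the last offset.
 * Returns the resulting size as reported by fstat(), or -1 on error.
 */
static off_t extend_file(int fd, off_t len)
{
	struct stat st;
	char zero = 0;

	assert(len > 0);

	/* Seeking to `len` here instead would produce len + 1 bytes. */
	if (lseek(fd, len - 1, SEEK_SET) < 0)
		return -1;
	if (write(fd, &zero, 1) != 1)
		return -1;
	if (fstat(fd, &st) < 0)
		return -1;
	return st.st_size;
}

/* Create an unlinked temp file and report its size after extension. */
static off_t temp_file_size(off_t len)
{
	char path[] = "/tmp/vm_file-XXXXXX";	/* assumed location */
	int fd = mkstemp(path);
	off_t size;

	if (fd < 0)
		return -1;
	unlink(path);
	size = extend_file(fd, len);
	close(fd);
	return size;
}
```

With this sketch, temp_file_size(4096) reports 4096 rather than 4097.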

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: fix /proc/mounts parsing boundary condition
Jeff Dike [Fri, 30 Jun 2006 08:55:54 +0000 (01:55 -0700)]
[PATCH] uml: fix /proc/mounts parsing boundary condition

When parsing /proc/mounts looking for a tmpfs mount on /dev/shm, if a string
that we are looking for is split across reads, then it won't be recognized.

Fix this by refilling the buffer whenever we advance the cursor.
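
A minimal sketch of the boundary condition, under stated assumptions: it is
not the UML parser, and stream_contains(), search_text() and the chunk
parameter are hypothetical names.  Instead of searching each read in
isolation, it keeps the last strlen(needle) - 1 bytes of the buffer and tops
the buffer up on the next read, so a match that straddles two reads is still
found.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 256

/* Return 1 if `needle` occurs in buf[0..len), else 0. */
static int find(const char *buf, size_t len, const char *needle, size_t nlen)
{
	size_t i;

	for (i = 0; i + nlen <= len; i++)
		if (memcmp(buf + i, needle, nlen) == 0)
			return 1;
	return 0;
}

/*
 * Search the data readable from `fd` for `needle`, reading at most `chunk`
 * bytes per read().  Returns 1 if found, 0 if not, -1 on read error.
 */
static int stream_contains(int fd, const char *needle, size_t chunk)
{
	char buf[BUF_SIZE];
	size_t nlen = strlen(needle);
	size_t have = 0;	/* valid bytes currently in buf */
	ssize_t n;

	assert(nlen > 0 && chunk > 0 && nlen + chunk <= BUF_SIZE);

	for (;;) {
		/* Refill behind the bytes carried over from the last read. */
		n = read(fd, buf + have, chunk);
		if (n < 0)
			return -1;
		if (n == 0)
			return 0;
		have += (size_t)n;

		if (find(buf, have, needle, nlen))
			return 1;

		/* Keep the tail that could still begin a match. */
		if (have > nlen - 1) {
			memmove(buf, buf + have - (nlen - 1), nlen - 1);
			have = nlen - 1;
		}
	}
}

/* Test helper: push `text` through a pipe and search it in small reads. */
static int search_text(const char *text, const char *needle, size_t chunk)
{
	int fds[2];
	int found;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], text, strlen(text)) != (ssize_t)strlen(text)) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);
	found = stream_contains(fds[0], needle, chunk);
	close(fds[0]);
	return found;
}
```

Searching with a chunk of 5 bytes forces the 14-byte string "/dev/shm tmpfs"
to span several reads, which is the case the original parser missed.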

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] UML: fix the INIT_ENV_ARG_LIMIT dependencies
Adrian Bunk [Fri, 30 Jun 2006 08:55:51 +0000 (01:55 -0700)]
[PATCH] UML: fix the INIT_ENV_ARG_LIMIT dependencies

Fix the INIT_ENV_ARG_LIMIT dependencies to what seems to have been
intended.

Spotted by Jean-Luc Leger.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] add smp_setup_processor_id()
Andrew Morton [Fri, 30 Jun 2006 08:55:50 +0000 (01:55 -0700)]
[PATCH] add smp_setup_processor_id()

Presently, smp_processor_id() isn't necessarily set up until setup_arch().
But it's used in boot_cpu_init() and printk() and perhaps in other places,
prior to setup_arch() being called.

So provide a new smp_setup_processor_id() which is called before anything
else, and wire it up for Voyager (which boots on a CPU other than #0, and broke).

Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: Add security hook definition for getioprio and insert hooks
David Quigley [Fri, 30 Jun 2006 08:55:49 +0000 (01:55 -0700)]
[PATCH] SELinux: Add security hook definition for getioprio and insert hooks

Add a new security hook definition for the sys_ioprio_get operation.  At
present, the SELinux hook function implementation for this hook is
identical to the getscheduler implementation but a separate hook is
introduced to allow this check to be specialized in the future if
necessary.

This patch also creates a helper function get_task_ioprio which handles the
access check in addition to retrieving the ioprio value for the task.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: update USB code with new kill_proc_info_as_uid
David Quigley [Fri, 30 Jun 2006 08:55:48 +0000 (01:55 -0700)]
[PATCH] SELinux: update USB code with new kill_proc_info_as_uid

This patch updates the USB core to save and pass the sending task secid when
sending signals upon AIO completion so that proper security checking can be
applied by security modules.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: add security hook call to kill_proc_info_as_uid
David Quigley [Fri, 30 Jun 2006 08:55:47 +0000 (01:55 -0700)]
[PATCH] SELinux: add security hook call to kill_proc_info_as_uid

This patch adds a call to the extended security_task_kill hook introduced by
the prior patch to the kill_proc_info_as_uid function so that these signals
can be properly mediated by security modules.  It also updates the existing
hook call in check_kill_permission.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: extend task_kill hook to handle signals sent by AIO completion
David Quigley [Fri, 30 Jun 2006 08:55:46 +0000 (01:55 -0700)]
[PATCH] SELinux: extend task_kill hook to handle signals sent by AIO completion

This patch extends the security_task_kill hook to handle signals sent by AIO
completion.  In this case, the secid of the task responsible for the signal
needs to be obtained and saved earlier, so a security_task_getsecid() hook is
added, and then this saved value is passed subsequently to the extended
task_kill hook for use in checking.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] slab: consolidate code to free slabs from freelist
Christoph Lameter [Fri, 30 Jun 2006 08:55:45 +0000 (01:55 -0700)]
[PATCH] slab: consolidate code to free slabs from freelist

Post and discussion:
http://marc.theaimsgroup.com/?t=115074342800003&r=1&w=2

Code in __node_shrink() duplicates code in cache_reap().

Add a new function drain_freelist that removes slabs with objects that are
already free and use that in various places.

This eliminates the __node_shrink() function and provides the interrupt
holdoff reduction from slab_free to code that used to call __node_shrink.

[akpm@osdl.org: build fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Light weight event counters
Christoph Lameter [Fri, 30 Jun 2006 08:55:45 +0000 (01:55 -0700)]
[PATCH] Light weight event counters

The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare, we may only lose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64.
And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single increment instruction being emitted
(i386, x86_64) or in the increment being hidden through instruction
concurrency (EPIC architectures such as ia64 can get that done).
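
The idea can be sketched in user space as follows.  This is an illustration
only, not the kernel implementation: count_event(), sum_events(), the event
names and the fixed CPU count are assumptions, and the CPU index is passed in
explicitly because there is no per-cpu variable support here.  Each update is
a plain, non-atomic increment of the caller's own slot; readers sum all
slots.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4	/* assumed CPU count for the sketch */

/* Hypothetical event types; the real list is much longer. */
enum event_item { PGFAULT_EV, PGFREE_EV, NR_EVENT_ITEMS };

struct event_state {
	unsigned long event[NR_EVENT_ITEMS];
};

/* One block of counters per CPU, so updates never share a slot. */
static struct event_state per_cpu_events[NR_CPUS_SKETCH];

/* Plain increment: no lock and no atomic operation. */
static void count_event(int cpu, enum event_item item)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	per_cpu_events[cpu].event[item]++;
}

/* Readers pay the cost: sum the per-CPU slots when a total is needed. */
static unsigned long sum_events(enum event_item item)
{
	unsigned long total = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total += per_cpu_events[cpu].event[item];
	return total;
}
```

The trade-off is the one described above: updates are as cheap as possible,
while totals are only computed when something such as /proc/vmstat asks for
them.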

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Use Zoned VM Counters for NUMA statistics
Christoph Lameter [Fri, 30 Jun 2006 08:55:44 +0000 (01:55 -0700)]
[PATCH] Use Zoned VM Counters for NUMA statistics

The numa statistics are really event counters.  But they are per node and
so we have had special treatment for these counters through additional
fields on the pcp structure.  We can now use the per zone nature of the
zoned VM counters to realize these.

This will shrink the size of the pcp structure on NUMA systems.  We will
have some room to add additional per zone counters that will all still fit
in the same cacheline.

 Bits  Prior pcp size         Size after patch      We can add
 ------------------------------------------------------------------
 64    128 bytes (16 words)   80 bytes (10 words)   48
 32     76 bytes (19 words)   56 bytes (14 words)   8 (64 byte cacheline)
                                                    72 (128 byte)

Remove the special statistics for numa and replace them with zoned vm
counters.  This has the side effect that global sums of these events now
show up in /proc/vmstat.

Also take the opportunity to move the zone_statistics() function from
page_alloc.c into vmstat.c.

Discussions:
V2 http://marc.theaimsgroup.com/?t=115048227000002&r=1&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned-vm-counters: remove read_page_state()
Andrew Morton [Fri, 30 Jun 2006 08:55:43 +0000 (01:55 -0700)]
[PATCH] zoned-vm-counters: remove read_page_state()

No callers.

Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: remove useless struct wbs
Christoph Lameter [Fri, 30 Jun 2006 08:55:42 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: remove useless struct wbs

Remove writeback state

We can remove some functions now that were needed to calculate the page state
for writeback control since these statistics are now directly available.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:41 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter

Conversion of nr_bounce to a per zone counter

nr_bounce is only used for proc output.  So it could be left as an event
counter.  However, the event counters may not be accurate and nr_bounce is
categorizing types of pages in a zone.  So we really need this to also be a
per zone counter.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_unstable to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:40 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_unstable to per zone counter

Conversion of nr_unstable to a per zone counter

We need to do some special modifications to the nfs code since there are
multiple cases of disposition and we need to have a page ref for proper
accounting.

This converts the last critical page state of the VM and therefore we need to
remove several functions that were depending on GET_PAGE_STATE_LAST in order
to make the kernel compile again.  We are only left with event type counters
in page state.

[akpm@osdl.org: bugfixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_writeback to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:40 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_writeback to per zone counter

Conversion of nr_writeback to per zone counter.

This removes the last page_state counter from arch/i386/mm/pgtable.c so we
drop the page_state from there.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_dirty to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:39 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_dirty to per zone counter

This makes nr_dirty a per zone counter.  Looping over all processors is
avoided during writeback state determination.

The counter aggregation for nr_dirty had to be undone in the NFS layer since
we summed up the page counts from multiple zones.  Someone more familiar with
NFS should probably review what I have done.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:38 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter

Conversion of nr_page_table_pages to a per zone counter

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_slab to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:38 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_slab to per zone counter

- Allows reclaim to access counter without looping over processor counts.

- Allows accurate statistics on how many pages are used in a zone by
  the slab. This may become useful to balance slab allocations over
  various zones.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval
Christoph Lameter [Fri, 30 Jun 2006 08:55:37 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval

The zone_reclaim_interval was necessary because we were not able to determine
how many unmapped pages exist in a zone.  Therefore we had to scan in
intervals to figure out if any pages were unmapped.

With the zoned counters and NR_ANON_PAGES we now know the number of pagecache
pages and the number of mapped pages in a zone.  So we can simply skip the
reclaim if there is an insufficient number of unmapped pages.  We use
SWAP_CLUSTER_MAX as the boundary.
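
A minimal sketch of that check, assuming the unmapped count is the pagecache
count minus the mapped count.  zone_reclaim_worthwhile() is a hypothetical
name, the value 32 for SWAP_CLUSTER_MAX is an assumption made for the
sketch, and whether the real comparison is strict is not stated here.

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32UL	/* value assumed for this sketch */

/*
 * Hypothetical helper: decide whether zone reclaim is worth attempting,
 * given a zone's pagecache page count and mapped pagecache page count.
 */
static int zone_reclaim_worthwhile(unsigned long nr_file_pages,
				   unsigned long nr_file_mapped)
{
	unsigned long unmapped;

	assert(nr_file_mapped <= nr_file_pages);

	/* Unmapped pagecache pages are the candidates for reclaim. */
	unmapped = nr_file_pages - nr_file_mapped;

	/* Skip the reclaim pass when too few pages could be freed. */
	return unmapped > SWAP_CLUSTER_MAX;
}
```

Because both inputs are directly readable counters, the decision needs no
scan and no timer, which is why the interval tunable can go away.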

Drop all support for /proc/sys/vm/zone_reclaim_interval.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED
Christoph Lameter [Fri, 30 Jun 2006 08:55:36 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED

The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
calculation as the number of mapped pagecache pages.  However, that is not
true.  NR_FILE_MAPPED includes the mapped anonymous pages.  This patch
separates those and therefore allows an accurate tracking of the anonymous
pages per zone.

It then becomes possible to determine the number of unmapped pages per zone
and we can avoid scanning for unmapped pages if there are none.

Also it may now be possible to determine the mapped/unmapped ratio in
get_dirty_limit.  Isn't the number of anonymous pages irrelevant in that
calculation?

Note that this will change the meaning of the number of mapped pages reported
in /proc/vmstat, /proc/meminfo and in the per node statistics.  This may affect
user space tools that monitor these counters!  NR_FILE_MAPPED works like
NR_FILE_DIRTY.  It is only valid for pagecache pages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: remove NR_FILE_MAPPED from scan control structure
Christoph Lameter [Fri, 30 Jun 2006 08:55:36 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: remove NR_FILE_MAPPED from scan control structure

We can now access the number of pages in a mapped state in an inexpensive way
in shrink_active_list.  So drop the nr_mapped field from scan_control.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:35 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter

Currently a single atomic variable is used to establish the size of the page
cache in the whole machine.  The zoned VM counters have the same method of
implementation as the nr_pagecache code but also allow the determination of
the pagecache size per zone.

Remove the special implementation for nr_pagecache and make it a zoned counter
named NR_FILE_PAGES.

Updates of the page cache counters are always performed with interrupts off.
We can therefore use the __ variant here.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: convert nr_mapped to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:34 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: convert nr_mapped to per zone counter

nr_mapped is important because it allows a determination of how many pages of
a zone are not mapped, which would allow a more efficient means of determining
when we need to reclaim memory in a zone.

We take the nr_mapped field out of the page state structure and define a new
per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
from NR_MAPPED in the next patch).

We replace the use of nr_mapped in various kernel locations.  This avoids the
looping over all processors in try_to_free_pages(), writeback, reclaim (swap +
zone reclaim).

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation
Christoph Lameter [Fri, 30 Jun 2006 08:55:33 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation

Per zone counter infrastructure

The counters that we currently have for the VM are split per processor.  The
processor however has not much to do with the zone these pages belong to.  We
cannot tell, for example, how many ZONE_DMA pages are dirty.

So we are blind to potential imbalances in the usage of memory in various
zones.  For example, in a NUMA system we cannot tell how many pages are dirty on a
particular node.  If we knew then we could put measures into the VM to balance
the use of memory between different zones and different nodes in a NUMA
system.  For example it would be possible to limit the dirty pages per node so
that fast local memory is kept available even if a process is dirtying huge
amounts of pages.

Another example is zone reclaim.  We do not know how many unmapped pages exist
per zone.  So we just have to try to reclaim.  If it is not working then we
pause and try again later.  It would be better if we knew when it makes sense
to reclaim unmapped pages from a zone.  This patchset allows the determination
of the number of unmapped pages per zone.  We can remove the zone reclaim
interval with the counters introduced here.

Furthermore the ability to have various usage statistics available will allow
the development of new NUMA balancing algorithms that may be able to improve
the decision making in the scheduler of when to move a process to another node
and hopefully will also enable automatic page migration through a user space
program that can analyse the memory load distribution and then rebalance
memory use in order to increase performance.

The counter framework here implements differential counters for each processor
in struct zone.  The differential counters are consolidated when a threshold
is exceeded (like done in the current implementation for nr_pagecache), when
slab reaping occurs or when a consolidation function is called.

Consolidation uses atomic operations and accumulates counters per zone in the
zone structure and also globally in the vm_stat array.  VM functions can
access the counts by simply indexing a global or zone specific array.
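
The scheme can be sketched in user space like this.  It is an illustration
under assumptions, not the kernel code: struct zone_sketch, mod_zone_state(),
refresh_cpu(), zone_state(), the threshold of 32 and the explicit cpu
argument are all hypothetical.  Each CPU accumulates a small signed
differential per counter; once it exceeds the threshold, or when
consolidation is requested, the differential is folded atomically into the
zone total and the global total.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH 2	/* assumed CPU count */
#define STAT_THRESHOLD 32	/* assumed fold threshold */

enum zone_stat_item { NR_FILE_PAGES, NR_FILE_MAPPED, NR_STAT_ITEMS };

struct zone_sketch {
	/* Consolidated per-zone totals, updated atomically. */
	atomic_long vm_stat[NR_STAT_ITEMS];
	/* Small per-CPU differentials, one signed byte per counter. */
	signed char vm_stat_diff[NR_CPUS_SKETCH][NR_STAT_ITEMS];
};

/* Global totals summed over all zones. */
static atomic_long global_vm_stat[NR_STAT_ITEMS];

/* Fold a differential into the zone total and the global total. */
static void fold(struct zone_sketch *z, enum zone_stat_item item, long delta)
{
	atomic_fetch_add(&z->vm_stat[item], delta);
	atomic_fetch_add(&global_vm_stat[item], delta);
}

/* Update a counter from `cpu`; touches shared state only past the threshold. */
static void mod_zone_state(struct zone_sketch *z, int cpu,
			   enum zone_stat_item item, int delta)
{
	long x;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);

	x = z->vm_stat_diff[cpu][item] + (long)delta;
	if (x > STAT_THRESHOLD || x < -STAT_THRESHOLD) {
		fold(z, item, x);
		x = 0;
	}
	z->vm_stat_diff[cpu][item] = (signed char)x;
}

/* Explicit consolidation of everything one CPU still holds locally. */
static void refresh_cpu(struct zone_sketch *z, int cpu)
{
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);

	for (i = 0; i < NR_STAT_ITEMS; i++) {
		if (z->vm_stat_diff[cpu][i]) {
			fold(z, (enum zone_stat_item)i, z->vm_stat_diff[cpu][i]);
			z->vm_stat_diff[cpu][i] = 0;
		}
	}
}

/* Reading a zone counter is a single array lookup. */
static long zone_state(struct zone_sketch *z, enum zone_stat_item item)
{
	return atomic_load(&z->vm_stat[item]);
}
```

A reader may lag behind by at most the threshold per CPU until the next
fold, which is the price paid for keeping most updates CPU-local.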

The arrangement of counters in an array also simplifies processing when output
has to be generated for /proc/*.

Counters can be updated by calling inc/dec_zone_page_state or
__inc/dec_zone_page_state analogous to *_page_state.  The second group of
functions can be called if it is known that interrupts are disabled.

Special optimized increment and decrement functions are provided.  These can
avoid certain checks and use increment or decrement instructions that an
architecture may provide.

We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can
do DMA to all memory and therefore ZONE_NORMAL will not be populated.  This is
only currently set for IA64 SGI SN2 and currently only affects
node_page_state().  In the best case node_page_state can be reduced to
retrieving a single counter for the one zone on the node.

[akpm@osdl.org: cleanups]
[akpm@osdl.org: export vm_stat[] for filesystems]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h
Christoph Lameter [Fri, 30 Jun 2006 08:55:32 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h

NOTE: ZVC are *not* the lightweight event counters.  ZVCs are reliable whereas
event counters do not need to be.

Zone based VM statistics are necessary to be able to determine what the state
of memory in one zone is.  In a NUMA system this can be helpful for local
reclaim and other memory optimizations that may be able to shift VM load in
order to get more balanced memory use.

It is also useful to know how the computing load affects the memory
allocations on various zones.  This patchset allows the retrieval of that data
from userspace.

The patchset introduces a framework for counters that is a cross between the
existing page_stats --which are simply global counters split per cpu-- and the
approach of deferred incremental updates implemented for nr_pagecache.

Small per cpu 8 bit counters are added to struct zone.  If the counter exceeds
certain thresholds then the counters are accumulated in an array of
atomic_long in the zone and in a global array that sums up all zone values.
The small 8 bit counters are next to the per cpu page pointers and so they
will be hot in the cpu cache when pages are allocated and freed.

Access to VM counter information for a zone and for the whole machine is then
possible by simply indexing an array (Thanks to Nick Piggin for pointing out
that approach).  The access to the total number of pages of various types no
longer requires the summing up of all per cpu counters.

Benefits of this patchset right now:

- Ability for UP and SMP configuration to determine how memory
  is balanced between the DMA, NORMAL and HIGHMEM zones.

- loops over all processors are avoided in writeback and
  reclaim paths. We can avoid caching the writeback information
  because the needed information is directly accessible.

- Special handling for nr_pagecache removed.

- zone_reclaim_interval vanishes since VM stats can now determine
  when it is worth doing local reclaim.

- Fast inline per node page state determination.

- Accurate counters in /sys/devices/system/node/node*/meminfo. Current
  counters are counting simply which processor allocated a page somewhere
  and guesstimate based on that. So the counters were not useful to show
  the actual distribution of page use on a specific zone.

- The swap_prefetch patch requires per node statistics in order to
  figure out when processors of a node can prefetch. This patch provides
  some of the needed numbers.

- Detailed VM counters available in more /proc and /sys status files.

References to earlier discussions:
V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

Performance tests with AIM7 did not show any regressions.  Seems to be a tad
faster even.  Tested on ia64/NUMA.  Builds fine on i386, SMP / UP.  Includes
fixes for s390/arm/uml arch code.

This patch:

Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.

Create vmstat.c/vmstat.h by separating the counter code and the proc
functions.

Move the vm_stat_text array before zoneinfo_show.

[akpm@osdl.org: s390 build fix]
[akpm@osdl.org: HOTPLUG_CPU build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix ISTALLION=y
Adrian Bunk [Fri, 30 Jun 2006 08:55:30 +0000 (01:55 -0700)]
[PATCH] fix ISTALLION=y

drivers/char/istallion.c: In function ‘stli_initbrds’:
drivers/char/istallion.c:4150: error: implicit declaration of function ‘stli_parsebrd’
drivers/char/istallion.c:4150: error: ‘stli_brdsp’ undeclared (first use in this function)
drivers/char/istallion.c:4150: error: (Each undeclared identifier is reported only once
drivers/char/istallion.c:4150: error: for each function it appears in.)
drivers/char/istallion.c:4164: error: implicit declaration of function ‘stli_argbrds’

While I was at it, I also removed the #ifdef MODULE around the initialization
code to allow it to perhaps work when built into the kernel and made a
needlessly global function static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] msr.c: use register_hotcpu_notifier()
Andrew Morton [Fri, 30 Jun 2006 08:55:29 +0000 (01:55 -0700)]
[PATCH] msr.c: use register_hotcpu_notifier()

register_cpu_notifier() cannot do anything in a module, in a
!CONFIG_HOTPLUG_CPU kernel.

Cc: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix platform_device_put/del mishaps
Ingo Molnar [Fri, 30 Jun 2006 08:55:29 +0000 (01:55 -0700)]
[PATCH] fix platform_device_put/del mishaps

This fixes drivers/char/pc8736x_gpio.c and drivers/char/scx200_gpio.c to
use the platform_device_del/put ops correctly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix drivers/video/imacfb.c compilation
Ingo Molnar [Fri, 30 Jun 2006 08:55:27 +0000 (01:55 -0700)]
[PATCH] fix drivers/video/imacfb.c compilation

Fix build error on x86_64.  There's nothing even remotely close to
imacmp_seg in the kernel, so I removed the whole line.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Fri, 30 Jun 2006 00:44:21 +0000 (17:44 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()
  ocfs2: clean up some osb fields
  ocfs2: fix init of uuid_net_key
  ocfs2: silence a debug print
  ocfs2: silence ENOENT during lookup of broken links
  ocfs2: Cleanup message prints
  ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup
  [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static
  ocfs2: warn the user on a dead timeout mismatch
  ocfs2: OCFS2_FS must depend on SYSFS
  ocfs2: Compile-time disabling of ocfs2 debugging output.
  configfs: Clear up a few extra spaces where there should be TABs.
  configfs: Release memory in configfs_example.

18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 30 Jun 2006 00:43:43 +0000 (17:43 -0700)]
Merge /pub/scm/linux/kernel/git/davem/net-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  [TIPC]: Initial activation message now includes TIPC version number
  [TIPC]: Improve response to requests for node/link information
  [TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf
  [IrDA]: Fix the AU1000 FIR dependencies
  [IrDA]: Fix RCU lock pairing on error path
  [XFRM]: unexport xfrm_state_mtu
  [NET]: make skb_release_data() static
  [NETFILTER] ipv4: Fix typo (Bugzilla #6753)
  [IrDA]: MCS7780 usb_driver struct should be static
  [BNX2]: Turn off link during shutdown
  [BNX2]: Use dev_kfree_skb() instead of the _irq version
  [ATM]: basic sysfs support for ATM devices
  [ATM]: [suni] change suni_init to __devinit
  [ATM]: [iphase] should be __devinit not __init
  [ATM]: [idt77105] should be __devinit not __init
  [BNX2]: Add NETIF_F_TSO_ECN
  [NET]: Add ECN support for TSO
  [AF_UNIX]: Datagram getpeersec
  [NET]: Fix logical error in skb_gso_ok
  [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result in tv_usec >= 1000000
  ...

18 years ago[TIPC]: Initial activation message now includes TIPC version number
Allan Stephens [Thu, 29 Jun 2006 19:33:51 +0000 (12:33 -0700)]
[TIPC]: Initial activation message now includes TIPC version number

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TIPC]: Improve response to requests for node/link information
Allan Stephens [Thu, 29 Jun 2006 19:33:20 +0000 (12:33 -0700)]
[TIPC]: Improve response to requests for node/link information

Now allocates reply space for "get links" request based on number of actual
links, not number of potential links.  Also, limits reply to "get links" and
"get nodes" requests to 32KB to match capabilities of tipc-config utility
that issued request.
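
A minimal sketch of the sizing rule described above, not the actual TIPC code. The helper name demo_reply_size and the fixed per-link record size are assumptions made for illustration; only the 32KB cap comes from the text.

```c
#include <stddef.h>

/* Cap taken from the 32KB figure in the commit message. */
#define DEMO_REPLY_MAX_BYTES (32u * 1024u)

/*
 * Size a "get links" reply from the number of links that actually exist,
 * then clamp it to the cap. per_link_bytes is an assumed fixed record size.
 */
static size_t demo_reply_size(size_t actual_links, size_t per_link_bytes)
{
    /* Compare by division so the multiplication below cannot overflow. */
    if (per_link_bytes != 0 &&
        actual_links > DEMO_REPLY_MAX_BYTES / per_link_bytes)
        return DEMO_REPLY_MAX_BYTES;

    return actual_links * per_link_bytes;
}
```

With few links the reply is sized exactly; with many links it stops growing at the cap instead of scaling with the number of potential links.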

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
18 years ago[TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf
Allan Stephens [Thu, 29 Jun 2006 19:32:46 +0000 (12:32 -0700)]
[TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf

Now determines tailroom of bundle buffer by direct inspection of the buffer.
Previously, buffer was assumed to have a max capacity equal to the link MTU,
but the addition of link MTU negotiation means that the link MTU can increase
after the bundle buffer is allocated.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IrDA]: Fix the AU1000 FIR dependencies
Adrian Bunk [Fri, 30 Jun 2006 00:03:19 +0000 (17:03 -0700)]
[IrDA]: Fix the AU1000 FIR dependencies

AU1000 FIR is broken, it should depend on SOC_AU1000.

Spotted by Jean-Luc Leger.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IrDA]: Fix RCU lock pairing on error path
Josh Triplett [Fri, 30 Jun 2006 00:02:31 +0000 (17:02 -0700)]
[IrDA]: Fix RCU lock pairing on error path

irlan_client_discovery_indication calls rcu_read_lock and rcu_read_unlock, but
returns without unlocking in an error case.  Fix that by replacing the return
with a goto so that the rcu_read_unlock always gets executed.
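
The return-to-goto change can be sketched as follows. This is a user-space illustration only: demo_read_lock() and demo_read_unlock() are hypothetical stand-ins for rcu_read_lock() and rcu_read_unlock() that merely count nesting, and demo_discovery_indication() is not the real irlan function.

```c
#include <stddef.h>

/* Stand-ins for the RCU read-side primitives; they only track nesting. */
static int demo_read_lock_depth;

static void demo_read_lock(void)   { demo_read_lock_depth++; }
static void demo_read_unlock(void) { demo_read_lock_depth--; }

/* Returns 0 on success, -1 if the looked-up entry is missing. */
static int demo_discovery_indication(const int *entry)
{
    int ret = 0;

    demo_read_lock();
    if (entry == NULL) {
        ret = -1;
        goto out;   /* a bare "return" here would leave the lock held */
    }
    /* ... use *entry while still inside the read-side section ... */
out:
    demo_read_unlock();
    return ret;
}
```

Both the success path and the error path leave through the same label, so the lock depth is back to zero whichever way the function exits.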

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[XFRM]: unexport xfrm_state_mtu
Adrian Bunk [Thu, 29 Jun 2006 20:04:41 +0000 (13:04 -0700)]
[XFRM]: unexport xfrm_state_mtu

This patch removes the unused EXPORT_SYMBOL(xfrm_state_mtu).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: make skb_release_data() static
Adrian Bunk [Thu, 29 Jun 2006 20:02:35 +0000 (13:02 -0700)]
[NET]: make skb_release_data() static

skb_release_data() no longer has any users in other files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER] ipv4: Fix typo (Bugzilla #6753)
Matt LaPlante [Thu, 29 Jun 2006 19:51:15 +0000 (12:51 -0700)]
[NETFILTER] ipv4: Fix typo (Bugzilla #6753)

This patch fixes bugzilla #6753, a typo in the netfilter Kconfig

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IrDA]: MCS7780 usb_driver struct should be static
Adrian Bunk [Thu, 29 Jun 2006 19:39:07 +0000 (12:39 -0700)]
[IrDA]: MCS7780 usb_driver struct should be static

This patch makes a needlessly global struct static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BNX2]: Turn off link during shutdown
Michael Chan [Thu, 29 Jun 2006 19:38:15 +0000 (12:38 -0700)]
[BNX2]: Turn off link during shutdown

Minor change in shutdown logic to effect a link down.

Update version to 1.4.43.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BNX2]: Use dev_kfree_skb() instead of the _irq version
Michael Chan [Thu, 29 Jun 2006 19:37:41 +0000 (12:37 -0700)]
[BNX2]: Use dev_kfree_skb() instead of the _irq version

Change all dev_kfree_skb_irq() and dev_kfree_skb_any() to
dev_kfree_skb().  These calls are never used in irq context.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: basic sysfs support for ATM devices
Roman Kagan [Thu, 29 Jun 2006 19:36:34 +0000 (12:36 -0700)]
[ATM]: basic sysfs support for ATM devices

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: [suni] change suni_init to __devinit
Chas Williams [Thu, 29 Jun 2006 19:35:49 +0000 (12:35 -0700)]
[ATM]: [suni] change suni_init to __devinit

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: [iphase] should be __devinit not __init
Chas Williams [Thu, 29 Jun 2006 19:35:32 +0000 (12:35 -0700)]
[ATM]: [iphase] should be __devinit not __init

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: [idt77105] should be __devinit not __init
Chas Williams [Thu, 29 Jun 2006 19:35:02 +0000 (12:35 -0700)]
[ATM]: [idt77105] should be __devinit not __init

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BNX2]: Add NETIF_F_TSO_ECN
Michael Chan [Thu, 29 Jun 2006 19:31:21 +0000 (12:31 -0700)]
[BNX2]: Add NETIF_F_TSO_ECN

Add NETIF_F_TSO_ECN feature for all bnx2 hardware.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Add ECN support for TSO
Michael Chan [Thu, 29 Jun 2006 19:30:00 +0000 (12:30 -0700)]
[NET]: Add ECN support for TSO

In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.
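
A rough sketch of the splitting rule just described, assuming a simplified model in which each segment is represented only by its TCP flags byte. demo_segment_flags() is a hypothetical helper, not kernel code, and it ignores the other per-segment flag handling (such as FIN or PSH) that real segmentation performs.

```c
#include <stddef.h>

#define DEMO_TCP_CWR 0x80u   /* CWR bit in the TCP flags byte */

/*
 * Fill flags_out[i] with the TCP flags for segment i when a large packet
 * carrying super_flags is cut into mss-sized pieces. Returns the number of
 * segments, or 0 if mss is 0 or flags_out is too small.
 */
static size_t demo_segment_flags(size_t payload_len, size_t mss,
                                 unsigned char super_flags,
                                 unsigned char *flags_out, size_t max_segs)
{
    size_t nsegs, i;

    if (mss == 0)
        return 0;
    nsegs = (payload_len + mss - 1) / mss;   /* round up */
    if (nsegs > max_segs)
        return 0;

    for (i = 0; i < nsegs; i++) {
        flags_out[i] = super_flags;
        if (i > 0)   /* CWR survives only in the first segment */
            flags_out[i] &= (unsigned char)~DEMO_TCP_CWR;
    }
    return nsegs;
}
```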

With help from Herbert Xu <herbert@gondor.apana.org.au>.

Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[AF_UNIX]: Datagram getpeersec
Catherine Zhang [Thu, 29 Jun 2006 19:27:47 +0000 (12:27 -0700)]
[AF_UNIX]: Datagram getpeersec

This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates this to the kernel by
setting the SO_PASSSEC option via setsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

/* Ask the kernel to attach the peer's security context to each datagram. */
setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
/* Look for an SCM_SECURITY control message in the ancillary data. */
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_SOCKET &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

sock_setsockopt is enhanced with a new socket option SO_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Fix logical error in skb_gso_ok
Herbert Xu [Thu, 29 Jun 2006 19:25:53 +0000 (12:25 -0700)]
[NET]: Fix logical error in skb_gso_ok

The test in skb_gso_ok is backwards.  Noticed by Michael Chan
<mchan@broadcom.com>.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result in tv_usec >= 1000000
Shuya MAEDA [Wed, 28 Jun 2006 08:40:35 +0000 (01:40 -0700)]
[PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result in tv_usec >= 1000000

Signed-off-by: Shuya MAEDA <maeda-sxb@necst.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Make illegal_highdma more anal
Herbert Xu [Tue, 27 Jun 2006 20:33:10 +0000 (13:33 -0700)]
[NET]: Make illegal_highdma more anal

Rather than having illegal_highdma as a macro when HIGHMEM is off, we
can turn it into an inline function that returns zero.  This will catch
callers that give it bad arguments.
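
The difference between the two forms can be illustrated with stand-in types. The struct definitions and the demo_ names below are hypothetical; the sketch only shows why an inline function that returns zero type-checks its arguments while an equivalent macro does not.

```c
struct demo_dev { unsigned long features; };
struct demo_skb { int nr_frags; };

/* Macro form: the arguments are discarded, so even
 * demo_illegal_highdma_macro(1, "x") compiles without complaint. */
#define demo_illegal_highdma_macro(dev, skb) (0)

/* Inline form: still a constant zero, but the compiler now checks that
 * callers pass a device pointer and an skb pointer. */
static inline int demo_illegal_highdma(struct demo_dev *dev,
                                       struct demo_skb *skb)
{
    (void)dev;
    (void)skb;
    return 0;
}
```

Passing an integer or a string to demo_illegal_highdma() draws a compiler diagnostic, which is the kind of bad argument the change is meant to catch.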

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP]: Export accept queue len of a TCP listening socket via rx_queue
Sridhar Samudrala [Tue, 27 Jun 2006 20:29:00 +0000 (13:29 -0700)]
[TCP]: Export accept queue len of a TCP listening socket via rx_queue

While debugging a TCP server hang issue, we noticed that currently there is
no way for a user to get the acceptq backlog value for a TCP listen socket.

All the standard networking utilities that display socket info like netstat,
ss and /proc/net/tcp have 2 fields called rx_queue and tx_queue. These
fields do not mean much for listening sockets. This patch uses one of these
unused fields (rx_queue) to export the accept queue length for listening sockets.
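
From user space, the exported value could be read roughly as follows. This is a sketch that assumes the usual /proc/net/tcp line layout for IPv4 (slot, local address:port, remote address:port, state, tx_queue:rx_queue, all in hex) and that state 0A means LISTEN; demo_listen_backlog() is a hypothetical helper name.

```c
#include <stdio.h>

#define DEMO_TCP_LISTEN 0x0A   /* socket state value for LISTEN */

/*
 * Parse one line of /proc/net/tcp. Returns 1 and fills *port and *backlog
 * if the line describes a listening socket, 0 otherwise (including the
 * header line and lines that do not parse).
 */
static int demo_listen_backlog(const char *line,
                               unsigned *port, unsigned *backlog)
{
    unsigned lport, state, txq, rxq;

    if (sscanf(line, " %*d: %*[^:]:%x %*[^:]:%*x %x %x:%x",
               &lport, &state, &txq, &rxq) != 4)
        return 0;
    if (state != DEMO_TCP_LISTEN)
        return 0;

    *port = lport;
    *backlog = rxq;   /* rx_queue carries the accept queue length */
    return 1;
}
```

A caller would read /proc/net/tcp line by line with fgets() and pass each line to this helper.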

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETLINK]: Encapsulate eff_cap usage within security framework.
Darrel Goeddel [Tue, 27 Jun 2006 20:26:11 +0000 (13:26 -0700)]
[NETLINK]: Encapsulate eff_cap usage within security framework.

This patch encapsulates the usage of eff_cap (in netlink_skb_params) within
the security framework by extending security_netlink_recv to include a required
capability parameter and converting all direct usage of eff_caps outside
of the lsm modules to use the interface.  It also updates the SELinux
implementation of the security_netlink_send and security_netlink_recv
hooks to take advantage of the sid in the netlink_skb_params struct.
This also enables SELinux to perform auditing of netlink capability checks.
Please apply, for 2.6.18 if possible.

Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Added GSO header verification
Herbert Xu [Tue, 27 Jun 2006 20:22:38 +0000 (13:22 -0700)]
[NET]: Added GSO header verification

When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY.  Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST.  If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag.  The same method
can be used to implement TSO ECN support.  We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them.  The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.
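
The feature-matching rule can be sketched as a bit test. The bit values and the helper name below are made up for illustration and do not match the real kernel constants; the sketch assumes each gso_type bit has a corresponding device feature bit at the same position.

```c
/* Hypothetical bit assignments, one per GSO property. */
#define DEMO_GSO_DODGY 0x1u   /* headers come from an untrusted source */
#define DEMO_GSO_ECN   0x2u   /* packet has CWR set */

/*
 * A packet may go straight to the device only if every property bit set in
 * gso_type is also advertised by the device. Otherwise it must pass through
 * the software GSO engine first. Returns nonzero when software GSO is needed.
 */
static int demo_needs_software_gso(unsigned gso_type,
                                   unsigned dev_gso_features)
{
    return (gso_type & ~dev_gso_features) != 0;
}
```

Under this model a dodgy packet sent to a device without the robust bit is diverted to the GSO engine, where the header check happens before anything reaches the hardware.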

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: statistic match: add missing Kconfig help text
Patrick McHardy [Tue, 27 Jun 2006 10:02:14 +0000 (03:02 -0700)]
[NETFILTER]: statistic match: add missing Kconfig help text

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: ip_queue/nfnetlink_queue: drop bridge port references when dev disappears
Patrick McHardy [Tue, 27 Jun 2006 10:01:48 +0000 (03:01 -0700)]
[NETFILTER]: ip_queue/nfnetlink_queue: drop bridge port references when dev disappears

When a device that is acting as a bridge port is unregistered, the
ip_queue/nfnetlink_queue notifier doesn't check if it's one of
physindev/physoutdev and doesn't release the references if it is.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: xt_sctp: fix --chunk-types matching
Jorge Matias [Tue, 27 Jun 2006 10:01:25 +0000 (03:01 -0700)]
[NETFILTER]: xt_sctp: fix --chunk-types matching

xt_sctp uses an incorrect header offset when --chunk-types is used.

Signed-off-by: Jorge Matias <jorge.matias@motorola.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: xt_tcpudp: fix double unregistration in error path
Yuri Gushin [Tue, 27 Jun 2006 10:01:03 +0000 (03:01 -0700)]
[NETFILTER]: xt_tcpudp: fix double unregistration in error path

"xt_unregister_match(AF_INET, &tcp_matchstruct)" is called twice,
leaving "udp_matchstruct" registered, in case of a failure in the
registration of the udp6 structure.
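
A sketch of an error path that unwinds each registration exactly once. The demo_ helpers, the counters and the registration order are all hypothetical and stand in for the real xt_register_match()/xt_unregister_match() calls.

```c
enum { DEMO_TCP4, DEMO_UDP4, DEMO_TCP6, DEMO_UDP6, DEMO_NMATCH };

static int demo_registered[DEMO_NMATCH];  /* registration count per match */
static int demo_fail_at = -1;             /* id whose registration fails */

static int demo_register(int id)
{
    if (id == demo_fail_at)
        return -1;
    demo_registered[id]++;
    return 0;
}

static void demo_unregister(int id)
{
    demo_registered[id]--;
}

static int demo_init(void)
{
    int ret;

    ret = demo_register(DEMO_TCP4);
    if (ret)
        goto out;
    ret = demo_register(DEMO_UDP4);
    if (ret)
        goto out_unreg_tcp4;
    ret = demo_register(DEMO_TCP6);
    if (ret)
        goto out_unreg_udp4;
    ret = demo_register(DEMO_UDP6);
    if (ret)
        goto out_unreg_tcp6;
    return 0;

    /* Each label undoes one earlier step, in reverse order. */
out_unreg_tcp6:
    demo_unregister(DEMO_TCP6);
out_unreg_udp4:
    demo_unregister(DEMO_UDP4);
out_unreg_tcp4:
    demo_unregister(DEMO_TCP4);
out:
    return ret;
}
```

If the last registration fails, every earlier one is released once and none is released twice, which is the property the buggy error path lacked.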

Signed-off-by: Yuri Gushin <yuri@ecl-labs.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: nf_conntrack: Fix undefined references to local_bh_*
Yasuyuki Kozakai [Tue, 27 Jun 2006 10:00:35 +0000 (03:00 -0700)]
[NETFILTER]: nf_conntrack: Fix undefined references to local_bh_*

  CC      net/netfilter/nf_conntrack_proto_sctp.o
net/netfilter/nf_conntrack_proto_sctp.c: In function `sctp_print_conntrack':
net/netfilter/nf_conntrack_proto_sctp.c:206: warning: implicit declaration of function `local_bh_disable'
net/netfilter/nf_conntrack_proto_sctp.c:208: warning: implicit declaration of function `local_bh_enable'
  CC      net/netfilter/nf_conntrack_netlink.o
net/netfilter/nf_conntrack_netlink.c: In function `ctnetlink_dump_table':
net/netfilter/nf_conntrack_netlink.c:429: warning: implicit declaration of function `local_bh_disable'
net/netfilter/nf_conntrack_netlink.c:452: warning: implicit declaration of function `local_bh_enable'

Spotted by Toralf Förster

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: x_tables: fix xt_register_table error propagation
Patrick McHardy [Tue, 27 Jun 2006 10:00:09 +0000 (03:00 -0700)]
[NETFILTER]: x_tables: fix xt_register_table error propagation

When xt_register_table fails the error is not properly propagated back.
Based on patch by Lepton Wu <ytht.net@gmail.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SUNHME]: Mark SBUS probing routines as __devinit.
David S. Miller [Thu, 29 Jun 2006 23:20:12 +0000 (16:20 -0700)]
[SUNHME]: Mark SBUS probing routines as __devinit.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: Print symbol name of regs->tpc on kernel unaligned accesses.
David S. Miller [Thu, 29 Jun 2006 22:48:59 +0000 (15:48 -0700)]
[SPARC64]: Print symbol name of regs->tpc on kernel unaligned accesses.

This makes things easier to track down, especially in modules.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SERIO] i8042-sparcio.h: Convert to of_driver framework.
David S. Miller [Thu, 29 Jun 2006 22:42:29 +0000 (15:42 -0700)]
[SERIO] i8042-sparcio.h: Convert to of_driver framework.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: time: Kill unnecessary asm/{fhc,sbus,ebus,isa}.h includes.
David S. Miller [Thu, 29 Jun 2006 22:28:05 +0000 (15:28 -0700)]
[SPARC64]: time: Kill unnecessary asm/{fhc,sbus,ebus,isa}.h includes.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64] power: Convert to of_driver.
David S. Miller [Thu, 29 Jun 2006 22:22:46 +0000 (15:22 -0700)]
[SPARC64] power: Convert to of_driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64] auxio: Remove asm/{sbus,ebus}.h includes.
David S. Miller [Thu, 29 Jun 2006 22:22:22 +0000 (15:22 -0700)]
[SPARC64] auxio: Remove asm/{sbus,ebus}.h includes.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SERIAL] sunsab: Fix section mis-match errors.
David S. Miller [Thu, 29 Jun 2006 22:18:50 +0000 (15:18 -0700)]
[SERIAL] sunsab: Fix section mis-match errors.

sunsab_init_one() needs to be __devinit, not __init

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SERIAL] sunsab: Convert to of_driver framework.
David S. Miller [Thu, 29 Jun 2006 22:17:47 +0000 (15:17 -0700)]
[SERIAL] sunsab: Convert to of_driver framework.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SERIAL] sun{su,zilog}: Add missing MODULE_*() niceties.
David S. Miller [Thu, 29 Jun 2006 22:14:17 +0000 (15:14 -0700)]
[SERIAL] sun{su,zilog}: Add missing MODULE_*() niceties.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SERIAL] sunsu: Convert to of_driver framework.
David S. Miller [Thu, 29 Jun 2006 22:14:03 +0000 (15:14 -0700)]
[SERIAL] sunsu: Convert to of_driver framework.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SERIAL] sunzilog: Fix bugs in device deregistration.
David S. Miller [Thu, 29 Jun 2006 22:13:40 +0000 (15:13 -0700)]
[SERIAL] sunzilog: Fix bugs in device deregistration.

1) Need to unregister 2 ports per of_device.
2) Need to of_iounmap() 1 mapping per of_device.
3) Need to free up the IRQ only after all devices
   have been unregistered.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SERIAL] sunzilog: Convert to of_driver.
David S. Miller [Thu, 29 Jun 2006 22:13:17 +0000 (15:13 -0700)]
[SERIAL] sunzilog: Convert to of_driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: sparc32 side of of_device layer IRQ resolution.
David S. Miller [Thu, 29 Jun 2006 22:08:02 +0000 (15:08 -0700)]
[SPARC]: sparc32 side of of_device layer IRQ resolution.

Happily, life is much simpler on 32-bit sparc systems.
The "intr" property, preferred over the "interrupts"
property, is used as-is.  Some minor translations of this
value happen on sun4d systems.

The stage is now set to rewrite the sparc serial driver
probing to use the of_driver framework, and then to convert
all SBUS, EBUS, and ISA drivers in-kind so that we can nuke
all those special bus frameworks.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: of_device layer IRQ resolution
David S. Miller [Thu, 29 Jun 2006 22:07:37 +0000 (15:07 -0700)]
[SPARC64]: of_device layer IRQ resolution

Do IRQ determination generically by parsing the PROM properties,
and using IRQ controller drivers for final resolution.

One immediate positive effect is that all of the IRQ frobbing
in the EBUS, ISA, and PCI controller layers has been eliminated.
We just look up the of_device and use the properly computed
value.

The PCI controller irq_build() routines are gone and no longer
used.  Unfortunately sbus_build_irq() has to remain as there is
a direct reference to this in the sunzilog driver.  That can be
killed off once the sparc32 side of this is written and the
sunzilog driver is transformed into an "of" bus driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: Fix typo in clock_probe().
David S. Miller [Thu, 29 Jun 2006 21:43:37 +0000 (14:43 -0700)]
[SPARC64]: Fix typo in clock_probe().

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64] clock: Only probe central fhc clock on Enterprise boxes.
David S. Miller [Thu, 29 Jun 2006 21:39:40 +0000 (14:39 -0700)]
[SPARC64] clock: Only probe central fhc clock on Enterprise boxes.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64] power: Do not pass SA_SHIRQ to request_irq().
David S. Miller [Thu, 29 Jun 2006 21:39:11 +0000 (14:39 -0700)]
[SPARC64] power: Do not pass SA_SHIRQ to request_irq().

This needs to be a unique interrupt source because we do
not have a register or similar to poll to make sure the
IRQ is really for us.  We do not have any dev_id to pass
in anyways, and the generic IRQ layer is now enforcing
that when SA_SHIRQ is specified, dev_id must be non-NULL.
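
The rule being enforced can be sketched as a simple argument check. The flag value and the function name are hypothetical, and returning -EINVAL is an assumption about how such a request is rejected; this is not the generic IRQ layer's code.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_SA_SHIRQ 0x1u   /* hypothetical value for the shared-IRQ flag */

/*
 * A shared interrupt line needs a non-NULL dev_id so that the handler can
 * later be told apart from the others on the same line. Returns 0 if the
 * combination is acceptable, -EINVAL otherwise.
 */
static int demo_check_irq_request(unsigned flags, void *dev_id)
{
    if ((flags & DEMO_SA_SHIRQ) && dev_id == NULL)
        return -EINVAL;
    return 0;
}
```

A driver with no dev_id to offer, as in the power driver case above, therefore has to request the line without the shared flag.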

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: Fix typo in isa_dev_get_irq_using_imap().
David S. Miller [Thu, 29 Jun 2006 21:38:51 +0000 (14:38 -0700)]
[SPARC64]: Fix typo in isa_dev_get_irq_using_imap().

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: Let irq_install_pre_handler() get called multiple times.
David S. Miller [Thu, 29 Jun 2006 21:38:21 +0000 (14:38 -0700)]
[SPARC64]: Let irq_install_pre_handler() get called multiple times.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: Kill interrupt stuff and linux_phandle from device_node.
David S. Miller [Thu, 29 Jun 2006 21:37:09 +0000 (14:37 -0700)]
[SPARC]: Kill interrupt stuff and linux_phandle from device_node.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: Convert clock drivers to of_driver framework.
David S. Miller [Thu, 29 Jun 2006 21:36:52 +0000 (14:36 -0700)]
[SPARC]: Convert clock drivers to of_driver framework.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64] auxio: Convert to pure of_device driver.
David S. Miller [Thu, 29 Jun 2006 21:36:35 +0000 (14:36 -0700)]
[SPARC64] auxio: Convert to pure of_device driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: Convert all FB SBUS drivers to of_driver framework.
David S. Miller [Thu, 29 Jun 2006 21:35:52 +0000 (14:35 -0700)]
[SPARC]: Convert all FB SBUS drivers to of_driver framework.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: Add of_io{remap,unmap}().
David S. Miller [Thu, 29 Jun 2006 21:35:33 +0000 (14:35 -0700)]
[SPARC]: Add of_io{remap,unmap}().

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: Encode I/O space into resource flags on sparc32.
David S. Miller [Thu, 29 Jun 2006 21:35:14 +0000 (14:35 -0700)]
[SPARC]: Encode I/O space into resource flags on sparc32.

On sparc64 we don't need to do this because the resource
values are large enough to encode the full physical address.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: Beginnings of generic of_device framework.
David S. Miller [Thu, 29 Jun 2006 21:34:50 +0000 (14:34 -0700)]
[SPARC]: Beginnings of generic of_device framework.

The idea is to fully construct the device register and
interrupt values into these of_device objects, and convert
all of SBUS, EBUS, ISA drivers to use this new stuff.

Many ideas and code taken from Ben H.'s powerpc work.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC]: Add of_n_{addr,size}_cells().
David S. Miller [Thu, 29 Jun 2006 21:34:12 +0000 (14:34 -0700)]
[SPARC]: Add of_n_{addr,size}_cells().

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[SPARC64]: Kill starfire_cookie from SBUS/PCI.
David S. Miller [Thu, 29 Jun 2006 21:27:13 +0000 (14:27 -0700)]
[SPARC64]: Kill starfire_cookie from SBUS/PCI.

Totally unused.

We need to traverse the list of global IRQ translators,
so storing it in the per-bus structures was useless.

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years agoocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()
Florin Malita [Sat, 3 Jun 2006 23:30:10 +0000 (19:30 -0400)]
ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()

Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
18 years agoocfs2: clean up some osb fields
Mark Fasheh [Thu, 4 May 2006 19:03:26 +0000 (12:03 -0700)]
ocfs2: clean up some osb fields

Get rid of osb->uuid, osb->proc_sub_dir, and osb->osb_id. Those fields were
unused, or could easily be removed. As a result, we also no longer need
MAX_OSB_ID or ocfs2_globals_lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
18 years agoocfs2: fix init of uuid_net_key
Mark Fasheh [Thu, 4 May 2006 18:49:22 +0000 (11:49 -0700)]
ocfs2: fix init of uuid_net_key

ocfs2_initialize_super() should be copying from the beginning of the uuid.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
18 years agoocfs2: silence a debug print
Mark Fasheh [Fri, 28 Apr 2006 00:53:22 +0000 (17:53 -0700)]
ocfs2: silence a debug print

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
18 years agoocfs2: silence ENOENT during lookup of broken links
Sunil Mushran [Thu, 27 Apr 2006 23:44:13 +0000 (16:44 -0700)]
ocfs2: silence ENOENT during lookup of broken links

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
18 years agoocfs2: Cleanup message prints
Sunil Mushran [Thu, 27 Apr 2006 23:41:31 +0000 (16:41 -0700)]
ocfs2: Cleanup message prints

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
18 years agoocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup
Joel Becker [Thu, 27 Apr 2006 23:36:14 +0000 (16:36 -0700)]
ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
18 years ago[PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static
Adrian Bunk [Fri, 31 Mar 2006 14:53:55 +0000 (16:53 +0200)]
[PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static

dlm_lockres_master_requery() became global without any external usage.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>