Arthur Jones [Fri, 25 Jul 2008 08:49:06 +0000 (01:49 -0700)]
edac: i5100 fix unmask ecc bits
Explicitly unmask ECC errors we are interested in reporting.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:06 +0000 (01:49 -0700)]
edac: i5100 fix enable ecc hardware
It is possible that the BIOS did not enable ECC at boot time. We check
for that case and fail to load if it is true.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:05 +0000 (01:49 -0700)]
edac: i5100 fix missing bits
The error mask we use to trigger ECC notifications is missing many bits of
interest. We add these bits here so that all possible ECC errors can be
reported.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arthur Jones [Fri, 25 Jul 2008 08:49:04 +0000 (01:49 -0700)]
edac: i5100 new intel chipset driver
Preliminary support for the Intel 5100 MCH. CE and UE errors are reported
along with the current DIMM label information and other memory parameters.
Reasons why this is preliminary:
1) This chip has 2 independent memory controllers which, for best
perforance, use interleaved accesses to the DDR2 memory. This
architecture does not map very well to the current edac data structures
which depend on symmetric channel access to the interleaved data.
Without core changes, the best I could do for now is to map both memory
controllers to different csrows (first all ranks of controller 0, then
all ranks of controller 1). Someone much more familiar with the edac
core than I will probably need to come up with a more general data
structure to handle the interleaving and de-interleaving of the two
memory controllers.
2) I have not yet tackled the de-interleaving of the rank/controller
address space into the physical address space of the CPU. There is
nothing fundamentally missing, it is just ending up to be a lot of
code, and I'd rather keep it separate for now, esp since it doesn't
work yet...
3) The code depends on a particular i5100 chip select to DIMM mainboard
chip select mapping. This mapping seems obvious to me in order to
support dual and single ranked memory, but it is not unique and DIMM
labels could be wrong on other mainboards. There is no way to query
this mapping that I know of.
4) The code requires that the i5100 is in 32GB mode. Only 4 ranks per
controller, 2 ranks per DIMM are supported. I do not have hardware
(nor do I expect to have hardware anytime soon) for the 48GB (6 ranks
per controller) mode.
5) The serial presence detect code should be broken out into a "real"
i2c driver so that decode-dimms.pl can work.
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:49:02 +0000 (01:49 -0700)]
fuse: lockd support
If fuse filesystem doesn't define it's own lock operations, then allow the
lock manager to work with fuse.
Adding lockd support for remote locking is also possible, but more rarely
used, so leave it till later.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:49:02 +0000 (01:49 -0700)]
fuse: nfs export special lookups
Implement the get_parent export operation by sending a LOOKUP request with
".." as the name.
Implement looking up an inode by node ID after it has been evicted from
the cache. This is done by seding a LOOKUP request with "." as the name
(for all file types, not just directories).
The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to
indicate that it supports these special lookups.
Thanks to John Muir for the original implementation of this feature.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:49:01 +0000 (01:49 -0700)]
fuse: add fuse_lookup_name() helper
Add a new helper function which sends a LOOKUP request with the supplied
name. This will be used by the next patch to send special LOOKUP requests
with "." and ".." as the name.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:49:00 +0000 (01:49 -0700)]
fuse: add export operations
Implement export_operations, to allow fuse filesystems to be exported to
NFS. This feature has been in the out-of-tree fuse module, and is widely
used and tested.
It has not been originally merged into mainline, because doing the NFS
export in userspace was thought to be a cleaner and more efficient way of
doing it, than through the kernel.
While that is true, it would also have involved a lot of duplicated effort
at reimplementing NFS exporting (all the different versions of the
protocol). This effort was unfortunately not undertaken by anyone, so we
are left with doing it the easy but less efficient way.
If this feature goes in, the out-of-tree fuse module can go away,
which would have several advantages:
- not having to maintain two versions
- less confusion for users
- no bugs due to kernel API changes
Comment from hch:
- Use the same fh_type values as XFS, since we use the same fh encoding.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:48:59 +0000 (01:48 -0700)]
fuse: prepare lookup for nfs export
Use d_splice_alias() instead of d_add() in fuse lookup code, to allow NFS
exporting.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:48:58 +0000 (01:48 -0700)]
locks: allow ->lock() to return FILE_LOCK_DEFERRED
Allow filesystem's ->lock() method to call posix_lock_file() instead of
posix_lock_file_wait(), and return FILE_LOCK_DEFERRED. This makes it
possible to implement a such a ->lock() function, that works with the lock
manager, which needs the call to be asynchronous.
Now the vfs_lock_file() helper can be used, so this is a cleanup as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:48:57 +0000 (01:48 -0700)]
locks: cleanup code duplication
Extract common code into a function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:48:57 +0000 (01:48 -0700)]
locks: add special return value for asynchronous locks
Use a special error value FILE_LOCK_DEFERRED to mean that a locking
operation returned asynchronously. This is returned by
posix_lock_file() for sleeping locks to mean that the lock has been
queued on the block list, and will be woken up when it might become
available and needs to be retried (either fl_lmops->fl_notify() is
called or fl_wait is woken up).
f_op->lock() to mean either the above, or that the filesystem will
call back with fl_lmops->fl_grant() when the result of the locking
operation is known. The filesystem can do this for sleeping as well
as non-sleeping locks.
This is to make sure, that return values of -EAGAIN and -EINPROGRESS by
filesystems are not mistaken to mean an asynchronous locking.
This also makes error handling in fs/locks.c and lockd/svclock.c slightly
cleaner.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Fri, 25 Jul 2008 08:48:55 +0000 (01:48 -0700)]
lockd: dont return EAGAIN for a permanent error
Fix nlm_fopen() to return NLM_FAILED (or NLM_LCK_DENIED_NOLOCKS) instead
of NLM_LCK_DENIED. The latter means the lock request failed because of a
conflicting lock (i.e. a temporary error), which is wrong in this case.
Also fix the client to return ENOLCK instead of EAGAIN if a blocking lock
request returns with NLM_LOCK_DENIED.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vegard Nossum [Fri, 25 Jul 2008 08:48:55 +0000 (01:48 -0700)]
taskstats: remove initialization of static per-cpu variable
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keika Kobayashi [Fri, 25 Jul 2008 08:48:54 +0000 (01:48 -0700)]
per-task-delay-accounting: update document and getdelays.c for memory reclaim
Update document and make getdelays.c show delay accounting for memory reclaim.
For making a distinction between "swapping in pages" and "memory reclaim"
in getdelays.c, MEM is changed to SWAP.
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keika Kobayashi [Fri, 25 Jul 2008 08:48:53 +0000 (01:48 -0700)]
per-task-delay-accounting: update taskstats for memory reclaim delay
Add members for memory reclaim delay to taskstats, and accumulate them in
__delayacct_add_tsk() .
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keika Kobayashi [Fri, 25 Jul 2008 08:48:52 +0000 (01:48 -0700)]
per-task-delay-accounting: add memory reclaim delay
Sometimes, application responses become bad under heavy memory load.
Applications take a bit time to reclaim memory. The statistics, how long
memory reclaim takes, will be useful to measure memory usage.
This patch adds accounting memory reclaim to per-task-delay-accounting for
accounting the time of do_try_to_free_pages().
<i.e>
- When System is under low memory load,
memory reclaim may not occur.
$ free
total used free shared buffers cached
Mem:
8197800 1577300 6620500 0 4808
1516724
-/+ buffers/cache: 55768
8142032
Swap:
16386292 0
16386292
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0
5069748 10612
3014060 0 0 0 0 3 26 0 0 100 0
0 0 0
5069748 10612
3014060 0 0 0 0 4 22 0 0 100 0
0 0 0
5069748 10612
3014060 0 0 0 0 3 18 0 0 100 0
Measure the time of tar command.
$ ls -s test.dat
1501472 test.dat
$ time tar cvf test.tar test.dat
real 0m13.388s
user 0m0.116s
sys 0m5.304s
$ ./delayget -d -p <pid>
CPU count real total virtual total delay total
428
5528345500 5477116080 62749891
IO count delay total
338
8078977189
SWAP count delay total
0 0
RECLAIM count delay total
0 0
- When system is under heavy memory load
memory reclaim may occur.
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0
7159032 49724 1812 3012 0 0 0 0 3 24 0 0 100 0
0 0
7159032 49724 1812 3012 0 0 0 0 4 24 0 0 100 0
0 0
7159032 49848 1812 3012 0 0 0 0 3 22 0 0 100 0
In this case, one process uses more 8G memory
by execution of malloc() and memset().
$ time tar cvf test.tar test.dat
real 1m38.563s <- increased by 85 sec
user 0m0.140s
sys 0m7.060s
$ ./delayget -d -p <pid>
CPU count real total virtual total delay total
9021
7140446250 7315277975 923201824
IO count delay total
8965
90466349669
SWAP count delay total
3
21036367
RECLAIM count delay total
740
61011951153
In the later case, the value of RECLAIM is increasing.
So, taskstats can show how much memory reclaim influences TAT.
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujistu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 25 Jul 2008 08:48:50 +0000 (01:48 -0700)]
tsacct: fix bacct_add_tsk()'s use of do_div()
Fix bacct_add_tsk()'s use of do_div() on an s64 by making ac_etime a u64
instead and dividing that.
Possibly this should be guarded lest the interval calculation turn up
negative, but the possible negativity of the result of the division is
cast away, and it shouldn't end up negative anyway.
This was introduced by patch
f3cef7a99469afc159fec3a61b42dc7ca5b6824f.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Fri, 25 Jul 2008 08:48:49 +0000 (01:48 -0700)]
task IO accounting: provide distinct tgid/tid I/O statistics
Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate
parent I/O statistics in /proc/pid/io. This approach follows the same
model used to account per-process and per-thread CPU times.
As a practial application, this allows for example to quickly find the top
I/O consumer when a process spawns many child threads that perform the
actual I/O work, because the aggregated I/O statistics can always be found
in /proc/pid/io.
[ Oleg Nesterov points out that we should check that the task is still
alive before we iterate over the threads, but also says that we can do
that fixup on top of this later. - Linus ]
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: Matt Heaton <matt@hostmonster.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Acked-by-with-comments: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:49 +0000 (01:48 -0700)]
bsdacct: fix and add comments around acct_process()
Fix the one describing what this function is and add one more - about
locking absence around pid namespaces loop.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:48 +0000 (01:48 -0700)]
bsdacct: account dying tasks in all relevant namespaces
This just makes the acct_proces walk the pid namespaces from current up to
the top and account a task in each with the accounting turned on.
ns->parent access if safe lockless, since current it still alive and holds
its namespace, which in turn holds its parent.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:47 +0000 (01:48 -0700)]
bsdacct: turn acct off for all pidns-s on umount time
All the bsd_acct_strcts with opened accounting are linked into a global
list. So, the acct_auto_close(_mnt) walks one and drops the accounting
for each.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:47 +0000 (01:48 -0700)]
bsdacct: switch from global bsd_acct_struct instance to per-pidns one
Allocate the structure on the first call to sys_acct(). After this each
namespace, that ordered the accounting, will live with this structure till
its own death.
Two notes
- routines, that close the accounting on fs umount time use
the init_pid_ns's acct by now;
- accounting routine accounts to dying task's namespace
(also by now).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:46 +0000 (01:48 -0700)]
bsdacct: make internal code work with passed bsd_acct_struct, not global
This adds the appropriate pointer to all the internal (i.e. static)
functions that work with global acct instance. API calls pass a global
instance to them (while we still have such).
Mostly this is a s/acct_globals./acct->/ over the file.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:45 +0000 (01:48 -0700)]
bsdacct: turn the acct_lock from on-the-struct to global
Don't use per-bsd-acct-struct lock, but work with a global one.
This lock is taken for short periods, so it doesn't seem it'll become a
bottleneck, but it will allow us to easily avoid many locking difficulties
in the future.
So this is a mostly s/acct_globals.lock/acct_lock/ over the file.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:44 +0000 (01:48 -0700)]
bsdacct: make check timer accept a bsd_acct_struct argument
We're going to have many bsd_acct_struct instances, not just one, so the
timer (currently working with a global one) has to know which one to work
with.
Use a handy setup_timer macro for it (thanks to Oleg for one).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:44 +0000 (01:48 -0700)]
bsdacct: "truthify" a comment near acct_process
The acct_process does not accept any arguments actually.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:43 +0000 (01:48 -0700)]
pidns: add the struct bsd_acct_struct pointer on pid_namespace struct
All the bsdacct-related info will be stored in the area, pointer by this
one.
It will be NULL automatically for all new namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:42 +0000 (01:48 -0700)]
pidns: use kzalloc when allocating new pid_namespace struct
It makes many fields initialization implicit helping in auto-setting
#ifdef-ed fields (bsd-acct related pointer will be such).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:42 +0000 (01:48 -0700)]
bsdacct: rename acct_gbls to bsd_acct_struct
After I fixed access to task->tgid in kernel/acct.c, Oleg pointed out some
bad side effects with this accounting vs pid namespaces interaction. I.e.
when some task in pid namespace sets this accounting up, this blocks all
the others from doing the same. Restricting this to init namespace only
could help, but didn't look a graceful solution.
So here is the approach to make this accounting work with pid namespaces
properly.
The idea is simple - when a task dies it accounts itself in each namespace
it is visible from and which set the accounting up.
For example here are the commands run and the output of lastcomm from init
and sub namespaces:
init_ns# accton pacct
sub_ns# accton pacct (this is a different file - sub ns is run in
a chroot-ed environment)
init_ns# cat /dev/null
sub_ns# ls /dev/null
init_ns# accton
sub_ns# accton
sub_ns# lastcomm -f pacct
ls 0 [136,0] 0.00 secs Thu May 15 10:30
accton 0 [136,0] 0.00 secs Thu May 15 10:30
init_ns# lastcomm -f pacct
accton root pts/0 0.00 secs Thu May 15 14:30 << got from sub
cat root pts/1 0.00 secs Thu May 15 14:30
ls root pts/0 0.00 secs Thu May 15 14:30 << got from sub
accton root pts/1 0.00 secs Thu May 15 14:30
That was the summary, the details are in patches.
This patch:
It will be visible in pid_namespace.h file, so fix its name to look better
outside the acct.c file.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Lim [Fri, 25 Jul 2008 08:48:40 +0000 (01:48 -0700)]
accounting: account for user time when updating memory integrals
Adapt acct_update_integrals() to include user time when calculating the time
difference. The units of acct_rss_mem1 and acct_vm_mem1 are also changed from
pages-jiffies to pages-usecs to avoid calling jiffies_to_usecs() in
xacct_add_tsk() which might overflow.
Signed-off-by: Jonathan Lim <jlim@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 25 Jul 2008 08:48:40 +0000 (01:48 -0700)]
unexport uts_sem
With the removal of the Solaris binary emulation the export of
uts_sem became unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Fri, 25 Jul 2008 08:48:39 +0000 (01:48 -0700)]
markers: fix sparse integer as NULL pointer warning
kernel/trace/trace_sysprof.c:164:20: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathieu Desnoyers [Fri, 25 Jul 2008 08:48:38 +0000 (01:48 -0700)]
markers: use rcu_barrier_sched() and call_rcu_sched()
rcu_barrier_sched() and call_rcu_sched() were introduced in 2.6.26 for the
Markers. Change the marker code to use them.
It can be seen as a fix since the marker code was using an ugly,
temporary, #ifdef hack to work around CONFIG_PREEMPT_RCU.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Paul McKenney <paulmck@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthias Kaehlcke [Fri, 25 Jul 2008 08:48:38 +0000 (01:48 -0700)]
aoe: convert emsgs_sema into a completion
ATA over Ethernet: The semaphore emsgs_sema is used for signalling an
event, convert it in a completion.
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:37 +0000 (01:48 -0700)]
pidns: remove find_task_by_pid, unused for a long time
It seems to me that it was a mistake marking this function as deprecated
and scheduling it for removal, rather than resolutely removing it after
the last caller's death.
Anyway - better late, then never.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:36 +0000 (01:48 -0700)]
pidns: remove now unused find_pid function.
This one had the only users so far - the kill_proc, which is removed, so
drop this (invalid in namespaced world) call too.
And of course - erase all references on it from comments.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Fri, 25 Jul 2008 08:48:35 +0000 (01:48 -0700)]
pidns: remove now unused kill_proc function
This function operated on a pid_t to kill a task, which is no longer valid
in a containerized system.
It has finally lost all its users and we can safely remove it from the
tree.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Kennedy [Fri, 25 Jul 2008 08:48:35 +0000 (01:48 -0700)]
shrink struct pid by removing padding on 64 bit builds
When struct pid is built on a 64 bit platform gcc has to insert padding to
maintain the correct alignment, by simply reordering its members the
memory usage shrinks from 88 bytes to 80.
I've successfully run with this patch on my desktop AMD64 machine.
There are no significant kernel size changes to a default config.X86_64
on the latest git v2.6.26-rc1
text data bss dec hex filename
5404828 976760 734280
7115868 6c945c vmlinux
5404811 976760 734280
7115851 6c944b vmlinux.pid-patch
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 25 Jul 2008 08:48:34 +0000 (01:48 -0700)]
proper pid{hash,map}_init() prototypes
This patch adds proper prototypes for pid{hash,map}_init() in
include/linux/pid_namespace.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Hemminger [Fri, 25 Jul 2008 08:48:32 +0000 (01:48 -0700)]
sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN
Extend the permission check for networking sysctl's to allow modification
when current process has CAP_NET_ADMIN capability and is not root. This
version uses the until now unused permissions hook to override the mode
value for /proc/sys/net if accessed by a user with capabilities.
Found while working with Quagga. It is impossible to turn forwarding
on/off through the command interface because Quagga uses secure coding
practice of dropping privledges during initialization and only raising via
capabilities when necessary. Since the dameon has reset real/effective
uid after initialization, all attempts to access /proc/sys/net variables
will fail.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 25 Jul 2008 08:48:31 +0000 (01:48 -0700)]
sysctl: check for bogus modes
Catch, e. g., 644/0644 typo.
Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Sterba [Fri, 25 Jul 2008 08:48:31 +0000 (01:48 -0700)]
proc: misplaced export of find_get_pid
Move EXPORT_SYMBOL right after the func
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 25 Jul 2008 08:48:30 +0000 (01:48 -0700)]
proc: move Kconfig to fs/proc/Kconfig
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 25 Jul 2008 08:48:30 +0000 (01:48 -0700)]
proc: remove pathetic remount code
MS_RMT_MASK will unmask changes in do_remount_sb() anyway.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 25 Jul 2008 08:48:29 +0000 (01:48 -0700)]
proc: always do ->release
Current two-stage scheme of removing PDE emphasizes one bug in proc:
open
rmmod
remove_proc_entry
close
->release won't be called because ->proc_fops were cleared. In simple
cases it's small memory leak.
For every ->open, ->release has to be done. List of openers is introduced
which is traversed at remove_proc_entry() if neeeded.
Discussions with Al long ago (sigh).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 25 Jul 2008 08:48:28 +0000 (01:48 -0700)]
move proc_kmsg_operations to fs/proc/internal.h
This patch moves the extern of struct proc_kmsg_operations to
fs/proc/internal.h and adds an #include "internal.h" to fs/proc/kmsg.c
so that the latter sees the former.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 25 Jul 2008 08:48:28 +0000 (01:48 -0700)]
unexport proc_clear_tty
With the removal of the Solaris binary emulation the export of
proc_clear_tty became unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 25 Jul 2008 08:48:27 +0000 (01:48 -0700)]
dell_rbu: use memory_read_from_buffer()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Abhay Salunke <Abhay_Salunke@dell.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 25 Jul 2008 08:48:26 +0000 (01:48 -0700)]
fs/partitions/efi: convert to pr_debug
convert the local Dprintk() compile time debug printk wrappers to the
generic pr_debug() wrapper.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Abdel Benamrouche [Fri, 25 Jul 2008 08:48:26 +0000 (01:48 -0700)]
block/ioctl.c and fs/partition/check.c: check value returned by add_partition()
Now that add_partition() has been aught to propagate errors, let's check them.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Abdel Benamrouche <draconux@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Abdel Benamrouche [Fri, 25 Jul 2008 08:48:25 +0000 (01:48 -0700)]
fs/partition/check.c: fix return value warning
fs/partitions/check.c:381: warning: ignoring return value of ___device_add___,
declared with attribute warn_unused_result
[akpm@linux-foundation.org: multiple-return-statements-per-function are evil]
Signed-off-by: Abdel Benamrouche <draconux@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 25 Jul 2008 08:48:24 +0000 (01:48 -0700)]
dcdbas: use memory_read_from_buffer()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Doug Warzecha <Douglas_Warzecha@dell.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 25 Jul 2008 08:48:23 +0000 (01:48 -0700)]
firmware: use memory_read_from_buffer()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Markus Rechberger <markus.rechberger@amd.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 25 Jul 2008 08:48:23 +0000 (01:48 -0700)]
drivers/misc/phantom: note PCI
Tell users that the driver is only for PCI devices to stop asking for
support of firewire and parallel devices.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 25 Jul 2008 08:48:22 +0000 (01:48 -0700)]
Char: mxser, various cleanups
- remove unused macro
- some whitespace cleanup
- useless debug prints removal
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 25 Jul 2008 08:48:22 +0000 (01:48 -0700)]
Char: mxser, remove predefined isa support
Remove a support of ISA addresses predefined at compile time. It is
unused (filled by zeroes) and prolongs the code. Don't initialize global
array and add `ioaddr' module param description.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 25 Jul 2008 08:48:21 +0000 (01:48 -0700)]
Char: mxser, prints cleanup
- use dev_* for printing in pci probe function
- move ISA p[rints directly into isa find function, do not postpone it.
Remove macros bound to it then.
- prepend some prints by "mxser: " to know what it belongs to
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 25 Jul 2008 08:48:20 +0000 (01:48 -0700)]
Char: mxser, update documentation
Update Documentation/moxa-smartio to the later document from the mxser
package.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 25 Jul 2008 08:48:20 +0000 (01:48 -0700)]
Char: mxser, globals cleanup
- remove unused mxvar_diagflag
- move mxser_msr into the only user/function
- GMStatus, hmm, fix race-prone access to it. We need only one instance for
real, not MXSER_PORTS. Move it to MOXA_GETMSTATUS ioctl.
- mxser_mon_ext, almost the same, but alloc it on heap, since it has more than
2 kilos.
- fix indexing, `i' is not the index value, `i * MXSER_PORTS_PER_BOARD + j' is
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 25 Jul 2008 08:48:19 +0000 (01:48 -0700)]
Char: mxser, ioctl cleanup
- remove break ctl from ioctl handler, it's never reached, since
tty_ops->break_ctl is defined (mxser break handling is done in software)
- mark MOXA_GET_MAJOR as deprecated
- fix TIOCGICOUNT (some retval non-checks of put_user). Use copy_to_user
to whole structure instead.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 25 Jul 2008 08:48:18 +0000 (01:48 -0700)]
nwflash: use simple_read_from_buffer()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:17 +0000 (01:48 -0700)]
dsp56k: BKL pushdown
Push the BKL down into the driver ioctl methods
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:17 +0000 (01:48 -0700)]
ds1302: push down the BKL into the driver ioctl code
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:16 +0000 (01:48 -0700)]
ppdev: wrap ioctl handler in driver and push lock down
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:16 +0000 (01:48 -0700)]
ixj: push BKL into driver and wrap ioctls
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:15 +0000 (01:48 -0700)]
sx: push BKL down into the firmware ioctl handler
Also fix the capability checking for firmware load.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:14 +0000 (01:48 -0700)]
rio: push down the BKL into the firmware ioctl handler
TTY side is already done.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:14 +0000 (01:48 -0700)]
mwave: ioctl BKL pushdown
Push the BKL down to the point it wraps the actual mwave method handlers
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Yani Ioannou <yani.ioannou@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:13 +0000 (01:48 -0700)]
ip2: push BKL down for the firmware interface
(The tty side is already done)
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 25 Jul 2008 08:48:12 +0000 (01:48 -0700)]
efirtc: push down the BKL
Push it down as far as the EFI method calls. Someone who knows EFI can do
the other bits. Also fix another wrong unknown ioctl return.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 25 Jul 2008 08:48:11 +0000 (01:48 -0700)]
#if 0 hpet_unregister()
This patch #if 0's the unused hpet_unregister().
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 25 Jul 2008 08:48:11 +0000 (01:48 -0700)]
proper extern for mwave_s_mdd
This patch adds a proper extern for mwave_s_mdd in
drivers/char/mwave/mwavedd.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Edgar E. Iglesias [Fri, 25 Jul 2008 08:48:10 +0000 (01:48 -0700)]
elf: use ELF_CORE_EFLAGS for kcore ELF header flags
ELF_CORE_EFLAGS is already used by the binfmt_elf coredumper to set correct
arch specific ELF header flags on coredumps. Use it for kcore dumps as well.
At the moment, this affects the CRIS and the H8300 arch.
Signed-off-by: Edgar E. Iglesias <edgar@axis.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 25 Jul 2008 08:48:09 +0000 (01:48 -0700)]
pty: remove unused UNIX98_PTY_COUNT options
The h8300 and sparc options somehow survived when the code stopped using
CONFIG_UNIX98_PTY_COUNT.
Reviewed-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:48:08 +0000 (01:48 -0700)]
ipc: do not use a negative value to re-enable msgmni automatic recomputing
This patch proposes an alternative to the "magical
positive-versus-negative number trick" Andrew complained about last week
in http://lkml.org/lkml/2008/6/24/418.
This had been introduced with the patches that scale msgmni to the amount
of lowmem. With these patches, msgmni has a registered notification
routine that recomputes msgmni value upon memory add/remove or ipc
namespace creation/ removal.
When msgmni is changed from user space (i.e. value written to the proc
file), that notification routine is unregistered, and the way to make it
registered back is to write a negative value into the proc file. This is
the "magical positive-versus-negative number trick".
To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
This file acts as ON/OFF for msgmni automatic recomputing.
With this patch, the process is the following:
1) kernel boots in "automatic recomputing mode"
/proc/sys/kernel/msgmni contains the value that has been computed (depends
on lowmem)
/proc/sys/kernel/automatic_msgmni contains "1"
2) echo <val> > /proc/sys/kernel/msgmni
. sets msg_ctlmni to <val>
. de-activates automatic recomputing (i.e. if, say, some memory is added
msgmni won't be recomputed anymore)
. /proc/sys/kernel/automatic_msgmni now contains "0"
3) echo "0" > /proc/sys/kernel/automatic_msgmni
. de-activates msgmni automatic recomputing
this has the same effect as 2) except that msg_ctlmni's value stays
blocked at its current value)
3) echo "1" > /proc/sys/kernel/automatic_msgmni
. recomputes msgmni's value based on the current available memory size
and number of ipc namespaces
. re-activates automatic recomputing for msgmni.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 25 Jul 2008 08:48:07 +0000 (01:48 -0700)]
ipc: use simple_read_from_buffer()
Also this patch kills unneccesary trailing NULL character.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Fri, 25 Jul 2008 08:48:06 +0000 (01:48 -0700)]
ipc/sem.c: rewrite undo list locking
The attached patch:
- reverses the locking order of ulp->lock and sem_lock:
Previously, it was first ulp->lock, then inside sem_lock.
Now it's the other way around.
- converts the undo structure to rcu.
Benefits:
- With the old locking order, IPC_RMID could not kfree the undo structures.
The stale entries remained in the linked lists and were released later.
- The patch fixes a a race in semtimedop(): if both IPC_RMID and a semget() that
recreates exactly the same id happen between find_alloc_undo() and sem_lock,
then semtimedop() would access already kfree'd memory.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Fri, 25 Jul 2008 08:48:06 +0000 (01:48 -0700)]
ipc/sem.c: convert sem_array.sem_pending to struct list_head
sem_array.sem_pending is a double linked list, the attached patch converts
it to struct list_head.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Fri, 25 Jul 2008 08:48:05 +0000 (01:48 -0700)]
ipc/sem.c: remove unused entries from struct sem_queue
sem_queue.sma and sem_queue.id were never used, the attached patch removes
them.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Fri, 25 Jul 2008 08:48:04 +0000 (01:48 -0700)]
ipc/sem.c: convert undo structures to struct list_head
The undo structures contain two linked lists, the attached patch replaces
them with generic struct list_head lists.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:48:03 +0000 (01:48 -0700)]
ipc: get rid of ipc_lock_down()
Remove the ipc_lock_down() routines: they used to call idr_find() locklessly
(given that the ipc ids lock was already held), so they are not needed
anymore.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:48:03 +0000 (01:48 -0700)]
ipc: call idr_find() without locking in ipc_lock()
Call idr_find() locklessly from ipc_lock(), since the idr tree is now RCU
protected.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:48:02 +0000 (01:48 -0700)]
idr: make idr_remove rcu-safe
Introduce the free_layer() routine: it is the one that actually frees memory
after a grace period has elapsed.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:48:01 +0000 (01:48 -0700)]
idr: make idr_find rcu-safe
Make idr_find rcu-safe: it can now be called inside an rcu_read critical
section.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:48:00 +0000 (01:48 -0700)]
idr: make idr_get_new* rcu-safe
Make the idr_get_new* routines rcu-safe.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:47:59 +0000 (01:47 -0700)]
idr: error checking factorization
Do some code factorization in the return code analysis.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:47:59 +0000 (01:47 -0700)]
idr: fix a printk call
Fix the incomplete printk call.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:47:58 +0000 (01:47 -0700)]
idr: rename some of the idr APIs internal routines
This is a trivial patch that renames:
. alloc_layer to get_from_free_list since it idr_pre_get that actually
allocates memory.
. free_layer to move_to_free_list since memory is not actually freed there.
This makes things more clear for the next patches.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadia Derbey [Fri, 25 Jul 2008 08:47:57 +0000 (01:47 -0700)]
idr: change the idr structure
After scalability problems have been detected when using the sysV ipcs, I have
proposed to use an RCU based implementation of the IDR api instead (see
threads http://lkml.org/lkml/2008/4/11/212 and
http://lkml.org/lkml/2008/4/29/295).
This resulted in many people asking to convert the idr API and make it rcu
safe (because most of the code was duplicated and thus unmaintanable and
unreviewable).
So here is a first attempt.
The important change wrt to the idr API itself is during idr removes: idr
layers are freed after a grace period, instead of being moved to the free
list.
The important change wrt to ipcs, is that idr_find() can now be called
locklessly inside a rcu read critical section.
Here are the results I've got for the pmsg test sent by Manfred:
2.6.25-rc3-mm1 2.6.25-rc3-mm1+ 2.6.25-mm1 Patched 2.6.25-mm1
1
1168441 1064021 876000 947488
2
1094264 921059
1549592 1730685
3
2082520 1738165 1694370 2324880
4
2079929 1695521 404553
2400408
5
2898758 406566 391283
3246580
6
2921417 261275 263249
3752148
7
3308761 126056 191742
4243142
8
3329456 100129 141722
4275780
1st column: stock 2.6.25-rc3-mm1
2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
3nd column: stock 2.6.25-mm1
4th column: 2.6.25-mm1 + this pacth series.
This patch:
Add an rcu_head to the idr_layer structure in order to free it after a grace
period.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chandru [Fri, 25 Jul 2008 08:47:55 +0000 (01:47 -0700)]
calgary iommu: use the first kernels TCE tables in kdump
kdump kernel fails to boot with calgary iommu and aacraid driver on a x366
box. The ongoing dma's of aacraid from the first kernel continue to exist
until the driver is loaded in the kdump kernel. Calgary is initialized
prior to aacraid and creation of new tce tables causes wrong dma's to
occur. Here we try to get the tce tables of the first kernel in kdump
kernel and use them. While in the kdump kernel we do not allocate new tce
tables but instead read the base address register contents of calgary
iommu and use the tables that the registers point to. With these changes
the kdump kernel and hence aacraid now boots normally.
Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:54 +0000 (01:47 -0700)]
workqueues: do CPU_UP_CANCELED if CPU_UP_PREPARE fails
The bug was pointed out by Akinobu Mita <akinobu.mita@gmail.com>, and this
patch is based on his original patch.
workqueue_cpu_callback(CPU_UP_PREPARE) expects that if it returns
NOTIFY_BAD, _cpu_up() will send CPU_UP_CANCELED then.
However, this is not true since
"cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE"
commit:
a0d8cdb652d35af9319a9e0fb7134de2a276c636
The callback which has returned NOTIFY_BAD will not receive
CPU_UP_CANCELED. Change the code to fulfil the CPU_UP_CANCELED logic if
CPU_UP_PREPARE fails.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:53 +0000 (01:47 -0700)]
workqueues: schedule_on_each_cpu() can use schedule_work_on()
schedule_on_each_cpu() can use schedule_work_on() to avoid the code
duplication.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:53 +0000 (01:47 -0700)]
workqueues: queue_work() can use queue_work_on()
queue_work() can use queue_work_on() to avoid the code duplication.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:52 +0000 (01:47 -0700)]
workqueues: lockdep annotations for flush_work()
Add lockdep annotations to flush_work() and update the comment.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:51 +0000 (01:47 -0700)]
S390 topology: don't use kthread() for arch_reinit_sched_domains()
Now that it is safe to use get_online_cpus() we can revert
[S390] cpu topology: Fix possible deadlock.
commit:
fd781fa25c9e9c6fd1599df060b05e7c4ad724e5
and call arch_reinit_sched_domains() directly from topology_work_fn().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:50 +0000 (01:47 -0700)]
workqueues: make get_online_cpus() useable for work->func()
workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
cpu_maps_update_begin(). This means that the multithreaded workqueues
can't use get_online_cpus() due to the possible deadlock, very bad and
very old problem.
Introduce the new state, CPU_POST_DEAD, which is called after
cpu_hotplug_done() but before cpu_maps_update_done().
Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
This means that create/destroy functions can't rely on get_online_cpus()
any longer and should take cpu_add_remove_lock instead.
[akpm@linux-foundation.org: fix CONFIG_SMP=n]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:49 +0000 (01:47 -0700)]
workqueues: schedule_on_each_cpu: use flush_work()
Change schedule_on_each_cpu() to use flush_work() instead of
flush_workqueue(), this way we don't wait for other work_struct's which
can be queued meanwhile.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:49 +0000 (01:47 -0700)]
workqueues: implement flush_work()
Most of users of flush_workqueue() can be changed to use cancel_work_sync(),
but sometimes we really need to wait for the completion and cancelling is not
an option. schedule_on_each_cpu() is good example.
Add the new helper, flush_work(work), which waits for the completion of the
specific work_struct. More precisely, it "flushes" the result of of the last
queue_work() which is visible to the caller.
For example, this code
queue_work(wq, work);
/* WINDOW */
queue_work(wq, work);
flush_work(work);
doesn't necessary work "as expected". What can happen in the WINDOW above is
- wq starts the execution of work->func()
- the caller migrates to another CPU
now, after the 2nd queue_work() this work is active on the previous CPU, and
at the same time it is queued on another. In this case flush_work(work) may
return before the first work->func() completes.
It is trivial to add another helper
int flush_work_sync(struct work_struct *work)
{
return flush_work(work) || wait_on_work(work);
}
which works "more correctly", but it has to iterate over all CPUs and thus
it much slower than flush_work().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 25 Jul 2008 08:47:47 +0000 (01:47 -0700)]
workqueues: insert_work: use "list_head *" instead of "int tail"
insert_work() inserts the new work_struct before or after cwq->worklist,
depending on the "int tail" parameter. Change it to accept "list_head *"
instead, this shrinks .text a bit and allows us to insert the barrier
after specific work_struct.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>