Daniel Walker [Thu, 18 Oct 2007 10:06:07 +0000 (03:06 -0700)]
whitespace fixes: fork
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 18 Oct 2007 10:06:06 +0000 (03:06 -0700)]
whitespace fixes: DMA channel allocator
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 18 Oct 2007 10:06:06 +0000 (03:06 -0700)]
whitespace fixes: audit filtering
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 18 Oct 2007 10:06:05 +0000 (03:06 -0700)]
whitespace fixes: relayfs
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 18 Oct 2007 10:06:04 +0000 (03:06 -0700)]
whitespace fixes: cpuset
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 18 Oct 2007 10:06:04 +0000 (03:06 -0700)]
whitespace fixes: process accounting
Lots of converting spaces to tabs.
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Walker [Thu, 18 Oct 2007 10:06:03 +0000 (03:06 -0700)]
whitespace fixes: time syscalls
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Thu, 18 Oct 2007 10:06:03 +0000 (03:06 -0700)]
mxser: fix compiler warning when building without CONFIG_PCI
drivers/char/mxser.c:386: warning: 'mxser_get_PCI_conf' declared 'static' but never defined
when building without CONFIG_PCI.
[jesper.juhl@gmail.com: Fix warning: 'CheckIsMoxaMust' defined but not used]
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Thu, 18 Oct 2007 10:06:02 +0000 (03:06 -0700)]
mxser: remove commented crap
This is years dead code and it keeps turning up in confusing ways when
grepping for stuff.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Thu, 18 Oct 2007 10:06:02 +0000 (03:06 -0700)]
Char: mxser_new, remove useless comments in mxser_cards
mxser_new, remove useless comments in mxser_cards
It was rest from times, where info about the card was separated (name,
ports number and flags).
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Thu, 18 Oct 2007 10:06:01 +0000 (03:06 -0700)]
Char: mxser_new, move to PCI_VDEVICE
mxser_new, move to PCI_VDEVICE
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Thu, 18 Oct 2007 10:06:01 +0000 (03:06 -0700)]
Char: mxser_new, upgrade to 1.10
mxser_new, upgrade to 1.10
This adds support for new (5 cards) hardware.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morgan [Thu, 18 Oct 2007 10:05:59 +0000 (03:05 -0700)]
V3 file capabilities: alter behavior of cap_setpcap
The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1,
can change the capabilities of another process, p2. This is not the
meaning that was intended for this capability at all, and this
implementation came about purely because, without filesystem capabilities,
there was no way to use capabilities without one process bestowing them on
another.
Since we now have a filesystem support for capabilities we can fix the
implementation of CAP_SETPCAP.
The most significant thing about this change is that, with it in effect, no
process can set the capabilities of another process.
The capabilities of a program are set via the capability convolution
rules:
pI(post-exec) = pI(pre-exec)
pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI)
pE(post-exec) = fE ? pP(post-exec) : 0
at exec() time. As such, the only influence the pre-exec() program can
have on the post-exec() program's capabilities are through the pI
capability set.
The correct implementation for CAP_SETPCAP (and that enabled by this patch)
is that it can be used to add extra pI capabilities to the current process
- to be picked up by subsequent exec()s when the above convolution rules
are applied.
Here is how it works:
Let's say we have a process, p. It has capability sets, pE, pP and pI.
Generally, p, can change the value of its own pI to pI' where
(pI' & ~pI) & ~pP = 0.
That is, the only new things in pI' that were not present in pI need to
be present in pP.
The role of CAP_SETPCAP is basically to permit changes to pI beyond
the above:
if (pE & CAP_SETPCAP) {
pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */
}
This capability is useful for things like login, which (say, via
pam_cap) might want to raise certain inheritable capabilities for use
by the children of the logged-in user's shell, but those capabilities
are not useful to or needed by the login program itself.
One such use might be to limit who can run ping. You set the
capabilities of the 'ping' program to be "= cap_net_raw+i", and then
only shells that have (pI & CAP_NET_RAW) will be able to run
it. Without CAP_SETPCAP implemented as described above, login(pam_cap)
would have to also have (pP & CAP_NET_RAW) in order to raise this
capability and pass it on through the inheritable set.
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:58 +0000 (03:05 -0700)]
sysctl: deprecate sys_sysctl in a user space visible fashion.
After adding checking to register_sysctl_table and finding a whole new set
of bugs. Missed by countless code reviews and testers I have finally lost
patience with the binary sysctl interface.
The binary sysctl interface has been sort of deprecated for years and
finding a user space program that uses the syscall is more difficult then
finding a needle in a haystack. Problems continue to crop up, with the in
kernel implementation. So since supporting something that no one uses is
silly, deprecate sys_sysctl with a sufficient grace period and notice that
the handful of user space applications that care can be fixed or replaced.
The /proc/sys sysctl interface that people use will continue to be
supported indefinitely.
This patch moves the tested warning about sysctls from the path where
sys_sysctl to a separate path called from both implementations of
sys_sysctl, and it adds a proper entry into
Documentation/feature-removal-schedule.
Allowing us to revisit this in a couple years time and actually kill
sys_sysctl.
[lethal@linux-sh.org: sysctl: Fix syscall disabled build]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:57 +0000 (03:05 -0700)]
sysctl: for irda update sysctl_checks list of binary paths
It turns out that the net/irda code didn't register any of it's binary paths
in the global sysctl.h header file so I missed them completely when making an
authoritative list of binary sysctl paths in the kernel. So add them to the
list of valid binary sysctl paths.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:57 +0000 (03:05 -0700)]
sysctl: update sysctl_check_table
Well it turns out after I dug into the problems a little more I was returning
a few false positives so this patch updates my logic to remove them.
- Don't complain about 0 ctl_names in sysctl_check_binary_path
It is valid for someone to remove the sysctl binary interface
and still keep the same sysctl proc interface.
- Count ctl_names and procnames as matching if they both don't
exist.
- Only warn about missing min&max when the generic functions care.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:54 +0000 (03:05 -0700)]
sysctl: Error on bad sysctl tables
After going through the kernels sysctl tables several times it has become
clear that code review and testing is just not effective in prevent
problematic sysctl tables from being used in the stable kernel. I certainly
can't seem to fix the problems as fast as they are introduced.
Therefore this patch adds sysctl_check_table which is called when a sysctl
table is registered and checks to see if we have a problematic sysctl table.
The biggest part of the code is the table of valid binary sysctl entries, but
since we have frozen our set of binary sysctls this table should not need to
change, and it makes it much easier to detect when someone unintentionally
adds a new binary sysctl value.
As best as I can determine all of the several hundred errors spewed on boot up
now are legitimate.
[bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:33 +0000 (03:05 -0700)]
sysctl: properly register the irda binary sysctl numbers
Grumble. These numbers should have been in sysctl.h from the beginning if we
ever expected anyone to use them. Oh well put them there now so we can find
them and make maintenance easier.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:32 +0000 (03:05 -0700)]
sysctl: remove the cad_pid binary sysctl path
It looks like we inadvertently killed the cad_pid binary sysctl support when
cap_pid was changed to be a struct pid. Since no one has complained just
remove the binary path.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:32 +0000 (03:05 -0700)]
sysctl: remove broken netfilter binary sysctls
No one has bothered to set strategy routine for the the netfilter sysctls that
return jiffies to be sysctl_jiffies.
So it appears the sys_sysctl path is unused and untested, so this patch
removes the binary sysctl numbers.
Which fixes the netfilter oops in 2.6.23-rc2-mm2 for me.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:31 +0000 (03:05 -0700)]
sysctl: simplify the pty sysctl logic
Instead of having a bunch of ifdefs in sysctl.c move all of the pty sysctl
logic into drivers/char/pty.c
As well as cleaning up the logic this prevents sysctl_check_table from
complaining that the root table has a NULL data pointer on something with
generic methods.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:30 +0000 (03:05 -0700)]
sysctl: parport remove binary paths
The sysctl binary paths don't look as if they even code work, .data is not
filled in, and all of the proc_handlers look at extra1 and there is not
strategy routine.
So just kill the binary paths.
In addition this patch removes the setting of extra1 on directories. It
doesn't look like the parport code ever examines it, and it's bad sysctl form.
[bunk@kernel.org: remove parport_device_num()]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:29 +0000 (03:05 -0700)]
sysctl: remove the binary interface for aio-nr, aio-max-nr, acpi_video_flags
aio-nr, aio-max-nr, acpi_video_flags are unsigned long values which sysctl
does not handle properly with a 64bit kernel and a 32bit user space.
Since no one is likely to be using the binary sysctl values and the ascii
interface still works, this patch just removes support for the binary sysctl
interface from the kernel.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:29 +0000 (03:05 -0700)]
sysctl: ipv4 remove binary sysctl paths where they are broken
Currently tcp_available_congestion_control does not even attempt being read
from sys_sysctl, and ipfrag_max_dist while it works allows setting of invalid
values using sys_sysctl.
So just kill the binary sys_sysctl support for these sysctls. If the support
is not important enough to test and get right it probably isn't important
enough to keep.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:28 +0000 (03:05 -0700)]
sysctl: remove broken cdrom binary sysctls
The binary interface for the cdrom sysctls can't possilby work. So remove the
binary sysctls and update the test for finding out which sysctl table entry we
are dealy with to use the procname and not the ctl_name (which I am removing).
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:27 +0000 (03:05 -0700)]
sysctl: x86_64 remove unnecessary binary paths
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:27 +0000 (03:05 -0700)]
sysctl: remove broken sunrpc debug binary sysctls
This is debug code so no need to support binary sysctl, and the binary sysctls
as they were written were not consistent with what showed up in /proc so
remove the binary sysctl support.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:26 +0000 (03:05 -0700)]
sysctl: ipv6 route flushing (kill binary path)
We don't preoperly support the sysctl binary path for flushing the ipv6
routes. So remove support for a binary path.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:25 +0000 (03:05 -0700)]
sysctl: fix neighbour table sysctls.
- In ipv6 ndisc_ifinfo_syctl_change so it doesn't depend on binary
sysctl names for a function that works with proc.
- In neighbour.c reorder the table to put the possibly unused entries
at the end so we can remove them by terminating the table early.
- In neighbour.c kill the entries with questionable binary sysctl
handling behavior.
- In neighbour.c if we don't have a strategy routine remove the
binary path. So we don't the default sysctl strategy routine
on data that is not ready for it.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:24 +0000 (03:05 -0700)]
sysctl: remove binary sysctl support where it clearly doesn't work
These functions are all wrapper functions for the proc interface that are
needed for them to work correctly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:24 +0000 (03:05 -0700)]
sysctl mqueue: remove the binary sysctl numbers
Because of a conflict with FS_INODE_NR none of the binary sysctl numbers use
by mqueue, were available to user space. So just remove them.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:23 +0000 (03:05 -0700)]
sysctl: Factor out sysctl_data.
There as been no easy way to wrap the default sysctl strategy routine except
for returning 0. Which is not always what we want. The few instances I have
seen that want different behaviour have written their own version of
sysctl_data. While not too hard it is unnecessary code and has the potential
for extra bugs.
So to make these situations easier and make that part of sysctl more symetric
I have factord sysctl_data out of do_sysctl_strategy and exported as a
function everyone can use.
Further having sysctl_data be an explicit function makes checking for badly
formed sysctl tables much easier.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Thu, 18 Oct 2007 10:05:22 +0000 (03:05 -0700)]
sysctl core: Stop using the unnecessary ctl_table typedef
In sysctl.h the typedef struct ctl_table ctl_table violates coding style isn't
needed and is a bit of a nuisance because it makes it harder to recognize
ctl_table is a type name.
So this patch removes it from the generic sysctl code. Hopefully I will have
enough energy to send the rest of my patches will follow and to remove it from
the rest of the kernel.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Thu, 18 Oct 2007 10:05:22 +0000 (03:05 -0700)]
CIFS: ignore mode change if it's just for clearing setuid/setgid bits
If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing
the setuid/setgid bits. For CIFS, skip the mode change and let the server
handle it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Thu, 18 Oct 2007 10:05:21 +0000 (03:05 -0700)]
NFS: if ATTR_KILL_S*ID bits are set, then skip mode change
If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing
the setuid/setgid bits. For NFS, skip the mode change and let the server
handle it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Thu, 18 Oct 2007 10:05:20 +0000 (03:05 -0700)]
VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations
When an unprivileged process attempts to modify a file that has the setuid or
setgid bits set, the VFS will attempt to clear these bits. The VFS will set
the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid mask, and then call
notify_change to clear these bits and set the mode accordingly.
With a networked filesystem (NFS and CIFS in particular but likely others),
the client machine or process may not have credentials that allow for setting
the mode. In some situations, this can lead to file corruption, an operation
failing outright because the setattr fails, or to races that lead to a mode
change being reverted.
In this situation, we'd like to just leave the handling of this to the server
and ignore these bits. The problem is that by the time the setattr op is
called, the VFS has already reinterpreted the ATTR_KILL_* bits into a mode
change. The setattr operation has no way to know its intent.
The following patch fixes this by making notify_change no longer clear the
ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before handing it off
to the setattr inode op. setattr can then check for the presence of these
bits, and if they're set it can assume that the mode change was only for the
purposes of clearing these bits.
This means that we now have an implicit assumption that notify_change is never
called with ATTR_MODE and either ATTR_KILL_S*ID bit set. Nothing currently
enforces that, so this patch also adds a BUG() if that occurs.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Thu, 18 Oct 2007 10:05:19 +0000 (03:05 -0700)]
reiserfs: turn of ATTR_KILL_S*ID at beginning of reiserfs_setattr
reiserfs_setattr can call notify_change recursively using the same
iattr struct. This could cause it to trip the BUG() in notify_change.
Fix reiserfs to clear those bits near the beginning of the function.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Thu, 18 Oct 2007 10:05:19 +0000 (03:05 -0700)]
knfsd: only set ATTR_KILL_S*ID if ATTR_MODE isn't being explicitly set
It's theoretically possible for a single SETATTR call to come in that sets the
mode and the uid/gid. In that case, don't set the ATTR_KILL_S*ID bits since
that would trip the BUG() in notify_change. Just fix up the mode to have the
same effect.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Thu, 18 Oct 2007 10:05:17 +0000 (03:05 -0700)]
ecryptfs: allow lower fs to interpret ATTR_KILL_S*ID
Make sure ecryptfs doesn't trip the BUG() in notify_change. This also allows
the lower filesystem to interpret ATTR_KILL_S*ID in its own way.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:16 +0000 (03:05 -0700)]
cpu hotplug: intel_cacheinfo: fix cpu hotplug error handling
- Fix resource leakage in error case within detect_cache_attributes()
- Don't register hotcpu notifier when cache_add_dev() returns error
- Introduce cache_dev_map cpumask to track whether cache interface for
CPU is successfully added by cache_add_dev() or not.
cache_add_dev() may fail with out of memory error. In order to
avoid cache_remove_dev() with that uninitialized cache interface when
CPU_DEAD event is delivered we need to have the cache_dev_map cpumask.
(We cannot change cache_add_dev() from CPU_ONLINE event handler
to CPU_UP_PREPARE event handler. Because cache_add_dev() needs
to do cpuid and store the results with its CPU online.)
[nix.or.die@googlemail.com: fix a section mismatch warning]
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:15 +0000 (03:05 -0700)]
cpu hotplug: mce: fix cpu hotplug error handling
- Clear kobject in percpu device_mce before calling sysdev_register() with
Because mce_create_device() may fail and it leaves kobject filled with
junk. It will be the problem when mce_create_device() will be called
next time.
- Fix error handling in mce_create_device()
Error handling should not do sysdev_remove_file() with not yet added
attributes.
- Don't register hotcpu notifier when mce_create_device() returns error
- Do mce_create_device() in CPU_UP_PREPARE instead of CPU_ONLINE
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:14 +0000 (03:05 -0700)]
cpu hotplug: msr: fix cpu hotplug error handling
Do msr_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:13 +0000 (03:05 -0700)]
cpu hotplug: thermal_throttle: fix cpu hotplug error handling
Do thermal_throttle_add_dev() in CPU_UP_PREPARE instead of CPU_ONLINE.
Cc: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:12 +0000 (03:05 -0700)]
cpu hotplug: topology: remove topology_dev_map
By previous cpu hotplug notifier change, we don't need to track topology_dev
existence for each cpu by topology_dev_map.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:12 +0000 (03:05 -0700)]
cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE
The functions in a CPU notifier chain is called with CPU_UP_PREPARE event
before making the CPU online. If one of the callback returns NOTIFY_BAD, it
stops to deliver CPU_UP_PREPARE event, and CPU online operation is canceled.
Then CPU_UP_CANCELED event is delivered to the functions in a CPU notifier
chain again.
This CPU_UP_CANCELED event is delivered to the functions which have been
called with CPU_UP_PREPARE, not delivered to the functions which haven't been
called with CPU_UP_PREPARE.
The problem that makes existing cpu hotplug error handlings complex is that
the CPU_UP_CANCELED event is delivered to the function that has returned
NOTIFY_BAD, too.
Usually we don't expect to call destructor function against the object that
has failed to initialize. It is like:
err = register_something();
if (err) {
unregister_something();
return err;
}
So it is natural to deliver CPU_UP_CANCELED event only to the functions that
have returned NOTIFY_OK with CPU_UP_PREPARE event and not to call the function
that have returned NOTIFY_BAD. This is what this patch is doing.
Otherwise, every cpu hotplug notifiler has to track whether notifiler event is
failed or not for each cpu. (drivers/base/topology.c is doing this with
topology_dev_map)
Similary this patch makes same thing with CPU_DOWN_PREPARE and CPU_DOWN_FAILED
evnets.
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:11 +0000 (03:05 -0700)]
cpu hotplug: slab: fix memory leak in cpu hotplug error path
This patch fixes memory leak in error path.
In reality, we don't need to call cpuup_canceled(cpu) for now. But upcoming
cpu hotplug error handling change needs this.
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 18 Oct 2007 10:05:09 +0000 (03:05 -0700)]
cpu hotplug: slab: cleanup cpuup_callback()
cpuup_callback() is too long. This patch factors out CPU_UP_CANCELLED and
CPU_UP_PREPARE handlings from cpuup_callback().
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Thu, 18 Oct 2007 10:05:08 +0000 (03:05 -0700)]
update checkpatch.pl to version 0.11
This version brings a more cautious checkpatch.pl by default. The more
subjective checks are only applied with the --strict option. It also
brings the usual slew of corrections for false positives. Of note:
- new tree detection, the source tree will be found via the executable
- a major revamp of the unary detection to make it more parser like
- a new summary at the bottom of the report
- --strict option for subjective checks
- --file to enable checking on complete files
- support for use in emacs "compile" window
Andy Whitcroft (27):
Version: 0.11
fix up cat_vet for the case where there are no control characters
any cast to a pointer introduces a type
cpp unary operator detection needs to float
attributes are also valid in type definitions
sizeof may be a bareword and makes its argument unary
unary checks for #ifdef et al need to find end of line
add new --file mode to handle raw source files
add --strict/--subjective which enables the subjective tests
add some additional standard type suffixes
cpp #elif is also a unary prefix
case is not a function name
widen asm volatile exceptions
__kprobes is a type attribute
typeof is a unary operator
function open parenthesis checks should check all occurances
expand sizeof() binary exceptions
linux/irq.h should not be recommended
work harder to find the kernel root and add --root=
fix --emacs mode line numbers and string concatenation warnings
add a summary to the bottom of the main report
loosen assignment in if checks
update operator spacing to maintain tabs in output
revamp unary detection
corruption/line wrapped patches need only reporting once
revamp s/u/be/le 8/16/32/64 bit types
handle missing ,1 in uni-diff header
Mike D. Day (2):
Adds support to checkpatch.pl for running in the emacs compile window.
checkpatch: Fix line number reporting
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Thu, 18 Oct 2007 10:05:07 +0000 (03:05 -0700)]
param_sysfs_builtin memchr argument fix
If memchr argument is longer than strlen(kp->name), there will be some
weird result.
It will casuse duplicate filenames in sysfs for the "nousb". kernel
warning messages are as bellow:
sysfs: duplicate filename 'usbcore' can not be created
WARNING: at fs/sysfs/dir.c:416 sysfs_add_one()
[<
c01c4750>] sysfs_add_one+0xa0/0xe0
[<
c01c4ab8>] create_dir+0x48/0xb0
[<
c01c4b69>] sysfs_create_dir+0x29/0x50
[<
c024e0fb>] create_dir+0x1b/0x50
[<
c024e3b6>] kobject_add+0x46/0x150
[<
c024e2da>] kobject_init+0x3a/0x80
[<
c053b880>] kernel_param_sysfs_setup+0x50/0xb0
[<
c053b9ce>] param_sysfs_builtin+0xee/0x130
[<
c053ba33>] param_sysfs_init+0x23/0x60
[<
c024d062>] __next_cpu+0x12/0x20
[<
c052aa30>] kernel_init+0x0/0xb0
[<
c052aa30>] kernel_init+0x0/0xb0
[<
c052a856>] do_initcalls+0x46/0x1e0
[<
c01bdb12>] create_proc_entry+0x52/0x90
[<
c0158d4c>] register_irq_proc+0x9c/0xc0
[<
c01bda94>] proc_mkdir_mode+0x34/0x50
[<
c052aa30>] kernel_init+0x0/0xb0
[<
c052aa92>] kernel_init+0x62/0xb0
[<
c0104f83>] kernel_thread_helper+0x7/0x14
=======================
kobject_add failed for usbcore with -EEXIST, don't try to register things with the same name in the same directory.
[<
c024e466>] kobject_add+0xf6/0x150
[<
c053b880>] kernel_param_sysfs_setup+0x50/0xb0
[<
c053b9ce>] param_sysfs_builtin+0xee/0x130
[<
c053ba33>] param_sysfs_init+0x23/0x60
[<
c024d062>] __next_cpu+0x12/0x20
[<
c052aa30>] kernel_init+0x0/0xb0
[<
c052aa30>] kernel_init+0x0/0xb0
[<
c052a856>] do_initcalls+0x46/0x1e0
[<
c01bdb12>] create_proc_entry+0x52/0x90
[<
c0158d4c>] register_irq_proc+0x9c/0xc0
[<
c01bda94>] proc_mkdir_mode+0x34/0x50
[<
c052aa30>] kernel_init+0x0/0xb0
[<
c052aa92>] kernel_init+0x62/0xb0
[<
c0104f83>] kernel_thread_helper+0x7/0x14
=======================
Module 'usbcore' failed to be added to sysfs, error number -17
The system will be unstable now.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 18 Oct 2007 10:05:07 +0000 (03:05 -0700)]
stop using DMA_xxBIT_MASK
Now that we have DMA_BIT_MASK(), these macros are pointless.
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Thu, 18 Oct 2007 10:05:06 +0000 (03:05 -0700)]
unify DMA_..BIT_MASK definitions: v3.1
Remove redundant DMA_..BIT_MASK definitions across two drivers. The
computation of the majority of the bitmasks is done by the compiler. The
initial split of the patch touching each a different file got removed due
to possible git bisect breakage.
Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Reviewed-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tony Breeds [Thu, 18 Oct 2007 10:04:57 +0000 (03:04 -0700)]
Fix discrepancy between VDSO based gettimeofday() and sys_gettimeofday().
On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
include powerpc), it is possible for the vdso to get out of sync if a user
calls (admittedly unusual) settimeofday(NULL, ptr).
This patch adds a hook for architectures that set
CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also
updatee their copy in the vdso.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 18 Oct 2007 10:04:56 +0000 (03:04 -0700)]
Remove struct task_struct::io_wait
Hell knows what happened in commit
63b05203af57e7de4f3bb63b8b81d43bc196d32b
during 2.6.9 development. Commit introduced io_wait field which remained
write-only than and still remains write-only.
Also garbage collect macros which "use" io_wait.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:56 +0000 (03:04 -0700)]
Hibernation: Enter platform hibernation state in a consistent way
Make hibernation_platform_enter() execute the enter-a-sleep-state sequence
instead of the mixed shutdown-with-entering-S4 thing.
Replace the shutting down of devices done by kernel_shutdown_prepare(), before
entering the ACPI S4 sleep state, with suspending them and the shutting down
of sysdevs with calling device_power_down(PMSG_SUSPEND) (just like before
entering S1 or S3, but the target state is now S4). Also, disable the
nonboot CPUs before entering the sleep state (S4), which generally always is a
good idea.
This is known to fix the "double disk spin down during hibernation" on some
machines, eg. HPC nx6325 (ref. http://lkml.org/lkml/2007/8/7/316 and the
following thread). Moreover, it has been reported to make
/sys/class/rtc/rtc0/wakealarm work correctly with hibernation for some users.
It also generally causes the hibernation state (ACPI S4) to be entered faster.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:55 +0000 (03:04 -0700)]
Hibernation: Check if ACPI is enabled during restore in the right place
The following scenario leads to total confusion of the platform firmware on
some boxes (eg. HPC nx6325):
* Hibernate with ACPI enabled
* Resume passing "acpi=off" to the boot kernel
To prevent this from happening it's necessary to check if ACPI is enabled (and
enable it if that's not the case) _right_ _after_ control has been transfered
from the boot kernel to the image kernel, before device_power_up() is called
(ie. with interrupts disabled). Enabling ACPI after calling
device_power_up() turns out to be insufficient.
For this reason, introduce new hibernation callback ->leave() that will be
executed before device_power_up() by the restored image kernel. To make it
work, it also is necessary to move swsusp_suspend() from swsusp.c to disk.c
(it's name is changed to "create_image", which is more up to the point).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:54 +0000 (03:04 -0700)]
Hibernation: Use temporary page tables for kernel text mapping on x86_64
Use temporary page tables for the kernel text mapping during hibernation
restore on x86_64.
Without the patch, the original boot kernel's page tables that represent the
kernel text mapping are used while the core of the image kernel is being
restored. However, in principle, if the boot kernel is not identical to the
image kernel, the location of these page tables in the image kernel need not
be the same, so we should create a safe copy of the kernel text mapping prior
to restoring the core of the image kernel.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:54 +0000 (03:04 -0700)]
Hibernation: Pass CR3 in the image header on x86_64
Since we already pass the address of restore_registers() in the image header,
we can also pass the value of the CR3 register from before the hibernation in
the same way. This will allow us to avoid using init_level4_pgt page tables
during the restore.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:53 +0000 (03:04 -0700)]
Hibernation: Arbitrary boot kernel support on x86_64
Make it possible to restore a hibernation image on x86_64 with the help of a
kernel different from the one in the image.
The idea is to split the core restoration code into two separate parts and to
place each of them in a different page. The first part belongs to the boot
kernel and is executed as the last step of the image kernel's memory
restoration procedure. Before being executed, it is relocated to a safe page
that won't be overwritten while copying the image kernel pages.
The final operation performed by it is a jump to the second part of the core
restoration code that belongs to the image kernel and has just been restored.
This code makes the CPU switch to the image kernel's page tables and restores
the state of general purpose registers (including the stack pointer) from
before the hibernation.
The main issue with this idea is that in order to jump to the second part of
the core restoration code the boot kernel needs to know its address.
However, this address may be passed to it in the image header. Namely, the
part of the image header previously used for checking if the version of the
image kernel is correct can be replaced with some architecture specific data
that will allow the boot kernel to jump to the right address within the image
kernel. These data should also be used for checking if the image kernel is
compatible with the boot kernel (as far as the memory restroration procedure
is concerned). It can be done, for example, with the help of a "magic" value
that has to be equal in both kernels, so that they can be regarded as
compatible.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:52 +0000 (03:04 -0700)]
Hibernation: Arbitrary boot kernel support - generic code
Add the bits needed for supporting arbitrary boot kernels to the common
hibernation code.
To support arbitrary boot kernels, make it possible to replace the 'struct
new_utsname' and the kernel version in the hibernation image header by some
architecture specific data that will be used to verify if the image is valid
and to restore the image.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Machek [Thu, 18 Oct 2007 10:04:51 +0000 (03:04 -0700)]
s2ram: kill old debugging junk
This removes old debugging stuff, that should be no longer neccessary. It
accessed VGA hardware (which may not be ready at this point), and used LEDs
at port 80 for debugging.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andres Salomon [Thu, 18 Oct 2007 10:04:50 +0000 (03:04 -0700)]
serial: turn serial console suspend a boot rather than compile time option
Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop
the serial console from being suspended when the rest of the machine goes
to sleep. This is incredibly useful for debugging power management-related
things; however, having it as a compile-time option has proved to be
incredibly inconvenient for us (OLPC). There are plenty of times that we
want serial console to not suspend, but for the most part we'd like serial
console to be suspended.
This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel
boot parameter (no_console_suspend). By default, the serial console will
be suspended along with the rest of the system; by passing
'no_console_suspend' to the kernel during boot, serial console will remain
alive during suspend.
For now, this is pretty serial console specific; further fixes could be
applied to make this work for things like netconsole.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:49 +0000 (03:04 -0700)]
freezer: measure freezing time
Measure the time of the freezing of tasks, even if it doesn't fail.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:48 +0000 (03:04 -0700)]
freezer: be more verbose
Increase the freezer's verbosity a bit, so that it's easier to read problem
reports related to it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:48 +0000 (03:04 -0700)]
pm_trace displays the wrong time from the RTC
The way in which read_magic_time() displays the date read from the RTC is
apparently confusing to the users (cf.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=250238). Make it
print dates in the standard way.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Thu, 18 Oct 2007 10:04:47 +0000 (03:04 -0700)]
unexport pm_power_off_prepare
This patch removes the unused EXPORT_SYMBOL(pm_power_off_prepare).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:46 +0000 (03:04 -0700)]
freezer: do not send signals to kernel threads
The freezer should not send signals to kernel threads, since that may lead to
subtle problems. In particular, commit
b74d0deb968e1f85942f17080eace015ce3c332c has changed recalc_sigpending_tsk()
so that it doesn't clear TIF_SIGPENDING. For this reason, if the freezer
continues to send fake signals to kernel threads and the freezing of kernel
threads fails, some of them may be running with TIF_SIGPENDING set forever.
Accordingly, recalc_sigpending_tsk() shouldn't set the task's TIF_SIGPENDING
flag if TIF_FREEZE is set.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:45 +0000 (03:04 -0700)]
freezer: introduce freezer-friendly waiting macros
Introduce freezer-friendly wrappers around wait_event_interruptible() and
wait_event_interruptible_timeout(), originally defined in <linux/wait.h>, to
be used in freezable kernel threads. Make some of the freezable kernel
threads use them.
This is necessary for the freezer to stop sending signals to kernel threads,
which is implemented in the next patch.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:45 +0000 (03:04 -0700)]
freezer: prevent new tasks from inheriting TIF_FREEZE set
Tasks should go to the refrigerator only if explicitly requested to do that by
the freezer and not as a result of inheriting the TIF_FREEZE flag set from the
parent. Make it happen.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:44 +0000 (03:04 -0700)]
freezer: do not sync filesystems from freeze_processes
The syncing of filesystems from within the freezer is generally not needed.
Also, if there's an ext3 filesystem loopback-mounted from a FUSE one, the
syncing results in writes to it and deadlocks. Similarly, it will deadlock if
FUSE implements sync.
Change freeze_processes() so that it doesn't execute sys_sync() and make the
suspend and hibernation code path sync filesystems independently of the
freezer.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:43 +0000 (03:04 -0700)]
freezer: document relationship with memory shrinking
One important reason to freeze tasks, which is that we don't want them to
allocate memory after freeing it for the hibernation image, has not been
documented. Fix it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:43 +0000 (03:04 -0700)]
PM: Rename hibernation_ops to platform_hibernation_ops
Rename 'struct hibernation_ops' to 'struct platform_hibernation_ops' in
analogy with 'struct platform_suspend_ops'.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:42 +0000 (03:04 -0700)]
PM: Rework struct hibernation_ops
During hibernation we also need to tell the ACPI core that we're going to put
the system into the S4 sleep state. For this reason, an additional method in
'struct hibernation_ops' is needed, playing the role of set_target() in
'struct platform_suspend_operations'. Moreover, the role of the .prepare()
method is now different, so it's better to introduce another method, that in
general may be different from .prepare(), that will be used to prepare the
platform for creating the hibernation image (.prepare() is used anyway to
notify the platform that we're going to enter the low power state after the
image has been saved).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:41 +0000 (03:04 -0700)]
PM: Make suspend_ops static
The variable suspend_ops representing the set of global platform-specific
suspend-related operations, used by the PM core, need not be exported outside
of kernel/power/main.c . Make it static.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:41 +0000 (03:04 -0700)]
PM: Rework struct platform_suspend_ops
There is no reason why the .prepare() and .finish() methods in 'struct
platform_suspend_ops' should take any arguments, since architectures don't use
these methods' argument in any practically meaningful way (ie. either the
target system sleep state is conveyed to the platform by .set_target(), or
there is only one suspend state supported and it is indicated to the PM core
by .valid(), or .prepare() and .finish() aren't defined at all). There also
is no reason why .finish() should return any result.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:40 +0000 (03:04 -0700)]
PM: Rename struct pm_ops and related things
The name of 'struct pm_ops' suggests that it is related to the power
management in general, but in fact it is only related to suspend. Moreover,
its name should indicate what this structure is used for, so it seems
reasonable to change it to 'struct platform_suspend_ops'. In that case, the
name of the global variable of this type used by the PM core and the names of
related functions should be changed accordingly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Thu, 18 Oct 2007 10:04:39 +0000 (03:04 -0700)]
PM: Move definition of struct pm_ops to suspend.h
Move the definition of 'struct pm_ops' and related functions from <linux/pm.h>
to <linux/suspend.h> .
There are, at least, the following reasons to do that:
* 'struct pm_ops' is specifically related to suspend and not to the power
management in general.
* As long as 'struct pm_ops' is defined in <linux/pm.h>, any modification of it
causes the entire kernel to be recompiled, which is unnecessary and annoying.
* Some suspend-related features are already defined in <linux/suspend.h>, so it
is logical to move the definition of 'struct pm_ops' into there.
* 'struct hibernation_ops', being the hibernation-related counterpart of
'struct pm_ops', is defined in <linux/suspend.h> .
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Thu, 18 Oct 2007 10:04:37 +0000 (03:04 -0700)]
make kernel/power/main.c:suspend_enter() static
suspend_enter() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:04:37 +0000 (03:04 -0700)]
logo.c: get rid of mips_machgroup
This has not been any serious user of this ill conceived thing since the
original invention in like '95 so I recently deleted this from everywhere
except the last instance in logo.c. This patch removes the last two
instances in logo.c. They conditions were not useful anyway as when
compiled in they would always evaluate as true.
Last not least this is necessary to get the SGI IP22 and DECstation kernels
to compile again.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Thu, 18 Oct 2007 10:04:36 +0000 (03:04 -0700)]
fb modedb: Refactor confusing mode_option assignment
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maciej W. Rozycki [Thu, 18 Oct 2007 10:04:35 +0000 (03:04 -0700)]
tty_ioctl: fix the baud_table check in encode_baud_rate
The tty_termios_encode_baud_rate() function as defined by tty_ioctl.c has a
problem with the baud_table within. The comparison operators are reversed
and as a result this table's entries never match and BOTHER is always used.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Thu, 18 Oct 2007 10:04:34 +0000 (03:04 -0700)]
Remove CONFIG_VT_UNICODE
Since default_utf8 is already a sysfs attribute, having an extra
CONFIG_VT_UNICODE compile-time option is redundant, since sysfs attributes can
be set at boot and run time.
Also let Linux VCs default to UTF-8 (as per the discussion at
http://lkml.org/lkml/2007/9/6/99).
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Cc: Bill Nottingham <notting@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by@vergenet.net":Simon [Thu, 18 Oct 2007 10:04:33 +0000 (03:04 -0700)]
Kexec: Update URL in MAINTAINERS file
I'm not sure that the new URL satifies the requirement of status/info, but
it does at least as good a job as the old URL, and contains current
releases of kexec-tools, rather than somewhat ancient versions.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Karsten Keil [Thu, 18 Oct 2007 10:04:32 +0000 (03:04 -0700)]
i4l: Fix random hard freeze with AVM c4 card
The patch
- Includes the call to capilib_data_b3_req in the spinlock. This routine
in turn calls the offending mq_enqueue routine that triggered the
freeze if not locked. This should also fix other indicators of
incosistent capilib_msgidqueue list, that trigger messages like:
Oct 5 03:05:57 BERL0 kernel: kcapi: msgid 3019 ncci 0x30301 not on queue
that we saw several times a day (usually several in a row).
- Fixes all occurrences of c4_dispatch_tx to be called with active
spinlock, there were some instances where no lock was active. Mostly
these are in very infrequently called routines, so the additional
performance penalty is minimal.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Rainer Brestan <rainer.brestan@frequentis.com>
Signed-off-by: Ralf Schlatterbeck <rsc@runtux.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Karsten Keil [Thu, 18 Oct 2007 10:04:31 +0000 (03:04 -0700)]
i4l: fix random freezes with AVM B1 drivers
This fix the same issue which was debbuged for the C4 controller for the B1
versions.
The capilib_ function modify or traverse a linked list without locking.
This patch extends the existing locking to the calls of these function to
prevent access to a list which is in the middle of a modification.
Signed-off-by: Karsten Keil <kkeil@suse.de>
C: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:04:30 +0000 (03:04 -0700)]
au1100fb: fix modpost warnings
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x170be8): Section mismatch: reference to .init.data:au1100fb_fix (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170dc4): Section mismatch: reference to .init.data:au1100fb_var (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170dd0): Section mismatch: reference to .init.data:au1100fb_fix (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170de0): Section mismatch: reference to .init.data:au1100fb_var (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170e70): Section mismatch: reference to .init.data:au1100fb_var (between 'au1100fb_drv_probe' and 'read_null')
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:04:29 +0000 (03:04 -0700)]
netport_con.c: fix build errors and warnings
Fix build broken by
accaa24c492f1aa3b9c37226d868dc59c3007531:
CC drivers/video/console/newport_con.o
drivers/video/console/newport_con.c: In function 'newport_show_logo':
drivers/video/console/newport_con.c:111: error: assignment of read-only location
drivers/video/console/newport_con.c:111: warning: assignment makes integer from pointer without a cast
drivers/video/console/newport_con.c:112: error: assignment of read-only location
drivers/video/console/newport_con.c:112: warning: assignment makes integer from pointer without a cast
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: lighten up resize transaction requirements
When resizing online, setup_new_group_blocks attempts to reserve a
potentially very large transaction, depending on the current filesystem
geometry. For some journal sizes, there may not be enough room for this
transaction, and the online resize will fail.
The patch below resizes & restarts the transaction as necessary while
setting up the new group, and should work with even the smallest journal.
Tested with something like:
[root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768
[root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384
[root@newbox ~]# mount -o loop fsfile mnt/
[root@newbox ~]# resize2fs /dev/loop0
resize2fs 1.40.2 (12-Jul-2007)
Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks.
resize2fs: No space left on device While trying to add group #2
[root@newbox ~]# dmesg | tail -n 1
JBD: resize2fs wants too many credits (258 > 256)
[root@newbox ~]#
With the below change, it works.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: fix setup_new_group_blocks locking
setup_new_group_blocks() manipulates the group descriptor block bh
under the block_bitmap bh's lock. It shouldn't matter since nobody
but resize should be touching these blocks, but it's worth fixing up.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: sparse fixes
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert ext4_extent_idx.ei_leaf to ext4_extent_idx.ei_leaf_lo
Convert ext4_extent_idx.ei_leaf ext4_extent_idx.ei_leaf_lo
This helps in finding BUGs due to direct partial access of
these split 48 bit values.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert ext4_extent.ee_start to ext4_extent.ee_start_lo
Convert ext4_extent.ee_start to ext4_extent.ee_start_lo
This helps in finding BUGs due to direct partial access of
these split 48 bit values
Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert s_r_blocks_count and s_free_blocks_count
Convert s_r_blocks_count and s_free_blocks_count to
s_r_blocks_count_lo and s_free_blocks_count_lo
This helps in finding BUGs due to direct partial access of
these split 64 bit values
Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert s_blocks_count to s_blocks_count_lo
Convert s_blocks_count to s_blocks_count_lo
This helps in finding BUGs due to direct partial access of
these split 64 bit values
Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert bg_inode_bitmap and bg_inode_table
Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
and bg_inode_table_lo. This helps in finding BUGs due to
direct partial access of these split 64 bit values
Also fix one direct partial access
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert bg_block_bitmap to bg_block_bitmap_lo
Convert bg_block_bitmap to bg_block_bitmap_lo
This helps in catching some BUGS due to direct
partial access of these split fields.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Jose R. Santos [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: FLEX_BG Kernel support v2.
This feature relaxes check restrictions on where each block groups meta
data is located within the storage media. This allows for the allocation
of bitmaps or inode tables outside the block group boundaries in cases
where bad blocks forces us to look for new blocks which the owning block
group can not satisfy. This will also allow for new meta-data allocation
schemes to improve performance and scalability.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Fix sparse warnings
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andreas Dilger [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
Ext4: Uninitialized Block Groups
In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
regardless of whether it is in use. This is this the most time consuming part
of the filesystem check. The unintialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.
With this feature, there is a a high water mark of used inodes for each block
group. Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time. A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.
The feature is enabled through a mkfs option
mke2fs /dev/ -O uninit_groups
A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.
The patches have been stress tested with fsstress and fsx. In performance
tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesytem. In ext4 with the
uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users. With performance improvement of 2-20
times, depending on how full the filesystem is.
The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.
In each group descriptor if we have
EXT4_BG_INODE_UNINIT set in bg_flags:
Inode table is not initialized/used in this group. So we can skip
the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
No block in the group is used. So we can skip the block bitmap
verification for this group.
We also add two new fields to group descriptor as a part of
uninitialized group patch.
__le16 bg_itable_unused; /* Unused inodes count */
__le16 bg_checksum; /* crc16(sb_uuid+group+desc) */
bg_itable_unused:
If we have EXT4_BG_INODE_UNINIT not set in bg_flags
then bg_itable_unused will give the offset within
the inode table till the inodes are used. This can be
used by fsck to skip list of inodes that are marked unused.
bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure group descriptor
is not corrupt. We add checksum to group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group used.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: remove #ifdef CONFIG_EXT4_INDEX
CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is
unconditionally defined in ext4_fs.h. tune2fs is already able to turn off
dir indexing, so at this point it's just cluttering up the code. Remove
it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coly Li [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Remove (partial, never completed) fragment support
Fragment support in ext2/3/4 was never implemented, and it probably will
never be implemented. So remove it from ext4.
Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>