Linus Torvalds [Fri, 7 Dec 2007 18:59:33 +0000 (10:59 -0800)]
Merge branch 'master' of /linux/kernel/git/davem/sparc-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix memory controller register access when non-SMP.
Linus Torvalds [Fri, 7 Dec 2007 18:58:19 +0000 (10:58 -0800)]
Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
leds: Fix led trigger locking bugs
David S. Miller [Fri, 7 Dec 2007 08:58:55 +0000 (00:58 -0800)]
[SPARC64]: Fix memory controller register access when non-SMP.
get_cpu() always returns zero on non-SMP builds, but we
really want the physical cpu number in this code in order
to do the right thing.
Based upon a non-SMP kernel boot failure report from Bernd Zeimetz.
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Purdie [Sat, 10 Nov 2007 13:29:04 +0000 (13:29 +0000)]
leds: Fix led trigger locking bugs
Convert part of the led trigger core from rw spinlocks to rw
semaphores. We're calling functions which can sleep from invalid
contexts otherwise. Fixes bug #9264.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Linus Torvalds [Fri, 7 Dec 2007 01:50:07 +0000 (17:50 -0800)]
Merge branch 'merge' of /linux/kernel/git/paulus/powerpc
* 'merge' of master.kernel.org:/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] virtex bug fix: Use canonical value for AC97 interrupt xparams
[POWERPC] Update defconfigs
[POWERPC] PS3: Update ps3_defconfig
[POWERPC] Update iseries_defconfig
[POWERPC] Fix hardware IRQ time accounting problem.
Grant Likely [Thu, 6 Dec 2007 19:16:44 +0000 (06:16 +1100)]
[POWERPC] virtex bug fix: Use canonical value for AC97 interrupt xparams
The ml300 and ml403 xparameters.h files use different macros for the
AC97 interrupt pin assignments. This normalizes them to a canonical
value similar to what EDK generates for most other devices. This is
needed to get ml300 support to compile in arch/ppc.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Linus Torvalds [Thu, 6 Dec 2007 22:14:16 +0000 (14:14 -0800)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: suspend: old debugging hacks sneaked back
Freezer: Fix JFFS2 garbage collector freezing issue (rev. 2)
HWMON: coretemp, suspend fix
Freezer: Fix APM emulation breakage
Freezer: Fix s2disk resume from initrd
Len Brown [Thu, 6 Dec 2007 21:52:00 +0000 (16:52 -0500)]
Pull bugzilla-9345 into release branch
Len Brown [Thu, 6 Dec 2007 21:51:29 +0000 (16:51 -0500)]
Pull apm-freeze-fix into release branch
Len Brown [Thu, 6 Dec 2007 21:26:52 +0000 (16:26 -0500)]
Pull suspend-2.6.24 into release branch
Pavel Machek [Thu, 6 Dec 2007 08:50:40 +0000 (09:50 +0100)]
ACPI: suspend: old debugging hacks sneaked back
Old debugging hack sneaked back during x86 merge, this removes it.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Thu, 6 Dec 2007 20:27:09 +0000 (12:27 -0800)]
Merge branch 'for-2.6.24' of git://git./linux/kernel/git/galak/powerpc
* 'for-2.6.24' of git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc:
[POWERPC] Fix swapper_pg_dir size when CONFIG_PTE_64BIT=y on FSL_BOOKE
Linus Torvalds [Thu, 6 Dec 2007 20:26:17 +0000 (12:26 -0800)]
Merge git://git./linux/kernel/git/kyle/parisc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
[PARISC] lba_pci: pci_claim_resources disabled expansion roms
[PARISC] print more than one character at a time for pdc console
[PARISC] Update parisc-linux MAINTAINERS entries
[PARISC] timer interrupt should not be IRQ_DISABLED
Revert "[PARISC] import necessary bits of libgcc.a"
Kumar Gala [Thu, 6 Dec 2007 19:11:04 +0000 (13:11 -0600)]
[POWERPC] Fix swapper_pg_dir size when CONFIG_PTE_64BIT=y on FSL_BOOKE
The size of swapper_pg_dir is 8k instead of 4k when using 64-bit PTEs
(CONFIG_PTE_64BIT).
This was reported by Cedric Hombourger <chombourger@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Linus Torvalds [Thu, 6 Dec 2007 17:43:26 +0000 (09:43 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Oprofile: Fix computation of number of counters.
[MIPS] Alchemy: fix IRQ bases
[MIPS] Alchemy: replace ffs() with __ffs()
[MIPS] BCM1480: Fix interrupt routing, take 2.
Linus Torvalds [Thu, 6 Dec 2007 17:41:12 +0000 (09:41 -0800)]
Tiny clean-up of OPROFILE/KPROBES configuration
Make the Kconfig.instrumentation file a bit easier on the eyes, and use
the new ARCH_SUPPORTS_OPROFILE for x86[-64].
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyle McMartin [Thu, 6 Dec 2007 17:38:26 +0000 (09:38 -0800)]
[PARISC] lba_pci: pci_claim_resources disabled expansion roms
radeonfb was HPMC-ing my C8000 by trying to map its expansion rom from
IO_VIEW, instead of PA_VIEW. Fix seems to be to ensure that its disabled
ROM is properly inserted into the resource tree.
FIXME: this will result in a whinging printk for cards which share expansion
ROMS, such as a quad tulip. Thankfully, it isn't harmful.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Ralf Baechle [Thu, 6 Dec 2007 16:53:19 +0000 (16:53 +0000)]
Fix oprofile configuration breakage
The cleanup
09cadedbdc01f1a4bea1f427d4fb4642eaa19da9 broke the oprofile
configuration for MIPS by allowing oprofile support to be built for
kernel models where oprofile doesn't have a chance in hell to work.
Just a dependecy list on a number of architectures is - surprise - broken
and should as per past discussions probably in most considered to be
broken in most cases. So I introduce a dependency for the oprofile
configuration on ARCH_SUPPORTS_OPROFILE.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyle McMartin [Thu, 6 Dec 2007 17:32:15 +0000 (09:32 -0800)]
[PARISC] print more than one character at a time for pdc console
There's really no reason not to print more than one character at a
time to the PDC console... Booting is measurably speedier, and now I don't
have to watch individual characters get drawn.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Kyle McMartin [Mon, 3 Dec 2007 22:04:34 +0000 (22:04 +0000)]
[PARISC] Update parisc-linux MAINTAINERS entries
List changed & reordered so I'm more likely to see patches...
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Kyle McMartin [Wed, 28 Nov 2007 07:17:53 +0000 (02:17 -0500)]
[PARISC] timer interrupt should not be IRQ_DISABLED
The timer interrupt had accidentally been marked IRQ_DISABLED since
IRQ_PER_CPU had been OR-ed in, instead of set. This had been working
by accident for quite a while.
Commit
c642b8391cf8efc3622cc97329a0f46e7cbb70b8 changed the behaviour of
IRQ_PER_CPU interrupts, which previously weren't checked for IRQ_DISABLED.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Kyle McMartin [Wed, 28 Nov 2007 07:07:35 +0000 (02:07 -0500)]
Revert "[PARISC] import necessary bits of libgcc.a"
This reverts commit
efb80e7e097d0888e59fbbe4ded2ac5a256f556d, it turned
out to cause sporadic problems with the timer interrupt on 32-bit kernels.
Needs more investigation.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Ralf Baechle [Thu, 6 Dec 2007 09:12:28 +0000 (09:12 +0000)]
[MIPS] Oprofile: Fix computation of number of counters.
VSMP kernels will split the available performance counters between the two
processors / cores. But don't do this when we're not on a VSMP system ...
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Sergei Shtylyov [Wed, 5 Dec 2007 16:08:26 +0000 (19:08 +0300)]
[MIPS] Alchemy: fix IRQ bases
Do what the commits commits
f3e8d1da389fe2e514e31f6e93c690c8e1243849 and
9d360ab4a7568a8d177280f651a8a772ae52b9b9 failed to achieve -- actually
convert the Alchemy code to irq_cpu.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Sergei Shtylyov [Wed, 5 Dec 2007 16:08:24 +0000 (19:08 +0300)]
[MIPS] Alchemy: replace ffs() with __ffs()
Fix havoc wrought by commit
56f621c7f6f735311eed3f36858b402013023c18 --
au_ffs() and ffs() are equivalent, that patch should have just replaced one
with another. Now replace ffs() with __ffs() which returns an unbiased bit
number.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 6 Dec 2007 17:15:57 +0000 (17:15 +0000)]
[MIPS] BCM1480: Fix interrupt routing, take 2.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Paul Mackerras [Thu, 6 Dec 2007 05:53:35 +0000 (16:53 +1100)]
[POWERPC] Update defconfigs
This updates all the defconfigs in arch/powerpc/configs except iseries
and ps3, which were updated by the preceding commits.
This mostly takes the defaults, except that I turned on tickless idle
and high-resolution timers for everything, and turned off instrumentation
support and "Fair group CPU scheduler" for the smaller/embedded platforms.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Geoff Levand [Wed, 5 Dec 2007 07:13:38 +0000 (18:13 +1100)]
[POWERPC] PS3: Update ps3_defconfig
Update ps3_defconfig.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Stephen Rothwell [Wed, 21 Nov 2007 00:45:52 +0000 (11:45 +1100)]
[POWERPC] Update iseries_defconfig
The notable changes here are the enabling of NO_HZ and HIGH_RES_TIMERS.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Tony Breeds [Tue, 4 Dec 2007 05:51:44 +0000 (16:51 +1100)]
[POWERPC] Fix hardware IRQ time accounting problem.
The commit
fa13a5a1f25f671d084d8884be96fc48d9b68275 (sched: restore
deterministic CPU accounting on powerpc), unconditionally calls
update_process_tick() in system context. In the deterministic
accounting case this is the correct thing to do. However, in the
non-deterministic accounting case we need to not do this, since doing
this results in the time accounted as hardware irq time being
artificially elevated.
Also this collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING'
checks in time.h into one for neatness.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Linus Torvalds [Wed, 5 Dec 2007 17:27:46 +0000 (09:27 -0800)]
Merge git://git./linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
futex: correctly return -EFAULT not -EINVAL
lockdep: in_range() fix
lockdep: fix debug_show_all_locks()
sched: style cleanups
futex: fix for futex_wait signal stack corruption
Linus Torvalds [Wed, 5 Dec 2007 17:26:52 +0000 (09:26 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
VM/Security: add security hook to do_brk
Security: round mmap hint address above mmap_min_addr
security: protect from stack expantion into low vm addresses
Security: allow capable check to permit mmap or low vm space
SELinux: detect dead booleans
SELinux: do not clear f_op when removing entries
Linus Torvalds [Wed, 5 Dec 2007 17:26:13 +0000 (09:26 -0800)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[LRO]: fix lro_gen_skb() alignment
[TCP]: NAGLE_PUSH seems to be a wrong way around
[TCP]: Move prior_in_flight collect to more robust place
[TCP] FRTO: Use of existing funcs make code more obvious & robust
[IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS
[ROSE]: Trivial compilation CONFIG_INET=n case
[IPVS]: Fix sched registration race when checking for name collision.
[IPVS]: Don't leak sysctl tables if the scheduler registration fails.
Linus Torvalds [Wed, 5 Dec 2007 17:25:53 +0000 (09:25 -0800)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Update defconfig.
[SPARC]: Add missing of_node_put
[SPARC64]: check for possible NULL pointer dereference
[SPARC]: Add missing "space"
[SPARC64]: Add missing "space"
[SPARC64]: Add missing pci_dev_put
[SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string.
[SPARC64]: Missing mdesc_release() in ldc_init().
Al Viro [Wed, 5 Dec 2007 08:46:47 +0000 (08:46 +0000)]
remove nonsense force-casts from ocfs2
endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.
Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Wed, 5 Dec 2007 08:32:52 +0000 (08:32 +0000)]
regression: bfs endianness bug
BFS_FILEBLOCKS() expects struct bfs_inode * (on-disk data, with little-
endian fields), not struct bfs_inode_info * (in-core stuff, with host-
endian ones).
It's a macro and fields with the right names are present in
bfs_inode_info, so it compiles, but on big-endian host it gives bogus
results.
Introduced in commit
f433dc56344cb72cc3de5ba0819021cec3aef807 ("Fixes to
the BFS filesystem driver").
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Wed, 5 Dec 2007 08:38:56 +0000 (08:38 +0000)]
fcrypt endianness misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Wed, 5 Dec 2007 08:36:15 +0000 (08:36 +0000)]
no need to mess with KBUILD_CFLAGS on uml-i386 anymore
Now that X86_32 is provided on Kconfig level for uml-i386, there's no
need to play with it explicitly on Makefile level anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Wed, 5 Dec 2007 08:24:38 +0000 (08:24 +0000)]
regression: cifs endianness bug
access_flags_to_mode() gets on-the-wire data (little-endian) and treats
it as host-endian.
Introduced in commit
e01b64001359034d04c695388870936ed3d1b56b ("[CIFS]
enable get mode from ACL when cifsacl mount option specified")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Wed, 5 Dec 2007 07:45:31 +0000 (23:45 -0800)]
VM/Security: add security hook to do_brk
Given a specifically crafted binary do_brk() can be used to get low pages
available in userspace virtual memory and can thus be used to circumvent
the mmap_min_addr low memory protection. Add security checks in do_brk().
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vegard Nossum [Wed, 5 Dec 2007 07:45:30 +0000 (23:45 -0800)]
SLUB's ksize() fails for size > 2048
I can't pass memory allocated by kmalloc() to ksize() if it is allocated by
SLUB allocator and size is larger than (I guess) PAGE_SIZE / 2.
The error of ksize() seems to be that it does not check if the allocation
was made by SLUB or the page allocator.
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Christoph Lameter <clameter@sgi.com>, Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Wed, 5 Dec 2007 07:45:28 +0000 (23:45 -0800)]
proc: fix proc_dir_entry refcounting
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.
This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():
1) PDE leak
remove_proc_entry de_put
----------------- ------
[refcnt = 1]
if (atomic_read(&de->count) == 0)
if (atomic_dec_and_test(&de->count))
if (de->deleted)
/* also not taken! */
free_proc_entry(de);
else
de->deleted = 1;
[refcount=0, deleted=1]
2) use after free
remove_proc_entry de_put
----------------- ------
[refcnt = 1]
if (atomic_dec_and_test(&de->count))
if (atomic_read(&de->count) == 0)
free_proc_entry(de);
/* boom! */
if (de->deleted)
free_proc_entry(de);
BUG: unable to handle kernel paging request at virtual address
6b6b6b6b
printing eip:
c10acdda *pdpt =
00000000338f8001 *pde =
0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-
8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[<
c10acdda>] EFLAGS:
00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX:
6b6b6b6b EBX:
6b6b6b6b ECX:
6b6b6b6b EDX:
fffffffe
ESI:
c128fa3b EDI:
f380bf34 EBP:
ffffffff ESP:
f380be44
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=
f380b000 task=
f38f2570 task.ti=
f380b000)
Stack:
c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
[<
c10ac4f0>] vsnprintf+0x2ad/0x49b
[<
c10ac779>] vscnprintf+0x14/0x1f
[<
c1018e6b>] vprintk+0xc5/0x2f9
[<
c10379f1>] handle_fasteoi_irq+0x0/0xab
[<
c1004f44>] do_IRQ+0x9f/0xb7
[<
c117db3b>] preempt_schedule_irq+0x3f/0x5b
[<
c100264e>] need_resched+0x1f/0x21
[<
c10190ba>] printk+0x1b/0x1f
[<
c107c8ad>] de_put+0x3d/0x50
[<
c107c8f8>] proc_delete_inode+0x38/0x41
[<
c107c8c0>] proc_delete_inode+0x0/0x41
[<
c1066298>] generic_delete_inode+0x5e/0xc6
[<
c1065aa9>] iput+0x60/0x62
[<
c1063c8e>] d_kill+0x2d/0x46
[<
c1063fa9>] dput+0xdc/0xe4
[<
c10571a1>] __fput+0xb0/0xcd
[<
c1054e49>] filp_close+0x48/0x4f
[<
c1055ee9>] sys_close+0x67/0xa5
[<
c10026b6>] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [<
c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:
f380be44
Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.
Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.
Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 5 Dec 2007 07:45:27 +0000 (23:45 -0800)]
jbd: Fix assertion failure in fs/jbd/checkpoint.c
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 5 Dec 2007 07:45:25 +0000 (23:45 -0800)]
mm: fix XIP file writes
Writing to XIP files at a non-page-aligned offset results in data corruption
because the writes were always sent to the start of the page.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Gardner [Wed, 5 Dec 2007 07:45:24 +0000 (23:45 -0800)]
gpio_cs5535: disable AUX on output
The AMD CS5535/CS5536 GPIO has two alternate output modes: AUX-1 and AUX-2.
When either AUX is enabled, the cs5535_gpio driver cannot control the
output.
Some BIOS code for the Geode processor enables AUX-1 for GPIO-1, which
configures it as the PC BEEP output.
This patch will disable AUX-1 and AUX-2 when the user enables output.
Signed-of-by: Ben Gardner <gardner.ben@gmail.com>
Cc: Richard Knutsson <ricknu-0@student.ltu.se>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Wed, 5 Dec 2007 07:45:24 +0000 (23:45 -0800)]
Avoid potential NULL dereference in unregister_sysctl_table
register_sysctl_table() can return NULL sometimes, e.g. when kmalloc()
returns NULL or when sysctl check fails.
I've also noticed, that many (most?) code in the kernel doesn't check for
the return value from register_sysctl_table() and later simply calls the
unregister_sysctl_table() with potentially NULL argument.
This is unlikely on a common kernel configuration, but in case we're
dealing with modules and/or fault-injection support, there's a slight
possibility of an OOPS.
Changing all the users to check for return code from the registering does
not look like a good solution - there are too many code doing this and
failure in sysctl tables registration is not a good reason to abort module
loading (in most of the cases).
So I think, that we can just have this check in unregister_sysctl_table
just to avoid accidental OOPS-es (actually, the unregister_sysctl_table()
did exactly this, before the start_unregistering() appeared).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:23 +0000 (23:45 -0800)]
Blackfin SPI driver: reconfigure speed_hz and bits_per_word in each spi transfer
- reconfigure SPI baud from speed_hz of each spi transfer
- according to spi_transfer.bits_per_word to reprogram register and setup
correct SPI operation handlers
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:22 +0000 (23:45 -0800)]
Blackfin SPI driver: move hard coded pin_req to board file
Remove some sort of bloaty code, try to get these pin_req arrays built at compile-time
- move this static things to the blackfin board file
- add pin_req array to struct bfin5xx_spi_master
- tested on BF537/BF548 with SPI flash
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:22 +0000 (23:45 -0800)]
Blackfin SPI driver: use void __iomem * for regs_base
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:21 +0000 (23:45 -0800)]
Blackfin SPI driver: use cpu_relax() to replace continue in while busywait
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sonic Zhang [Wed, 5 Dec 2007 07:45:21 +0000 (23:45 -0800)]
spi: spi_bfin: resequence DMA start/stop
Set correct baud for spi mmc and enable SPI only after DMA is started.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:20 +0000 (23:45 -0800)]
spi: spi_bfin: update handling of delay-after-deselect
Move cs_chg_udelay handling (specific to this driver) to cs_deactive(), fixing
a bug when some SPI LCD driver needs delay after cs_deactive.
Fix bug reported by Cameron Barfield <cbarfield@cyberdata.net>
https://blackfin.uclinux.org/gf/project/uclinux-dist/forum/?action=ForumBrowse&forum_id=39&_forum_action=ForumMessageBrowse&thread_id=23630&feedback=Message%20replied.
Cc: Cameron Barfield <cbarfield@cyberdata.net>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:19 +0000 (23:45 -0800)]
spi: spi_bfin: bugfix for 8..16 bit word sizes
Fix bug in u16_cs_chg_reader to read data_len-2 bytes data firstly, then read
out the last 2 bytes data
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:18 +0000 (23:45 -0800)]
spi: spi_bfin: handle multiple spi_masters
Move global SPI regs_base and dma_ch to struct driver_data. Test on BF54x SPI
Flash with 2 spi_master devices enabled.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sonic Zhang [Wed, 5 Dec 2007 07:45:18 +0000 (23:45 -0800)]
spi: spi_bfin: relocate spin/waits
Move spin/waits to more correct locations in bfin SPI driver.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sonic Zhang [Wed, 5 Dec 2007 07:45:17 +0000 (23:45 -0800)]
spi: spi_bfin: change handling of communication parameters
Fix SPI driver to work with SPI flash ST M25P16 on bf548
Currently the SPI driver enables the SPI controller and sets the SPI baud
register for each SPI transfer. But they should never be changed within a SPI
message session, in which several SPI transfers are pumped.
This patch moves SPI setting to the begining of a message session, and
never disables SPI controller until an error occurs.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sonic Zhang [Wed, 5 Dec 2007 07:45:16 +0000 (23:45 -0800)]
spi: spi_bfin, rearrange portmux calls
Move pin muxing to setup and cleanup methods.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sonic Zhang [Wed, 5 Dec 2007 07:45:16 +0000 (23:45 -0800)]
spi: spi_bfin uses portmux for additional busses
Use portmux mechanism to support SPI busses 1 and 2, instead of just the
original bus 0.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:15 +0000 (23:45 -0800)]
spi: spi_bfin uses platform device resources
Update spi driver to support multi-ports by using platform resources; tested
on STAMP537+SPI_MMC, other boards need more testing. Plus other minor
updates.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Wed, 5 Dec 2007 07:45:14 +0000 (23:45 -0800)]
spi: spi_bfin, don't bypass spi framework
Prevent people from setting bits in ctl_reg that the SPI framework already
handles, hopefully we can one day drop ctl_reg completely
Signed-off-by: Mike Frysinger <michael.frysinger@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:14 +0000 (23:45 -0800)]
spi: spi_bfin handles spi_transfer.cs_change
Respect per-transfer cs_change field (protocol tweaking support) by
adding and using cs_active/cs_deactive functions.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:13 +0000 (23:45 -0800)]
spi: spi_bfin cleanups, error handling
Cleanup and error handling
- add error handling in SPI bus driver with selecting clients
- use proper defines to access Blackfin MMRs
- remove useless SSYNCs
- cleaner use of portmux calls
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Hennerich [Wed, 5 Dec 2007 07:45:13 +0000 (23:45 -0800)]
spi: bfin spi uses portmux calls
Use new Blackfin portmux interface, add error handling.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Wed, 5 Dec 2007 07:45:12 +0000 (23:45 -0800)]
spi: initial BF54x SPI support
Initial BF54x SPI support
- support BF54x SPI0
- clean up some code (whitespace etc)
- will support multiports in the future
- start using portmux calls
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marc Pignat [Wed, 5 Dec 2007 07:45:11 +0000 (23:45 -0800)]
spi: use simplified spi_sync() calling convention
Given the patch which simplifies the spi_sync calling convention, this one
updates the callers of that routine which tried using it according to the
previous specification. (Most didn't.)
Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marc Pignat [Wed, 5 Dec 2007 07:45:10 +0000 (23:45 -0800)]
spi: simplify spi_sync() calling convention
Simplify spi_sync calling convention, eliminating the need to check both
the return value AND the message->status. In consequence, this corrects
misbehaviours of spi_read and spi_write (which only checked the former) and
their callers.
Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Wed, 5 Dec 2007 07:45:10 +0000 (23:45 -0800)]
spi: at25 driver is for EEPROM not FLASH
Add comment to at25 driver that it's for EEPROM chips, not FLASH
chips ... the AT25 series has both types of chip, and sometimes
they're even pin-compatible. The command sets are different, as
is the treatment of erasure. (FLASH needs explicit erasure, but
with EEPROM it's implicit.) Note that all vendors seem to have
this same confusion in their *25* series SPI memory parts.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Wed, 5 Dec 2007 07:45:09 +0000 (23:45 -0800)]
SPI: use mutex not semaphore
Make spi_write_then_read() use a mutex not a binary semaphore.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Wed, 5 Dec 2007 07:45:08 +0000 (23:45 -0800)]
Add EXPORT_SYMBOL(ksize);
mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't.
It's used by binfmt_flat, which can be built as a module.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denis Cheng [Wed, 5 Dec 2007 07:45:07 +0000 (23:45 -0800)]
mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init
this call should use the array index j, not i. But with this approach, just
one int i is enough, int j is not needed.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Evgeniy Dushistov [Wed, 5 Dec 2007 07:45:06 +0000 (23:45 -0800)]
ufs: fix nexstep dir block size
This patch fixes regression, introduced since 2.6.16. NextStep variant of
UFS as OpenStep uses directory block size equals to 1024. Without this
change, ufs_check_page fails in many cases.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Dave Bailey <dsbailey@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Kosina [Wed, 5 Dec 2007 07:45:05 +0000 (23:45 -0800)]
RTC: assure proper memory ordering with respect to RTC_DEV_BUSY flag
We must make sure that the RTC_DEV_BUSY flag has proper lock semantics,
i.e. that the RTC_DEV_BUSY stores clearing the flag don't get reordered
before the preceeding stores and loads and vice versa.
Spotted by Nick Piggin.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 5 Dec 2007 07:45:04 +0000 (23:45 -0800)]
fix clone(CLONE_NEWPID)
Currently we are complicating the code in copy_process, the clone ABI, and
if we fix the bugs sys_setsid itself, with an unnecessary open coded
version of sys_setsid.
So just simplify everything and don't special case the session and pgrp of
the initial process in a pid namespace.
Having this special case actually presents to user space the classic linux
startup conditions with session == pgrp == 0 for /sbin/init.
We already handle sending signals to processes in a child pid namespace.
We need to handle sending signals to processes in a parent pid namespace
for cases like SIGCHILD and SIGIO.
This makes nothing extra visible inside a pid namespace. So this extra
special case appears to have no redeeming merits.
Further removing this special case increases the flexibility of how we can
use pid namespaces, by not requiring the initial process in a pid namespace
to be a daemon.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer [Wed, 5 Dec 2007 07:45:02 +0000 (23:45 -0800)]
aio: only account I/O wait time in read_events if there are active requests
On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
running (but completely idle). The UML code sits in io_getevents waiting for
an event to be submitted and completed.
Fix this by checking ctx->reqs_active before scheduling to determine whether
or not we are waiting for I/O.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
futex: correctly return -EFAULT not -EINVAL
return -EFAULT not -EINVAL. Found by review.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Oleg Nesterov [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
lockdep: in_range() fix
Torsten Kaiser wrote:
| static inline int in_range(const void *start, const void *addr, const void *end)
| {
| return addr >= start && addr <= end;
| }
| This will return true, if addr is in the range of start (including)
| to end (including).
|
| But debug_check_no_locks_freed() seems does:
| const void *mem_to = mem_from + mem_len
| -> mem_to is the last byte of the freed range, that fits in_range
| lock_from = (void *)hlock->instance;
| -> first byte of the lock
| lock_to = (void *)(hlock->instance + 1);
| -> first byte of the next lock, not last byte of the lock that is being checked!
|
| The test is:
| if (!in_range(mem_from, lock_from, mem_to) &&
| !in_range(mem_from, lock_to, mem_to))
| continue;
| So it tests, if the first byte of the lock is in the range that is freed ->OK
| And if the first byte of the *next* lock is in the range that is freed
| -> Not OK.
We can also simplify in_range checks, we need only 2 comparisons, not 4.
If the lock is not in memory range, it should be either at the left of range
or at the right.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Ingo Molnar [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
lockdep: fix debug_show_all_locks()
fix the oops that can be seen in:
http://bugzilla.kernel.org/attachment.cgi?id=13828&action=view
it is not safe to print the locks of running tasks.
(even with this fix we have a small race - but this is a debug
function after all.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Ingo Molnar [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
sched: style cleanups
style cleanup of various changes that were done recently.
no code changed:
text data bss dec hex filename
23680 2542 28 26250 668a sched.o.before
23680 2542 28 26250 668a sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
futex: fix for futex_wait signal stack corruption
David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too. The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.
Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.
Here's the relevant code from kernel/futex.c: (not in order in the file)
[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
u32 val3)
{
struct timespec ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
return -EINVAL;
t = timespec_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add(ktime_get(), t);
tp = &t;
}
[...]
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
[...]
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
int cmd = op & FUTEX_CMD_MASK;
struct rw_semaphore *fshared = NULL;
if (!(op & FUTEX_PRIVATE_FLAG))
fshared = ¤t->mm->mmap_sem;
switch (cmd) {
case FUTEX_WAIT:
ret = futex_wait(uaddr, fshared, val, timeout);
[...]
static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
u32 val, ktime_t *abs_time)
{
[...]
struct restart_block *restart;
restart = ¤t_thread_info()->restart_block;
restart->fn = futex_wait_restart;
restart->arg0 = (unsigned long)uaddr;
restart->arg1 = (unsigned long)val;
restart->arg2 = (unsigned long)abs_time;
restart->arg3 = 0;
if (fshared)
restart->arg3 |= ARG3_SHARED;
return -ERESTART_RESTARTBLOCK;
[...]
static long futex_wait_restart(struct restart_block *restart)
{
u32 __user *uaddr = (u32 __user *)restart->arg0;
u32 val = (u32)restart->arg1;
ktime_t *abs_time = (ktime_t *)restart->arg2;
struct rw_semaphore *fshared = NULL;
restart->fn = do_no_restart_syscall;
if (restart->arg3 & ARG3_SHARED)
fshared = ¤t->mm->mmap_sem;
return (long)futex_wait(uaddr, fshared, val, abs_time);
}
So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.
This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.
I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.
The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters. This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.
Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for. If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 4 Dec 2007 08:38:22 +0000 (00:38 -0800)]
[SPARC64]: Update defconfig.
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Tue, 4 Dec 2007 08:33:07 +0000 (00:33 -0800)]
[SPARC]: Add missing of_node_put
There should be an of_node_put when breaking out of a loop that iterates
using for_each_node_by_type.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cyrill Gorcunov [Wed, 21 Nov 2007 01:32:19 +0000 (17:32 -0800)]
[SPARC64]: check for possible NULL pointer dereference
This patch adds checking for possible NULL pointer dereference
if of_find_property() failed.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 20 Nov 2007 07:45:16 +0000 (23:45 -0800)]
[SPARC]: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 20 Nov 2007 07:43:00 +0000 (23:43 -0800)]
[SPARC64]: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Tue, 20 Nov 2007 06:50:01 +0000 (22:50 -0800)]
[SPARC64]: Add missing pci_dev_put
There should be a pci_dev_put when breaking out of a loop that iterates
over calls to pci_get_device and similar functions.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Nov 2007 05:35:42 +0000 (21:35 -0800)]
[SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string.
Based upon a report by Mikael Pettersson.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Nov 2007 04:17:24 +0000 (20:17 -0800)]
[SPARC64]: Missing mdesc_release() in ldc_init().
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Gallatin [Wed, 5 Dec 2007 10:31:42 +0000 (02:31 -0800)]
[LRO]: fix lro_gen_skb() alignment
Add a field to the lro_mgr struct so that drivers can specify how much
padding is required to align layer 3 headers when a packet is copied
into a freshly allocated skb by inet_lro.c:lro_gen_skb(). Without
padding, skbs generated by LRO will cause alignment warnings on
architectures which require strict alignment (seen on sparc64).
Myri10GE is updated to use this field.
Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 5 Dec 2007 10:25:32 +0000 (02:25 -0800)]
[TCP]: NAGLE_PUSH seems to be a wrong way around
The comment in tcp_nagle_test suggests that. This bug is very
very old, even 2.4.0 seems to have it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 5 Dec 2007 10:21:35 +0000 (02:21 -0800)]
[TCP]: Move prior_in_flight collect to more robust place
The previous location is after sacktag processing, which affects
counters tcp_packets_in_flight depends on. This may manifest as
wrong behavior if new SACK blocks are present and all is clear
for call to tcp_cong_avoid, which in the case of
tcp_reno_cong_avoid bails out early because it thinks that
TCP is not limited by cwnd.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 5 Dec 2007 10:20:21 +0000 (02:20 -0800)]
[TCP] FRTO: Use of existing funcs make code more obvious & robust
Though there's little need for everything that tcp_may_send_now
does (actually, even the state had to be adjusted to pass some
checks FRTO does not want to occur), it's more robust to let it
make the decision if sending is allowed. State adjustments
needed:
- Make sure snd_cwnd limit is not hit in there
- Disable nagle (if necessary) through the frto_counter == 2
The result of check for frto_counter in argument to call for
tcp_enter_frto_loss can just be open coded, therefore there
isn't need to store the previous frto_counter past
tcp_may_send_now.
In addition, returns can then be combined.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Wed, 5 Dec 2007 10:18:48 +0000 (02:18 -0800)]
[IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS
The function in question is called only from ircomm_tty_read_proc,
which is under this option. Move this helper to the same place.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Wed, 5 Dec 2007 10:18:15 +0000 (02:18 -0800)]
[ROSE]: Trivial compilation CONFIG_INET=n case
The rose_rebuild_header() consists only of some variables in
case INET=n, and gcc will warn us about it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Tue, 4 Dec 2007 08:45:06 +0000 (00:45 -0800)]
[IPVS]: Fix sched registration race when checking for name collision.
The register_ip_vs_scheduler() checks for the scheduler with the
same name under the read-locked __ip_vs_sched_lock, then drops,
takes it for writing and puts the scheduler in list.
This is racy, since we can have a race window between the lock
being re-locked for writing.
The fix is to search the scheduler with the given name right under
the write-locked __ip_vs_sched_lock.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Tue, 4 Dec 2007 08:43:24 +0000 (00:43 -0800)]
[IPVS]: Don't leak sysctl tables if the scheduler registration fails.
In case we load lblc or lblcr module we can leak some sysctl
tables if the call to register_ip_vs_scheduler() fails.
I've looked at the register_ip_vs_scheduler() code and saw, that
the only reason to fail is the name collision, so I think that
with some 3rd party schedulers this becomes a relevant issue. No?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Paris [Tue, 4 Dec 2007 16:06:55 +0000 (11:06 -0500)]
VM/Security: add security hook to do_brk
Given a specifically crafted binary do_brk() can be used to get low
pages available in userspace virtually memory and can thus be used to
circumvent the mmap_min_addr low memory protection. Add security checks
in do_brk().
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Mon, 26 Nov 2007 23:47:40 +0000 (18:47 -0500)]
Security: round mmap hint address above mmap_min_addr
If mmap_min_addr is set and a process attempts to mmap (not fixed) with a
non-null hint address less than mmap_min_addr the mapping will fail the
security checks. Since this is just a hint address this patch will round
such a hint address above mmap_min_addr.
gcj was found to try to be very frugal with vm usage and give hint addresses
in the 8k-32k range. Without this patch all such programs failed and with
the patch they happily get a higher address.
This patch is wrappad in CONFIG_SECURITY since mmap_min_addr doesn't exist
without it and there would be no security check possible no matter what. So
we should not bother compiling in this rounding if it is just a waste of
time.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Mon, 26 Nov 2007 23:47:26 +0000 (18:47 -0500)]
security: protect from stack expantion into low vm addresses
Add security checks to make sure we are not attempting to expand the
stack into memory protected by mmap_min_addr
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Mon, 26 Nov 2007 23:47:46 +0000 (18:47 -0500)]
Security: allow capable check to permit mmap or low vm space
On a kernel with CONFIG_SECURITY but without an LSM which implements
security_file_mmap it is impossible for an application to mmap addresses
lower than mmap_min_addr. Based on a suggestion from a developer in the
openwall community this patch adds a check for CAP_SYS_RAWIO. It is
assumed that any process with this capability can harm the system a lot
more easily than writing some stuff on the zero page and then trying to
get the kernel to trip over itself. It also means that programs like X
on i686 which use vm86 emulation can work even with mmap_min_addr set.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Stephen Smalley [Mon, 26 Nov 2007 16:12:53 +0000 (11:12 -0500)]
SELinux: detect dead booleans
Instead of using f_op to detect dead booleans, check the inode index
against the number of booleans and check the dentry name against the
boolean name for that index on reads and writes. This prevents
incorrect use of a boolean file opened prior to a policy reload while
allowing valid use of it as long as it still corresponds to the same
boolean in the policy.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>