Patrick McHardy [Tue, 29 Apr 2008 13:44:28 +0000 (21:44 +0800)]
[CRYPTO] authenc: Fix async crypto crash in crypto_authenc_genicv()
crypto_authenc_givencrypt_done uses req->data as struct aead_givcrypt_request,
while it really points to a struct aead_request, causing this crash:
BUG: unable to handle kernel paging request at
6b6b6b6b
IP: [<
dc87517b>] :authenc:crypto_authenc_genicv+0x23/0x109
*pde =
00000000
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
Modules linked in: hifn_795x authenc esp4 aead xfrm4_mode_tunnel sha1_generic hmac crypto_hash]
Pid: 3074, comm: ping Not tainted (2.6.25 #4)
EIP: 0060:[<
dc87517b>] EFLAGS:
00010296 CPU: 0
EIP is at crypto_authenc_genicv+0x23/0x109 [authenc]
EAX:
daa04690 EBX:
daa046e0 ECX:
dab0a100 EDX:
daa046b0
ESI:
6b6b6b6b EDI:
dc872054 EBP:
c033ff60 ESP:
c033ff0c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process ping (pid: 3074, ti=
c033f000 task=
db883a80 task.ti=
dab6c000)
Stack:
00000000 daa046b0 c0215a3e daa04690 dab0a100 00000000 ffffffff db9fd7f0
dba208c0 dbbb1720 00000001 daa04720 00000001 c033ff54 c0119ca9 dc852a75
c033ff60 c033ff60 daa046e0 00000000 00000001 c033ff6c dc87527b 00000001
Call Trace:
[<
c0215a3e>] ? dev_alloc_skb+0x14/0x29
[<
c0119ca9>] ? printk+0x15/0x17
[<
dc87527b>] ? crypto_authenc_givencrypt_done+0x1a/0x27 [authenc]
[<
dc850cca>] ? hifn_process_ready+0x34a/0x352 [hifn_795x]
[<
dc8353c7>] ? rhine_napipoll+0x3f2/0x3fd [via_rhine]
[<
dc851a56>] ? hifn_check_for_completion+0x4d/0xa6 [hifn_795x]
[<
dc851ab9>] ? hifn_tasklet_callback+0xa/0xc [hifn_795x]
[<
c011d046>] ? tasklet_action+0x3f/0x66
[<
c011d230>] ? __do_softirq+0x38/0x7a
[<
c0105a5f>] ? do_softirq+0x3e/0x71
[<
c011d17c>] ? irq_exit+0x2c/0x65
[<
c010e0c0>] ? smp_apic_timer_interrupt+0x5f/0x6a
[<
c01042e4>] ? apic_timer_interrupt+0x28/0x30
[<
dc851640>] ? hifn_handle_req+0x44a/0x50d [hifn_795x]
...
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Theodore Ts'o [Thu, 1 May 2008 01:55:48 +0000 (21:55 -0400)]
Update .gitignore to include include/linux/bounds.h
(which is autogenerated by kbuild)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 1 May 2008 03:13:22 +0000 (20:13 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
ipv6: Compilation fix for compat MCAST_MSFILTER sockopts.
Al Viro [Thu, 1 May 2008 02:52:22 +0000 (03:52 +0100)]
Fix dnotify/close race
We have a race between fcntl() and close() that can lead to
dnotify_struct inserted into inode's list *after* the last descriptor
had been gone from current->files.
Since that's the only point where dnotify_struct gets evicted, we are
screwed - it will stick around indefinitely. Even after struct file in
question is gone and freed. Worse, we can trigger send_sigio() on it at
any later point, which allows to send an arbitrary signal to arbitrary
process if we manage to apply enough memory pressure to get the page
that used to host that struct file and fill it with the right pattern...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 1 May 2008 02:50:03 +0000 (19:50 -0700)]
x86: Mark OPTIMIZE_INLINING broken
So Ingo finally did figure out why UML broke with this option: UML
passes gcc the -fno-unit-at-a-time flag, and apparently that wreaks
havoc with gcc's inlining.
We could turn off -fno-unit-at-a-time for UML for gcc4+ (which is what
x86 does), but there's bad blood about this whole option, and it does
show that the thing is just fragile as heck.
So let tempers cool, and disable the thing, and we can revisit the
decision later.
Cc: Adrian Bunk <bunk@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 1 May 2008 02:31:52 +0000 (19:31 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/x86/linux-2.6-x86-fixes3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes3: (21 commits)
x86: numaq fix
x86: 8K stacks by default
x86: ioremap ram check fix
x86: fix HT cpu booting on 32-bit
x86: optimize inlining off
x86: CONFIG_X86_ELAN fix
x86: Kconfig fix
x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()
x86: use defconfigs from x86/configs/*
toshiba: use ioremap_cached
revert: "x86: ioremap(), extend check to all RAM pages"
x86: don't bother printing compat vdso address
fix: x86: support for new UV apic
x86: fix early-BUG message
x86: iommu_sac_force can become static
x86: add proper header for reboot_force
x86 VISWS: build fix
x86, voyager: fix ioremap_nocache()
hpet: fix
x86: unexport kmap_atomic_to_page
...
Linus Torvalds [Thu, 1 May 2008 00:05:21 +0000 (17:05 -0700)]
Merge git://git./linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
klist: fix coding style errors in klist.h and klist.c
driver core: remove no longer used "struct class_device"
pcmcia: remove pccard_sysfs_interface warnings
devres: support addresses greater than an unsigned long via dev_ioremap
kobject: do not copy vargs, just pass them around
sysfs: sysfs_update_group stub for CONFIG_SYSFS=n
DEBUGFS: Correct location of debugfs API documentation.
driver core: warn about duplicate driver names on the same bus
klist: implement klist_add_{after|before}()
klist: implement KLIST_INIT() and DEFINE_KLIST()
sysfs: Disallow truncation of files in sysfs
Greg Kroah-Hartman [Wed, 30 Apr 2008 23:43:45 +0000 (16:43 -0700)]
klist: fix coding style errors in klist.h and klist.c
Finally clean up the odd spacing in these files.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kay Sievers [Wed, 12 Mar 2008 19:47:35 +0000 (20:47 +0100)]
driver core: remove no longer used "struct class_device"
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Brownell [Mon, 28 Apr 2008 08:03:20 +0000 (01:03 -0700)]
pcmcia: remove pccard_sysfs_interface warnings
Make the PCMCIA core stop using class_interface to hide socket attribute
registration. This removes the associated section mismatch warnings, and
helps get to the point where that mechanism can finally be removed.
Simplify that attribute registration by using an attribute_group.
This is a net shrink in object size.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kumar Gala [Tue, 29 Apr 2008 15:25:48 +0000 (10:25 -0500)]
devres: support addresses greater than an unsigned long via dev_ioremap
Use a resource_size_t instead of unsigned long since some arch's are
capable of having ioremap deal with addresses greater than the size of a
unsigned long.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kay Sievers [Wed, 30 Apr 2008 00:06:29 +0000 (02:06 +0200)]
kobject: do not copy vargs, just pass them around
This prevents a few unneeded copies.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Randy Dunlap [Wed, 30 Apr 2008 16:01:17 +0000 (09:01 -0700)]
sysfs: sysfs_update_group stub for CONFIG_SYSFS=n
scsi_transport_spi uses sysfs_update_group() when CONFIG_SYSFS=n,
so provide a stub for it.
next-
20080423/drivers/scsi/scsi_transport_spi.c:1467: error: implicit declaration of function 'sysfs_update_group'
make[3]: *** [drivers/scsi/scsi_transport_spi.o] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Robert P. J. Day [Fri, 25 Apr 2008 12:52:51 +0000 (08:52 -0400)]
DEBUGFS: Correct location of debugfs API documentation.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stas Sergeev [Sat, 26 Apr 2008 15:52:35 +0000 (19:52 +0400)]
driver core: warn about duplicate driver names on the same bus
Currently an attempt to register multiple
drivers with the same name causes the
stack trace with some cryptic error message.
The attached patch adds the necessary check
and the clear error message.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Tue, 22 Apr 2008 09:58:46 +0000 (18:58 +0900)]
klist: implement klist_add_{after|before}()
Add klist_add_after() and klist_add_before() which puts a new node
after and before an existing node, respectively. This is useful for
callers which need to keep klist ordered. Note that synchronizing
between simultaneous additions for ordering is the caller's
responsibility.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Fri, 25 Apr 2008 18:16:04 +0000 (03:16 +0900)]
klist: implement KLIST_INIT() and DEFINE_KLIST()
klist is missing static initializers and definition helper. Add them.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ben Hutchings [Mon, 28 Apr 2008 14:59:58 +0000 (15:59 +0100)]
sysfs: Disallow truncation of files in sysfs
sysfs allows attribute files to be truncated, e.g. using ftruncate(), with the
expected effect on their inode. For most attributes, this doesn't change the
"real" size of the file i.e. how much can be read from it. However, the
parameter validation for reading and writing binary attribute files is based
on the inode size and not the size specified in the file's bin_attribute, so it
can be broken by this. For example, if we try using dd to write to such a file:
# pwd
/sys/bus/pci/devices/0000:08:00.0
# ls -l config
-rw-r--r-- 1 root root 4096 Feb 1 17:35 config
# dd if=/dev/zero of=config bs=4 count=1
1+0 records in
1+0 records out
# ls -l config
-rw-r--r-- 1 root root 0 Feb 1 17:50 config
# dd if=/dev/zero of=config bs=4 count=1 seek=128
dd: writing `config': No space left on device
1+0 records in
0+0 records out
Also, after truncation to 0, parameter validation for read and write is
disabled. Most bin_attribute read and write methods also validate the size and
offset, but for some this will allow out-of-range access. This may be a
security issue, though access to such files is often limited to root. In any
case, the validation should remain for safety's sake!)
This was previously reported in Bugzilla as bug 9867.
sysfs should ignore size changes or else refuse them (by returning -EINVAL).
This patch makes it ignore them.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alexey Dobriyan [Thu, 1 May 2008 00:10:02 +0000 (04:10 +0400)]
Fix ACPI vs proc_create_data() mismerge
acpi_device_dir() is NULL until all files are createst, so everyting is
created in straight in /proc/ and creation code warns.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Wed, 30 Apr 2008 21:49:54 +0000 (14:49 -0700)]
ipv6: Compilation fix for compat MCAST_MSFILTER sockopts.
The last hunk from the commit
dae50295 (ipv4/ipv6 compat: Fix SSM
applications on 64bit kernels.) escaped from the compat_ipv6_setsockopt
to the ipv6_getsockopt (I guess due to patch smartness wrt searching
for context) thus breaking 32-bit and 64-bit-without-compat compilation.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ingo Molnar [Wed, 30 Apr 2008 21:05:52 +0000 (23:05 +0200)]
x86: numaq fix
do not override the existing pci-y rule when adding visws or
numaq rules.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 30 Apr 2008 18:45:40 +0000 (20:45 +0200)]
x86: 8K stacks by default
Switch back to 8K stacks as the safer default. Out-of-memory
situations are less problematic than silent and hard to debug
stack corruption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andres Salomon [Wed, 30 Apr 2008 15:30:24 +0000 (11:30 -0400)]
x86: ioremap ram check fix
bdd3cee2e4b7279457139058615ced6c2b41e7de (x86: ioremap(), extend check
to all RAM pages) breaks OLPC's ioremap call. The ioremap that OLPC uses is:
romsig = ioremap(0xffffffc0, 16);
The commit that breaks it is basically:
- for (pfn = phys_addr >> PAGE_SHIFT; pfn < max_pfn_mapped &&
- (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+ for (pfn = phys_addr >> PAGE_SHIFT;
+ (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+
Previously, the 'pfn < max_pfn_mapped' check would've caused us to not
enter the loop. Removing that check means we loop infinitely. The
reason for that is because pfn is 0xfffff, and last_addr is 0xffffffcf.
The remaining check that is used to exit the loop is not sufficient;
when pfn<<PAGE_SHIFT is 0xfffff000, that is less than 0xffffffcf; when
we increment pfn and it overflows (pfn == 0x100000), pfn<<PAGE_SHIFT
ends up being 0. That, of course, is less than last_addr. In effect,
pfn<<PAGE_SHIFT is never lower than last_addr.
The simple fix for this is to limit the last_addr check to the PAGE_MASK;
a patch is below.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Hugh Dickins [Wed, 30 Apr 2008 15:17:46 +0000 (16:17 +0100)]
x86: fix HT cpu booting on 32-bit
Since recent smpboot 32/64-bit merge, my dual Xeon with HT has been
booting only 2 of its 4 cpus (when running an i386 kernel; but x86_64
is okay). J.A. Magallón reports the same.
native_cpu_up: bad cpu 2
native_cpu_up: bad cpu 3
The mach-default cpu_present_to_apicid() was just returning cpu number
(2, 3) instead of apicid (6, 7): looks like we now need the x86_64 code
even for the i386 case.
Comparing with other versions of cpu_present_to_apicid(), it seems a
good idea to include an NR_CPUS test too, since cpu_present() doesn't
include that; but that wasn't a problem here, and may no problem at all.
Prior to that smpboot merge, my Xeon booted the two HT siblings on one
physical first, then the two siblings on the other physical after - when
i386, but alternated them when x86_64. Since the merge, the x86_64
sequence is unchanged, but the i386 sequence is now like x86_64.
I prefer this consistency, and I prefer the new sequence: booting with
maxcpus=2 then uses the independent physicals without HT sharing.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Apr 2008 08:29:13 +0000 (10:29 +0200)]
x86: optimize inlining off
default to inline optimizing off.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Apr 2008 06:58:27 +0000 (08:58 +0200)]
x86: CONFIG_X86_ELAN fix
move the X86_CPU section out of the !X86_ELAN branch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Apr 2008 06:48:45 +0000 (08:48 +0200)]
x86: Kconfig fix
Andrew noticed that OPTIMIZE_INLINING appeared in the toplevel
menu - fix it.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Suresh Siddha [Sat, 26 Apr 2008 00:07:22 +0000 (17:07 -0700)]
x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()
Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC.
Once all the X drivers move to ioremap_wc(), we can go back to strong
UC semantics for ioremap() and ioremap_nocache().
To avoid attribute aliasing issues, pci_mmap_page_range() will also
use UC_MINUS for default non write-combining mapping request.
Next steps:
a) change all the video drivers using ioremap() or ioremap_nocache()
and adding WC MTTR using mttr_add() to ioremap_wc()
b) for strict usage, we can go back to strong uc semantics
for ioremap() and ioremap_nocache() after some grace period for
completing step-a.
c) user level X server needs to use the appropriate method for setting
up WC mapping (like using resourceX_wc sysfs file instead of
adding MTRR for WC and using /dev/mem or resourceX under /sys)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sam Ravnborg [Tue, 29 Apr 2008 10:48:15 +0000 (12:48 +0200)]
x86: use defconfigs from x86/configs/*
Daniel Drake <dsd@gentoo.org> reported:
In 2.6.23, if you unpacked a kernel source tarball and then
ran "make menuconfig" you'd be presented with this message:
# using defaults found in arch/i386/defconfig
and the default options would be set.
The same thing in 2.6.24 does not give you any "using defaults" message, and
the default config options within menuconfig are rather blank (e.g. no PCI
support). You can work around this by explicitly running "make defconfig"
before menuconfig, but it would be nice to have the behaviour the way it was
for 2.6.23 (and the way it still is for other archs).
Fixed by adding a x86 specific defconfig list to Kconfig.
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10470
Tested-by: dsd@gentoo.org
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Alan Cox [Tue, 29 Apr 2008 13:20:23 +0000 (14:20 +0100)]
toshiba: use ioremap_cached
The switch of ioremap to default to uncached doesn't break this driver
but it does needlessly slow it down as BIOS space is cachable and this
driver is quite happy scanning cached ROM space.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Tue, 29 Apr 2008 10:04:51 +0000 (12:04 +0200)]
revert: "x86: ioremap(), extend check to all RAM pages"
Vegard Nossum reported a large (150 seconds) boot delay during bootup,
and bisected it to "x86: ioremap(), extend check to all RAM pages"
(commit
bdd3cee2e4b). Revert this commit for now.
Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeremy Fitzhardinge [Mon, 28 Apr 2008 18:05:07 +0000 (11:05 -0700)]
x86: don't bother printing compat vdso address
The kernel prints the compat vdso address regardless of whether compat
vdso mode is enabled or not, which is confusing. Given that this
isn't very interesting information anyway, just remove the printk.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Gerhard Mack <gmack@innerfire.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andi Kleen [Fri, 25 Apr 2008 09:45:26 +0000 (11:45 +0200)]
fix: x86: support for new UV apic
Don't warn in read_apic_id() when preemptible but only one CPU online.
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Vegard Nossum [Fri, 25 Apr 2008 19:02:34 +0000 (21:02 +0200)]
x86: fix early-BUG message
The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.
Replace .asciz with multiple .ascii directives and terminate with .asciz.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dmitri Vorobiev [Sun, 27 Apr 2008 23:15:58 +0000 (03:15 +0400)]
x86: iommu_sac_force can become static
The iommu_sac_force variable is needlessly defined global,
and this patch makes it static. Additionally, this variable
needs not be explicitly initialized.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dmitri Vorobiev [Sun, 27 Apr 2008 23:15:59 +0000 (03:15 +0400)]
x86: add proper header for reboot_force
This patch fixes one sparse warning by including the appropriate
header for the reboot_force symbol.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Mon, 28 Apr 2008 08:46:58 +0000 (10:46 +0200)]
x86 VISWS: build fix
the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.
also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Sun, 27 Apr 2008 21:21:03 +0000 (23:21 +0200)]
x86, voyager: fix ioremap_nocache()
James Bottomley reported that the following commit:
| commit
6371b495991debfd1417b17c2bc4f7d7bae05739
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Wed Jan 30 13:33:40 2008 +0100
|
| x86: change ioremap() to default to uncached
broke Voyager.
James says:
" it broke a class of voyager machines: those which
rely on the quad interrupt controller (QIC). The precis of why they
broke is because the QIC does IPIs (or CPIs in its terminology) via
cache line interference: you interrupt a processor by moving a
designated memory area to write exclusive in the cache (by simply
writing to the line) and the CPU acks the interrupt by moving it back to
read shared (by reading from it). That area, is, of course, mapped by
ioremap, so reversing the ioremap semantics and adding the uncached bit
completely breaks the QIC. "
Sorry about that!
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Sun, 27 Apr 2008 12:04:14 +0000 (14:04 +0200)]
hpet: fix
Al Viro pointed out that there's a missing readl() of timer->hpet_config,
found by Sparse.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Adrian Bunk [Mon, 21 Apr 2008 08:51:44 +0000 (11:51 +0300)]
x86: unexport kmap_atomic_to_page
This patch removes the no longer used export of kmap_atomic_to_page.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Adrian Bunk [Mon, 21 Apr 2008 08:47:46 +0000 (11:47 +0300)]
x86: remove Xgt_desc_struct
The comment says it should have been removed in 2.6.25.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Wed, 30 Apr 2008 18:52:52 +0000 (11:52 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (179 commits)
ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
acpi: fix section mismatch warning in pnpacpi
intel_menlo: fix build warning
ACPI: Cleanup: Remove unneeded, multiple local dummy variables
ACPI: video - fix permissions on some proc entries
ACPI: video - properly handle errors when registering proc elements
ACPI: video - do not store invalid entries in attached_array list
ACPI: re-name acpi_pm_ops to acpi_suspend_ops
ACER_WMI/ASUS_LAPTOP: fix build bug
thinkpad_acpi: fix possible NULL pointer dereference if kstrdup failed
ACPI: check a return value correctly in acpi_power_get_context()
#if 0 acpi/bay.c:eject_removable_drive()
eeepc-laptop: add hwmon fan control
eeepc-laptop: add backlight
eeepc-laptop: add base driver
ACPI: thinkpad-acpi: bump up version to 0.20
ACPI: thinkpad-acpi: fix selects in Kconfig
ACPI: thinkpad-acpi: use a private workqueue
ACPI: thinkpad-acpi: fluff really minor fix
ACPI: thinkpad-acpi: use uppercase for "LED" on user documentation
...
Fixed conflicts in drivers/acpi/video.c and drivers/misc/intel_menlow.c
manually.
Len Brown [Wed, 30 Apr 2008 17:59:05 +0000 (13:59 -0400)]
Merge branch 'pnp' into release
Len Brown [Wed, 30 Apr 2008 17:58:00 +0000 (13:58 -0400)]
Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release
Venkatesh Pallipadi [Wed, 30 Apr 2008 17:57:15 +0000 (13:57 -0400)]
ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
acpi_processor_idle and "idle=" boot parameter interaction is broken.
The problem is that, at boot time acpi driver is checking for "idle=" boot
option and not registering the acpi idle handler. But, when there is a CST
changed callback (typically when switching AC <-> battery or suspend-resume)
there are no checks for boot_option_idle_override and acpi idle handler tries
to get installed with nasty side effects.
With CPU_IDLE configured this issue causes results in a nasty oops on CST
change callback and without CPU_IDLE there is no oops, but boot option
of "idle=" gets ignored and acpi idle handler gets installed.
Change the behavior to not do anything in acpi idle handler when there is a
"idle=" boot option.
Note that the problem is only there when "idle=" boot option is used.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Sam Ravnborg [Tue, 29 Apr 2008 20:52:01 +0000 (22:52 +0200)]
acpi: fix section mismatch warning in pnpacpi
Fix following section mismatch warning:
WARNING: vmlinux.o(.text+0x153d69): Section mismatch in reference from the function is_exclusive_device() to the variable .init.data:excluded_id_list
is_exclusive_device is only used from __init context so document
this with the __init annotation and get rid of the warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Wed, 30 Apr 2008 16:22:27 +0000 (09:22 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
[ALSA] soc - neo1973_wm8753.c add suspend and shutdown hooks for lm4857 chip
[ALSA] soc - neo1973_wm8753.c change maintainer contact info
[ALSA] soc - neo1973_wm8753.c cleanup checkpatch issues
[ALSA] soc - ln2440sbc_alc650 - Fix checkpatch warnings
[ALSA] soc - s3c24xx-pcm - Fix checkpatch warnings
[ALSA] soc - s3c2443-ac97 - Fix checkpatch warnings
[ALSA] soc - wm8753 - Clean up checkpatch warnings
Graeme Gregory [Wed, 30 Apr 2008 18:26:45 +0000 (20:26 +0200)]
[ALSA] soc - neo1973_wm8753.c add suspend and shutdown hooks for lm4857 chip
Patch taken from the openmoko bugtracker
http://bugzilla.openmoko.org/cgi-bin/bugzilla/show_bug.cgi?id=781
This patch adds Suspend/Resume and Shutdown support for the lm4857 to
the driver.
Signed-off-by: Graeme Gregory <graeme@openmoko.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Graeme Gregory [Wed, 30 Apr 2008 18:25:23 +0000 (20:25 +0200)]
[ALSA] soc - neo1973_wm8753.c change maintainer contact info
I have moved workplaces since I originally wrote this driver so update
the contact info for new employers.
Signed-off-by: Graeme Gregory <graeme@openmoko.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Graeme Gregory [Wed, 30 Apr 2008 18:24:54 +0000 (20:24 +0200)]
[ALSA] soc - neo1973_wm8753.c cleanup checkpatch issues
Clean up a few issues with the file that checkpatch noted, no functionality
changes.
Signed-off-by: Graeme Gregory <graeme@openmoko.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Mark Brown [Wed, 30 Apr 2008 15:19:57 +0000 (17:19 +0200)]
[ALSA] soc - ln2440sbc_alc650 - Fix checkpatch warnings
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Mark Brown [Wed, 30 Apr 2008 15:19:32 +0000 (17:19 +0200)]
[ALSA] soc - s3c24xx-pcm - Fix checkpatch warnings
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Mark Brown [Wed, 30 Apr 2008 15:19:07 +0000 (17:19 +0200)]
[ALSA] soc - s3c2443-ac97 - Fix checkpatch warnings
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Mark Brown [Wed, 30 Apr 2008 15:18:43 +0000 (17:18 +0200)]
[ALSA] soc - wm8753 - Clean up checkpatch warnings
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Wed, 30 Apr 2008 15:46:16 +0000 (08:46 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: remove duplicated include
sparc: Add kgdb support.
kgdbts: Sparc needs sstep emulation.
sparc32: Kill smp_message_pass() and related code.
sparc64: Kill PIL_RESERVED, unused.
sparc64: Split entry.S up into seperate files.
Linus Torvalds [Wed, 30 Apr 2008 15:45:48 +0000 (08:45 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
tcp: Overflow bug in Vegas
[IPv4] UFO: prevent generation of chained skb destined to UFO device
iwlwifi: move the selects to the tristate drivers
ipv4: annotate a few functions __init in ipconfig.c
atm: ambassador: vcc_sf semaphore to mutex
MAINTAINERS: The socketcan-core list is subscribers-only.
netfilter: nf_conntrack: padding breaks conntrack hash on ARM
ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
sch_sfq: use del_timer_sync() in sfq_destroy()
net: Add compat support for getsockopt (MCAST_MSFILTER)
net: Several cleanups for the setsockopt compat support.
ipvs: fix oops in backup for fwmark conn templates
bridge: kernel panic when unloading bridge module
bridge: fix error handling in br_add_if()
netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets
netfilter: x_tables: fix net namespace leak when reading /proc/net/xxx_tables_names
netfilter: xt_TCPOPTSTRIP: signed tcphoff for ipv6_skip_exthdr() retval
tcp: Limit cwnd growth when deferring for GSO
tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled
[netdrvr] gianfar: Determine TBIPA value dynamically
...
Ingo Molnar [Tue, 29 Apr 2008 22:15:31 +0000 (00:15 +0200)]
inlining: do not allow gcc below version 4 to optimize inlining
fix the condition to match intention: always use the old inlining
behavior on all gcc versions below 4.
this should solve the UML build problem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
S.Çağlar Onur [Wed, 30 Apr 2008 12:29:02 +0000 (15:29 +0300)]
Update .mailmap
I realize some of the maintainers email clients and/or scripts cannot
handle UTF-8 encoded names properly, as a result your ChangeLogs
displays me as two different person :).
Following patch adds correctly encoded name of mine into .mailmap, to
prevent appearing it not to be so or badly displayed.
Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 30 Apr 2008 15:38:30 +0000 (08:38 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] Update default configuration.
[S390] use generic sys_ptrace
[S390] Remove self ptrace IEEE_IP hack.
[S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP
[S390] System z large page support.
[S390] Convert machine feature detection code to C.
[S390] vmemmap: use clear_table to initialise page tables.
[S390] Move stfl to system.h and delete duplicated version.
[S390] uaccess_mvcos: #ifdef config dependent code.
[S390] cpu topology: Fix possible deadlock.
[S390] Add topology_core_siblings to topology.h
[S390] cio: Make isc handling more robust.
[S390] remove -traditional
[S390] Automatically detect added cpus.
[S390] smp: Fix locking order.
[S390] Add missing ifndef/define to include/asm-s390/sysinfo.h.
[S390] Move show_regs to traps.c.
[S390] cio: Use strict_strtoul() for attributes.
Linus Torvalds [Wed, 30 Apr 2008 15:37:40 +0000 (08:37 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Fix crashkernel= handling when no crashkernel= specified
[POWERPC] Make emergency stack safe for current_thread_info() use
[POWERPC] spufs: add .gitignore for spu_save_dump.h & spu_restore_dump.h
[POWERPC] spufs: trace spu_acquire_saved events
[POWERPC] spufs: fix marker name for find_victim
[POWERPC] spufs: add marker for destroy_spu_context
[POWERPC] spufs: add sputrace marker parameter names
[POWERPC] spufs: add context switch notification log
[POWERPC] mpc5200: defconfigs for CM5200, Lite5200B, Motion-PRO and TQM5200
[POWERPC] mpc5200: Switch mpc5200 dts files to dts-v1 format
[POWERPC] mpc5200: Fix FEC error handling on FIFO errors
[POWERPC] mpc5200: add Phytec pcm030 board support
[POWERPC] mpc5200: add gpiolib support for mpc5200
[POWERPC] mpc5200: add interrupt type function
[POWERPC] mpc5200: Fix unterminated of_device_id table
Ingo Molnar [Wed, 30 Apr 2008 09:50:11 +0000 (11:50 +0200)]
fix drivers/media/common/tuners/ build bug
x86.git randconfig testing found a build failure on latest -git:
drivers/built-in.o: In function `set_type':
tuner-core.c:(.text+0x2a9a26): undefined reference to `tea5761_attach'
tuner-core.c:(.text+0x2a9d05): undefined reference to `tda9887_attach'
tuner-core.c:(.text+0x2a9d51): undefined reference to `xc2028_attach'
tuner-core.c:(.text+0x2a9e22): undefined reference to `tda829x_attach'
tuner-core.c:(.text+0x2a9e3f): undefined reference to `microtune_attach'
drivers/built-in.o: In function `tuner_probe':
tuner-core.c:(.text+0x2aa18a): undefined reference to `tda829x_probe'
tuner-core.c:(.text+0x2aa302): undefined reference to `tea5761_autodetection'
with the following config:
http://redhat.com/~mingo/misc/config-Wed_Apr_30_10_21_40_CEST_2008.bad
the problem is caused by the drivers/media/common/tuners/ subdirectory
not being part of the kbuild hierarchy anymore, due to commit
7c91f0624 ("V4L/DVB(7767): Move tuners to common/tuners").
this seems similar to the problem also reported by Mike Galbraith.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 30 Apr 2008 07:55:17 +0000 (00:55 -0700)]
revert "memory hotplug: allocate usemap on the section with pgdat"
This:
commit
86f6dae1377523689bd8468fed2f2dd180fc0560
Author: Yasunori Goto <y-goto@jp.fujitsu.com>
Date: Mon Apr 28 02:13:33 2008 -0700
memory hotplug: allocate usemap on the section with pgdat
Usemaps are allocated on the section which has pgdat by this.
Because usemap size is very small, many other sections usemaps are allocated
on only one page. If a section has usemap, it can't be removed until removing
other sections. This dependency is not desirable for memory removing.
Pgdat has similar feature. When a section has pgdat area, it must be the last
section for removing on the node. So, if section A has pgdat and section B
has usemap for section A, Both sections can't be removed due to dependency
each other.
To solve this issue, this patch collects usemap on same section with pgdat.
If other sections doesn't have any dependency, this section will be able to be
removed finally.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
broke davem's sparc64 bootup. Revert it while we work out what went wrong.
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 30 Apr 2008 07:55:16 +0000 (00:55 -0700)]
mm: fix warning on memory offline
KAMEZAWA Hiroyuki found a warning message in the buffer dirtying code that
is coming from page migration caller.
WARNING: at fs/buffer.c:720 __set_page_dirty+0x330/0x360()
Call Trace:
[<
a000000100015220>] show_stack+0x80/0xa0
[<
a000000100015270>] dump_stack+0x30/0x60
[<
a000000100089ed0>] warn_on_slowpath+0x90/0xe0
[<
a0000001001f8b10>] __set_page_dirty+0x330/0x360
[<
a0000001001ffb90>] __set_page_dirty_buffers+0xd0/0x280
[<
a00000010012fec0>] set_page_dirty+0xc0/0x260
[<
a000000100195670>] migrate_page_copy+0x5d0/0x5e0
[<
a000000100197840>] buffer_migrate_page+0x2e0/0x3c0
[<
a000000100195eb0>] migrate_pages+0x770/0xe00
What was happening is that migrate_page_copy wants to transfer the PG_dirty
bit from old page to new page, so what it would do is set_page_dirty(newpage).
However set_page_dirty() is used to set the entire page dirty, wheras in
this case, only part of the page was dirty, and it also was not uptodate.
Marking the whole page dirty with set_page_dirty would lead to corruption or
unresolvable conditions -- a dirty && !uptodate page and dirty && !uptodate
buffers.
Possibly we could just ClearPageDirty(oldpage); SetPageDirty(newpage);
however in the interests of keeping the change minimal...
Signed-off-by: Nick Piggin <npiggin@suse.de>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Wed, 30 Apr 2008 07:55:14 +0000 (00:55 -0700)]
Drop the exporting of empty <linux/byteorder/generic.h>
Fix up the contents of <linux/byteorder/> so that it doesn't export a
content-free generic.h to user space. This involves:
* Removing the __KERNEL__ tests from generic.h and dropping it from
Kbuild.
* Wrapping the inclusions of generic.h in both big_endian.h and
little_endian.h in __KERNEL__ tests.
* Shifting big_endian.h and little_endian.h from header-y to
unifdef-y in Kbuild.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Wed, 30 Apr 2008 07:55:13 +0000 (00:55 -0700)]
remove __KERNEL__ tests of unexported headers under asm-generic/
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Wed, 30 Apr 2008 07:55:12 +0000 (00:55 -0700)]
Remove "#ifdef __KERNEL__" checks from unexported headers
Remove the "#ifdef __KERNEL__" tests from unexported header files in
linux/include whose entire contents are wrapped in that preprocessor
test.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:55:10 +0000 (00:55 -0700)]
serial: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:55:10 +0000 (00:55 -0700)]
drivers/char: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:55:09 +0000 (00:55 -0700)]
fs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:55:09 +0000 (00:55 -0700)]
afs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:55:08 +0000 (00:55 -0700)]
lib: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:55:08 +0000 (00:55 -0700)]
kernel: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:55:07 +0000 (00:55 -0700)]
mm: remove remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laurent Vivier [Wed, 30 Apr 2008 07:55:06 +0000 (00:55 -0700)]
brd: modify ramdisk device to be able to manage partitions
This patch adds partition management for Block RAM Device (BRD).
This patch is done to keep in sync BRD and loop device drivers.
This patch adds a parameter to the module, max_part, to specify
the maximum number of partitions per RAM device.
Example:
# modprobe brd max_part=63
# ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2008-04-03 13:39 /dev/ram0
brw-rw---- 1 root disk 1, 64 2008-04-03 13:39 /dev/ram1
brw-rw---- 1 root disk 1, 640 2008-04-03 13:39 /dev/ram10
brw-rw---- 1 root disk 1, 704 2008-04-03 13:39 /dev/ram11
brw-rw---- 1 root disk 1, 768 2008-04-03 13:39 /dev/ram12
brw-rw---- 1 root disk 1, 832 2008-04-03 13:39 /dev/ram13
brw-rw---- 1 root disk 1, 896 2008-04-03 13:39 /dev/ram14
brw-rw---- 1 root disk 1, 960 2008-04-03 13:39 /dev/ram15
brw-rw---- 1 root disk 1, 128 2008-04-03 13:39 /dev/ram2
brw-rw---- 1 root disk 1, 192 2008-04-03 13:39 /dev/ram3
brw-rw---- 1 root disk 1, 256 2008-04-03 13:39 /dev/ram4
brw-rw---- 1 root disk 1, 320 2008-04-03 13:39 /dev/ram5
brw-rw---- 1 root disk 1, 384 2008-04-03 13:39 /dev/ram6
brw-rw---- 1 root disk 1, 448 2008-04-03 13:39 /dev/ram7
brw-rw---- 1 root disk 1, 512 2008-04-03 13:39 /dev/ram8
brw-rw---- 1 root disk 1, 576 2008-04-03 13:39 /dev/ram9
# fdisk /dev/ram0
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): o
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-2, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-2, default 2): 2
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
# ls -l /dev/ram0*
brw-rw---- 1 root disk 1, 0 2008-04-03 13:40 /dev/ram0
brw-rw---- 1 root disk 1, 1 2008-04-03 13:40 /dev/ram0p1
# mkfs /dev/ram0p1
mke2fs 1.40-WIP (14-Nov-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
4016 inodes, 16032 blocks
801 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=
16515072
2 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
8193
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
# mount /dev/ram0p1 /mnt
df /mnt
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/ram0p1 15521 138 14582 1% /mnt
# ls -l /mnt
total 12
drwx------ 2 root root 12288 2008-04-03 13:41 lost+found
# umount /mnt
# rmmod brd
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Wed, 30 Apr 2008 07:55:04 +0000 (00:55 -0700)]
add hrtimer specific debugobjects code
hrtimers have now dynamic users in the network code. Put them under
debugobjects surveillance as well.
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Wed, 30 Apr 2008 07:55:03 +0000 (00:55 -0700)]
debugobjects: add timer specific object debugging code
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Wed, 30 Apr 2008 07:55:02 +0000 (00:55 -0700)]
debugobjects: add documentation
Add a DocBook for debugobjects.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Wed, 30 Apr 2008 07:55:01 +0000 (00:55 -0700)]
infrastructure to debug (dynamic) objects
We can see an ever repeating problem pattern with objects of any kind in the
kernel:
1) freeing of active objects
2) reinitialization of active objects
Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore. One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.
While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause. This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.
The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.
Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.
The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash. Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.
Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.
The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.
The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list
Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation. When the sanity check triggers a warning message
and a stack trace is printed.
The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).
The core code enqueues the objects into hash buckets. The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a
global lock.
The debug code can be compiled in without being active. The runtime overhead
is minimal and could be optimized by asm alternatives. A kernel command line
option enables the debugging code.
Thanks to Ingo Molnar for review, suggestions and cleanup patches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Wed, 30 Apr 2008 07:54:59 +0000 (00:54 -0700)]
slab: add a flag to prevent debug_free checks on a kmem_cache
This is a preperatory patch for the debugobjects infrastructure. The flag
prevents debug_free checks on kmem_caches. This is necessary to avoid
resursive calls into a debug mechanism which uses a kmem_cache itself.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:54:57 +0000 (00:54 -0700)]
drivers: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 30 Apr 2008 07:54:55 +0000 (00:54 -0700)]
Add macros similar to min/max/min_t/max_t
Also, change the variable names used in the min/max macros to avoid shadowed
variable warnings when min/max min_t/max_t are nested.
Small formatting changes to make all the macros have a similar form.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix v4l build]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 30 Apr 2008 07:54:54 +0000 (00:54 -0700)]
alloc_uid: cleanup
Use kmem_cache_zalloc(), remove large amounts of initialisation code and
ifdeffery.
Note: this assumes that memset(*atomic_t, 0) correctly initialises the
atomic_t. This is true for all present archtiectures and if it becomes false
for a future architecture then we'll need to make large changes all over the
place anyway.
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 30 Apr 2008 07:54:54 +0000 (00:54 -0700)]
hfsplus: fix warning with 64k PAGE_SIZE
fs/hfsplus/btree.c: In function 'hfsplus_bmap_alloc':
fs/hfsplus/btree.c:239: warning: comparison is always false due to limited range of data type
But this might hide a real bug?
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 30 Apr 2008 07:54:53 +0000 (00:54 -0700)]
hfs: fix warning with 64k PAGE_SIZE
fs/hfs/btree.c: In function 'hfs_bmap_alloc':
fs/hfs/btree.c:263: warning: comparison is always false due to limited range of data type
The patch makes the warning go away, but the code might actually be buggy?
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Markus Armbruster [Wed, 30 Apr 2008 07:54:52 +0000 (00:54 -0700)]
printk: don't read beyond string arguments' terminating zero
Fix update_console_cmdline() not to to read beyond the terminating zero of its
name argument.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Wed, 30 Apr 2008 07:54:51 +0000 (00:54 -0700)]
Basic braille screen reader support
This adds a minimalistic braille screen reader support. This is meant to
be used by blind people e.g. on boot failures or when / cannot be mounted
etc and thus the userland screen readers can not work.
[akpm@linux-foundation.org: fix exports]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jiri Kosina <jikos@jikos.cz>
Cc: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Wed, 30 Apr 2008 07:54:49 +0000 (00:54 -0700)]
asm-*/futex.h should include linux/uaccess.h
Lots of asm-*/futex.h call pagefault_enable and pagefault_disable, which
are declared in linux/uaccess.h, without including linux/uaccess.h.
They all include asm/uaccess.h, so this patch replaces asm/uaccess.h
with linux/uaccess.h.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Wed, 30 Apr 2008 07:54:49 +0000 (00:54 -0700)]
sysv: [bl]e*_add_cpu conversion
replace all:
big/little_endian_variable = cpu_to_[bl]eX([bl]eX_to_cpu(big/little_endian_variable) +
expression_in_cpu_byteorder);
with:
[bl]eX_add_cpu(&big/little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Wed, 30 Apr 2008 07:54:48 +0000 (00:54 -0700)]
quota: le*_add_cpu conversion
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
expression_in_cpu_byteorder);
with:
leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Wed, 30 Apr 2008 07:54:47 +0000 (00:54 -0700)]
hfs/hfsplus: be*_add_cpu conversion
replace all:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
expression_in_cpu_byteorder);
with:
beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz [Wed, 30 Apr 2008 07:54:47 +0000 (00:54 -0700)]
affs: be*_add_cpu conversion
replace all:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
expression_in_cpu_byteorder);
with:
beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Wed, 30 Apr 2008 07:54:46 +0000 (00:54 -0700)]
reiserfs: use open_bdev_excl
Use the proper helper to open a blockdevice by name for filesystem use,
this makes sure it's properly claimed (also added for open-by-number) and
gets rid of the struct file abuse.
Tested by mounting a reiserfs filesystem with external journal.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Wed, 30 Apr 2008 07:54:45 +0000 (00:54 -0700)]
fuse: fix sparse warnings
fs/fuse/dev.c:306:2: warning: context imbalance in 'wait_answer_interruptible' - unexpected unlock
fs/fuse/dev.c:361:2: warning: context imbalance in 'request_wait_answer' - unexpected unlock
fs/fuse/dev.c:1002:4: warning: context imbalance in 'end_io_requests' - unexpected unlock
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Wed, 30 Apr 2008 07:54:45 +0000 (00:54 -0700)]
fuse: fix race in llseek
Fuse doesn't use i_mutex to protect setting i_size, and so
generic_file_llseek() can be racy: it doesn't use i_size_read().
So do a fuse specific llseek method, which does use i_size_read().
[akpm@linux-foundation.org: make `retval' loff_t]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Wed, 30 Apr 2008 07:54:44 +0000 (00:54 -0700)]
fuse: fix node ID type
Node ID is 64bit but it is passed as unsigned long to some functions. This
breakage wasn't noticed, because libfuse uses unsigned long too.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Wed, 30 Apr 2008 07:54:44 +0000 (00:54 -0700)]
fuse: fix max i/o size calculation
Fix a bug that Werner Baumann reported: fuse can send a bigger write request
than the maximum specified. This only affected direct_io operation.
In addition set a sane minimum for the max_read and max_write tunables, so I/O
always makes some progress.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Wed, 30 Apr 2008 07:54:43 +0000 (00:54 -0700)]
fuse: update file size on short read
If the READ request returned a short count, then either
- cached size is incorrect
- filesystem is buggy, as short reads are only allowed on EOF
So assume that the size is wrong and refresh it, so that cached read() doesn't
zero fill the missing chunk.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 30 Apr 2008 07:54:42 +0000 (00:54 -0700)]
fuse: implement perform_write
Introduce fuse_perform_write. With fusexmp (a passthrough filesystem), large
(1MB) writes into a backing tmpfs filesystem are sped up by almost 4 times
(256MB/s vs 71MB/s).
[mszeredi@suse.cz]:
- split into smaller functions
- testing
- duplicate generic_file_aio_write(), so that there's no need to add a
new ->perform_write() a_op. Comment from hch.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Wed, 30 Apr 2008 07:54:41 +0000 (00:54 -0700)]
fuse: clean up setting i_size in write
Extract common code for setting i_size in write functions into a common
helper.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Wed, 30 Apr 2008 07:54:41 +0000 (00:54 -0700)]
fuse: support writable mmap
Quoting Linus (3 years ago, FUSE inclusion discussions):
"User-space filesystems are hard to get right. I'd claim that they
are almost impossible, unless you limit them somehow (shared
writable mappings are the nastiest part - if you don't have those,
you can reasonably limit your problems by limiting the number of
dirty pages you accept through normal "write()" calls)."
Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others). This nicely solved the biggest problem: limiting the number of pages
used for write caching.
Some small details remained, however, which this largish patch attempts to
address. It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks. Performance may not be very
good for certain usage patterns, but generally it should be acceptable.
It has been tested extensively with fsx-linux and bash-shared-mapping.
Fuse page writeback design
--------------------------
fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.
The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.
For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented. The per-bdi writeback count is not decremented until the actual
write completes.
On dirtying the page, fuse waits for a previous write to finish before
proceeding. This makes sure, there can only be one temporary page used at a
time for one cached page.
This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?
The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write. It may be buggy or even
malicious, and fail to complete WRITE requests. We don't want unrelated parts
of the system to grind to a halt in such cases.
Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request. There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.
Currently there are several cases where the kernel can block on page
writeback:
- allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
- page migration
- throttle_vm_writeout (through NR_WRITEBACK)
- sync(2)
Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.
As an extra safetly measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default. This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.
With appropriate privileges, this limit can be raised through
'/sys/class/bdi/<bdi>/max_ratio'.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>