GitHub/exynos8895/android_kernel_samsung_universal8895.git
15 years ago x86: apic: Remove duplicated macros
Yinghai Lu [Mon, 27 Apr 2009 06:38:08 +0000 (23:38 -0700)]
x86: apic: Remove duplicated macros

XAPIC_DEST_* is duplicated; the same macros are already defined in apicdef.h

[ Impact: cleanup ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49F552D0.5050505@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86: x2apic, IR: remove reinit_intr_remapped_IO_APIC()
Suresh Siddha [Mon, 20 Apr 2009 20:02:30 +0000 (13:02 -0700)]
x86: x2apic, IR: remove reinit_intr_remapped_IO_APIC()

When interrupt-remapping is enabled, we rely on
setup_IO_APIC_irqs() to configure remapped entries in the
IO-APIC, which happens a little later, after interrupt-remapping
has been enabled.

Meanwhile, restoration of old io-apic entries after enabling
interrupt-remapping will not make the interrupts through
io-apic functional anyway.

So remove the unnecessary reinit_intr_remapped_IO_APIC() step.

The longer story:

When interrupt-remapping is enabled, IO-APIC entries need to be
set up in the re-mappable format (pointing to
interrupt-remapping table entries set up by the OS). This
remapping configuration happens in the same place where we
traditionally configure the IO-APIC (i.e., in
setup_IO_APIC_irqs()).

So when we enable interrupt-remapping successfully, there is no
need to restore old io-apic RTE entries before we actually do a
complete configuration shortly afterwards in setup_IO_APIC_irqs().
Old IO-APIC RTEs may be in traditional format (non re-mappable) or
in re-mappable format pointing to interrupt-remapping table
entries set up by the BIOS. Restoring either of these will not make
the IO-APIC functional. We have to rely on setup_IO_APIC_irqs() for
proper configuration by the OS.

So I am removing this unnecessary and broken step.

[ Impact: remove unnecessary/broken IO-APIC setup step ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Weidong Han <weidong.han@intel.com>
Cc: dwmw2@infradead.org
LKML-Reference: <20090420200450.552359000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86: x2apic, IR: Make config X86_UV dependent on X86_X2APIC
Suresh Siddha [Mon, 20 Apr 2009 20:02:31 +0000 (13:02 -0700)]
x86: x2apic, IR: Make config X86_UV dependent on X86_X2APIC

Instead of selecting X86_X2APIC, make config X86_UV dependent
on X86_X2APIC.

This prevents CONFIG_X86_X2APIC from being enabled without
CONFIG_INTR_REMAP.

[ Impact: cleanup ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Cc: dwmw2@infradead.org
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Weidong Han <weidong.han@intel.com>
LKML-Reference: <20090420200450.694598000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86: x2apic, IR: Clean up panic() with nox2apic boot option
Suresh Siddha [Mon, 20 Apr 2009 20:02:29 +0000 (13:02 -0700)]
x86: x2apic, IR: Clean up panic() with nox2apic boot option

Instead of calling panic(), ignore the "nox2apic" boot option when
the BIOS has already enabled x2apic prior to OS handover.
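
The decision described above can be sketched roughly as follows. This is an
illustrative stand-alone model, not the kernel code: resolve_nox2apic() and
its parameters are hypothetical names, and the warning text is made up.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): returns true when x2apic
 * should be kept disabled, false when the "nox2apic" request is ignored
 * or was never made.
 */
static bool resolve_nox2apic(bool nox2apic_requested, bool bios_enabled_x2apic)
{
	if (nox2apic_requested && bios_enabled_x2apic) {
		/* Warn and carry on instead of panicking. */
		fprintf(stderr, "x2apic already enabled by BIOS, ignoring nox2apic\n");
		return false;
	}
	return nox2apic_requested;
}
```

The only case that changes behaviour is the one where the option and the
BIOS state conflict; the other combinations pass the request through.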

[ Impact: printk warning instead of panic() when BIOS has enabled x2apic already ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: dwmw2@infradead.org
Cc: Weidong Han <weidong.han@intel.com>
LKML-Reference: <20090420200450.425091000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86: x2apic, IR: Move eoi_ioapic_irq() into a CONFIG_INTR_REMAP section
Suresh Siddha [Mon, 20 Apr 2009 20:02:28 +0000 (13:02 -0700)]
x86: x2apic, IR: Move eoi_ioapic_irq() into a CONFIG_INTR_REMAP section

Address the following compiler warning:

   arch/x86/kernel/apic/io_apic.c:2543: warning: `eoi_ioapic_irq' defined but not used

By moving that function (and eoi_ioapic_irq()) into an existing
#ifdef CONFIG_INTR_REMAP section of the code.

[ Impact: cleanup ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: dwmw2@infradead.org
Cc: Weidong Han <weidong.han@intel.com>
LKML-Reference: <20090420200450.271099000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Weidong Han <weidong.han@intel.com>
15 years ago x86: x2apic, IR: Clean up X86_X2APIC and INTR_REMAP config checks
Suresh Siddha [Mon, 20 Apr 2009 20:02:27 +0000 (13:02 -0700)]
x86: x2apic, IR: Clean up X86_X2APIC and INTR_REMAP config checks

Add x2apic_supported() to clean up CONFIG_X86_X2APIC checks.

Fix CONFIG_INTR_REMAP checks.

[ Impact: cleanup ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: dwmw2@infradead.org
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Weidong Han <weidong.han@intel.com>
LKML-Reference: <20090420200450.128993000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86: es7000, uv - use __cpuinit for kicking secondary cpus
Cyrill Gorcunov [Sun, 19 Apr 2009 07:43:11 +0000 (11:43 +0400)]
x86: es7000, uv - use __cpuinit for kicking secondary cpus

The caller already has __cpuinit attribute.

[ Impact: save memory, address section mismatch warning ]

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
LKML-Reference: <20090419074311.GA8670@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86: smpboot - wakeup_secondary should be done via __cpuinit section
Cyrill Gorcunov [Sat, 18 Apr 2009 19:45:28 +0000 (23:45 +0400)]
x86: smpboot - wakeup_secondary should be done via __cpuinit section

A caller (do_boot_cpu) already has __cpuinit attribute.

Since HOTPLUG_CPU depends on SMP && HOTPLUG, it doesn't
lead to a panic at the moment.

[ Impact: cleanup ]

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090418194528.GD25510@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86, intr-remap: fix x2apic/intr-remap resume
Weidong Han [Fri, 17 Apr 2009 08:42:16 +0000 (16:42 +0800)]
x86, intr-remap: fix x2apic/intr-remap resume

Interrupt remapping has been decoupled from x2apic, so the resume
path shouldn't check x2apic before resuming interrupt remapping.
Otherwise, interrupt remapping won't be resumed when x2apic is not
enabled.

[ Impact: fix potential intr-remap resume hang on !x2apic ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-6-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86, intr-remap: add option to disable interrupt remapping
Weidong Han [Fri, 17 Apr 2009 08:42:15 +0000 (16:42 +0800)]
x86, intr-remap: add option to disable interrupt remapping

Add option "nointremap" to disable interrupt remapping.

[ Impact: add new boot option ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-5-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86, intr-remap: enable interrupt remapping early
Weidong Han [Fri, 17 Apr 2009 08:42:14 +0000 (16:42 +0800)]
x86, intr-remap: enable interrupt remapping early

Currently, when x2apic is not enabled, interrupt remapping
is enabled in init_dmars(), which is too late to remap
ioapic interrupts; that is, ioapic interrupts are really in
compatibility mode, not remappable mode.

This patch always enables interrupt remapping before ioapic
setup, which guarantees that all interrupts will be remapped when
interrupt remapping is enabled. Thus there is no need to set
the compatibility interrupt bit.

[ Impact: refactor intr-remap init sequence, enable fuller remap mode ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-4-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago x86, intr-remap: fix ack for interrupt remapping
Weidong Han [Fri, 17 Apr 2009 08:42:13 +0000 (16:42 +0800)]
x86, intr-remap: fix ack for interrupt remapping

Shouldn't call ack_apic_edge() in ir_ack_apic_edge(), because
ack_apic_edge() does more than just ack: it also does irq migration
in the non-interrupt-remapping case. But there is no such need for
interrupt-remapping case, as irq migration is done in the process
context.

Similarly, ir_ack_apic_level() shouldn't call ack_apic_level(), and
instead should do the local cpu's EOI + directed EOI to the io-apic.

ack_x2APIC_irq() is not necessary, because ack_APIC_irq() will use an MSR
write for x2apic, and an uncached write for non-x2apic.

[ Impact: simplify/standardize intr-remap IRQ acking, fix on !x2apic ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-3-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago docs, x86: add nox2apic back to kernel-parameters.txt
Weidong Han [Fri, 17 Apr 2009 08:42:12 +0000 (16:42 +0800)]
docs, x86: add nox2apic back to kernel-parameters.txt

"nox2apic" was removed from kernel-parameters.txt by mistake when
entries were sorted in alpha order (commit 0cb55ad2). But this early
parameter still exists, so add it back to kernel-parameters.txt.

[ Impact: add boot parameter description ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-2-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago Merge branch 'linus' into x86/apic
Ingo Molnar [Fri, 17 Apr 2009 14:18:22 +0000 (16:18 +0200)]
Merge branch 'linus' into x86/apic

Merge reason: new intr-remap patches depend on the s2ram iommu fixes from upstream

Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 17 Apr 2009 01:17:22 +0000 (18:17 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Fix branch tracer header
  tracing: Fix power tracer header

15 years ago Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 17 Apr 2009 01:16:29 +0000 (18:16 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Avoid printing sched_group::__cpu_power for default case
  tracing, sched: mark get_parent_ip() notrace

15 years ago Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 17 Apr 2009 00:56:39 +0000 (17:56 -0700)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kernel/softirq.c: fix sparse warning
  rcu: Make hierarchical RCU less IPI-happy

15 years ago kernel/softirq.c: fix sparse warning
H Hartley Sweeten [Thu, 16 Apr 2009 23:30:18 +0000 (19:30 -0400)]
kernel/softirq.c: fix sparse warning

Fix sparse warning in kernel/softirq.c.

  warning: do-while statement is not a compound statement

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
LKML-Reference: <BD79186B4FD85F4B8E60E381CAEE1909015F9033@mi8nycmail19.Mi8.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago Merge branch 'x86/uv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux...
Linus Torvalds [Thu, 16 Apr 2009 23:43:20 +0000 (16:43 -0700)]
Merge branch 'x86/uv' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86/uv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: UV BAU distribution and payload MMRs
  x86: UV: BAU partition-relative distribution map
  x86, uv: add Kconfig dependency on NUMA for UV systems
  x86: prevent /sys/firmware/sgi_uv from being created on non-uv systems
  x86, UV: Fix for nodes with memory and no cpus
  x86, UV: system table in bios accessed after unmap
  x86: UV BAU messaging timeouts
  x86: UV BAU and nodes with no memory

15 years ago sched: Avoid printing sched_group::__cpu_power for default case
Gautham R Shenoy [Tue, 14 Apr 2009 03:39:36 +0000 (09:09 +0530)]
sched: Avoid printing sched_group::__cpu_power for default case

Commit 46e0bb9c12f4 ("sched: Print sched_group::__cpu_power
in sched_domain_debug") produces a messy dmesg output while
attempting to print the sched_group::__cpu_power for each
group in the sched_domain hierarchy.

Fix this by not printing the __cpu_power for the default case
(i.e., __cpu_power == SCHED_LOAD_SCALE).
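
As a rough user-space illustration of the idea (not the scheduler code
itself): format_group() is a hypothetical helper, and the value given to
SCHED_LOAD_SCALE here is an assumption made only so the sketch compiles.

```c
#include <stdio.h>

#define SCHED_LOAD_SCALE 1024UL	/* assumed value, for illustration only */

/*
 * Hypothetical formatter: always prints the group's CPUs, but appends
 * the power only when it differs from the default.
 */
static int format_group(char *buf, size_t len, const char *cpus,
			unsigned long cpu_power)
{
	int n = snprintf(buf, len, " %s", cpus);

	if (n < 0 || (size_t)n >= len)
		return n;	/* error or truncated: nothing more to append */
	if (cpu_power != SCHED_LOAD_SCALE)
		n += snprintf(buf + n, len - (size_t)n,
			      " (__cpu_power = %lu)", cpu_power);
	return n;
}
```

Groups at the default power then produce only the CPU list, which is what
keeps the common case out of the log.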

[ Impact: reduce syslog clutter ]

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Fixed-by: Tony Luck <tony.luck@intel.com>
Cc: a.p.zijlstra@chello.nl
LKML-Reference: <20090414033936.GA534@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Thu, 16 Apr 2009 21:42:04 +0000 (14:42 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  ata: Report 16/32bit PIO as best we can
  libata: use ATA_ID_CFA_*
  pata_legacy: fix no device fail path
  pata_hpt37x: fix HPT370 DMA timeouts
  libata: handle SEMB signature better

15 years ago mm: pass correct mm when growing stack
Hugh Dickins [Thu, 16 Apr 2009 20:58:12 +0000 (21:58 +0100)]
mm: pass correct mm when growing stack

Tetsuo Handa reports seeing the WARN_ON(current->mm == NULL) in
security_vm_enough_memory(), when do_execve() is touching the
target mm's stack, to set up its args and environment.

Yes, a UMH_NO_WAIT or UMH_WAIT_PROC call_usermodehelper() spawns
an mm-less kernel thread to do the exec.  And in any case, that
vm_enough_memory check when growing stack ought to be done on the
target mm, not on the execer's mm (though apart from the warning,
it only makes a slight tweak to OVERCOMMIT_NEVER behaviour).

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago Revert "kobject: don't block for each kobject_uevent".
Hugh Dickins [Thu, 16 Apr 2009 20:55:29 +0000 (21:55 +0100)]
Revert "kobject: don't block for each kobject_uevent".

This reverts commit f520360d93cdc37de5d972dac4bf3bdef6a7f6a7.

Tetsuo Handa, running a kernel with CONFIG_DEBUG_PAGEALLOC=y and
CONFIG_UEVENT_HELPER_PATH=/sbin/hotplug, has been hitting RCU detected
CPU stalls: it's been spinning in the loop where do_execve() counts up
the args (but why wasn't fixup_exception working? dunno).

The recent change, switching kobject_uevent_env() from UMH_WAIT_EXEC
to UMH_NO_WAIT, is broken: the exec uses args on the local stack here,
and an env which is kfreed as soon as call_usermodehelper() returns.
It very much needs to wait for the exec to be done.

An alternative would be to keep the UMH_NO_WAIT, and complicate the code
to allocate and free these resources correctly? but no, as GregKH
pointed out when making the commit, CONFIG_UEVENT_HELPER_PATH="" is a
much better optimization - though some distros are still saying
/sbin/hotplug in their .config, yet with no such binary in their initrd
or their root.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Will Newton <will.newton@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago ata: Report 16/32bit PIO as best we can
Alan Cox [Thu, 9 Apr 2009 16:31:17 +0000 (17:31 +0100)]
ata: Report 16/32bit PIO as best we can

The legacy old IDE ioctl API for this is a bit primitive so we try
and map stuff sensibly onto it.

- Set PIO over DMA devices to report 32bit
- Add ability to change the PIO32 settings if the controller permits it
- Add that functionality into the sff drivers
- Add that functionality into the VLB legacy driver
- Turn on the 32bit PIO on the ninja32 and add support there

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years ago libata: use ATA_ID_CFA_*
Sergei Shtylyov [Mon, 13 Apr 2009 16:50:00 +0000 (20:50 +0400)]
libata: use ATA_ID_CFA_*

Use ATA_ID_CFA_* constants for CFA specific identify data words 162 and 163.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years ago pata_legacy: fix no device fail path
Tejun Heo [Tue, 14 Apr 2009 03:59:03 +0000 (12:59 +0900)]
pata_legacy: fix no device fail path

When pata_legacy can't detect any device, it unregisters the
platform_device and fails detection.  However, it forgets to detach
the ata host, triggering weird failures as the host later gets freed
by devres while still attached.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years ago pata_hpt37x: fix HPT370 DMA timeouts
Sergei Shtylyov [Tue, 14 Apr 2009 14:39:14 +0000 (18:39 +0400)]
pata_hpt37x: fix HPT370 DMA timeouts

The libata driver copied the code from the IDE driver which caused a post
2.4.18 regression on many HPT370[A] chips -- DMA stopped working completely,
only causing timeouts.  Now remove hpt370_bmdma_start() for good...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years ago libata: handle SEMB signature better
Tejun Heo [Tue, 14 Apr 2009 21:21:10 +0000 (06:21 +0900)]
libata: handle SEMB signature better

WDC WD1600JS-62MHB5 successfully hits the window between ATA/ATAPI-7
and Serial ATA II standards and reports 3c/c3 signature which now is
assigned to SEMB.  Make ata_dev_classify() report ATA_DEV_SEMB on the
sig and let ata_dev_read_id() work around it by trying IDENTIFY once.

This fixes bko#11579.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Haun <drhaun88@gmail.com>
Reported-by: Lars Wirzenius <liw@liw.fi>
Reported-by: Juan Manuel <jmcarranza@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years ago x86: UV BAU distribution and payload MMRs
Cliff Wickman [Thu, 16 Apr 2009 12:53:09 +0000 (07:53 -0500)]
x86: UV BAU distribution and payload MMRs

This patch correctly sets BAU memory mapped registers to point
to the sending activation descriptor table and target payload table.

The "Broadcast Assist Unit" is used for TLB shootdown in UV.

The memory mapped registers that point to sending and receiving
memory structures contain node numbers.

In one case the __pa() function did not provide the node id of
memory on blade zero in configurations where that id is nonzero.
In another case, it was assumed that memory was allocated on
the local node.  That assumption is not true in a configuration
in which the node has no memory.

Tested on the UV hardware simulator.

[ Impact: fix possible runtime crash due to incorrect TLB logic ]

Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1LuR5Z-0007An-B8@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years ago Add block_write_full_page_endio for passing endio handler
Chris Mason [Wed, 15 Apr 2009 17:22:38 +0000 (13:22 -0400)]
Add block_write_full_page_endio for passing endio handler

block_write_full_page doesn't allow the caller to control what happens
when the IO is over.  This adds a new call named block_write_full_page_endio
so the buffer head end_io handler can be provided by the caller.

This will be used by the ext3 data=guarded mode to do i_size updates in
a workqueue based end_io handler.  end_buffer_async_write is also
exported so it can be called to do the dirty work of managing page
writeback for the higher level end_io handler.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Acked-by: Theodore Tso <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago Export filemap_write_and_wait_range
Chris Mason [Wed, 15 Apr 2009 17:22:37 +0000 (13:22 -0400)]
Export filemap_write_and_wait_range

This wasn't exported before and is useful (used by the experimental ext3
data=guarded code)

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Acked-by: Theodore Tso <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Thu, 16 Apr 2009 14:41:56 +0000 (07:41 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (64 commits)
  phylib: Fix delay argument of schedule_delayed_work
  NET/ixgbe: Fix powering off during shutdown
  NET/e1000e: Fix powering off during shutdown
  NET/e1000: Fix powering off during shutdown
  packet: avoid warnings when high-order page allocation fails
  gianfar: stop send queue before resetting gianfar
  myr10ge: again fix lro_gen_skb() alignment
  declance: convert to net_device_ops
  bfin_mac: convert to net_device_ops
  au1000: convert to net_device_ops
  atarilance: convert to net_device_ops
  a2065: convert to net_device_ops
  ixgbe: update real_num_tx_queues on changing num_rx_queues
  ixgbe: fix tx queue index
  Revert "rose: zero length frame filtering in af_rose.c"
  sfc: Use correct macro to set event bitfield
  sfc: Match calls to netif_napi_add() and netif_napi_del()
  bonding: Remove debug printk
  e1000/e1000: fix compile warning
  ehea: Fix incomplete conversion to net_device_ops
  ...

15 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Thu, 16 Apr 2009 14:40:48 +0000 (07:40 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc: remove some pointless conditionals before kfree()
  sbus: changed ioctls to unlocked
  sparc: asm/atomic.h on 32bit should include asm/system.h for xchg
  sparc64: Fix smp_callin() locking.

15 years ago phylib: Fix delay argument of schedule_delayed_work
Atsushi Nemoto [Thu, 16 Apr 2009 09:43:37 +0000 (02:43 -0700)]
phylib: Fix delay argument of schedule_delayed_work

The commit a390d1f3 ("phylib: convert state_queue work to
delayed_work") missed converting 'expires' value to 'delay' value.
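
The distinction between the two kinds of value can be modelled in plain C.
This is only a sketch: fake_jiffies, FAKE_HZ and the three helpers are
hypothetical stand-ins, not kernel symbols. A timer is armed with an
absolute expiry (current tick count plus an interval), whereas delayed work
is scheduled with a relative delay (the interval alone).

```c
/* Simulated tick counter standing in for the kernel's jiffies. */
static unsigned long fake_jiffies = 1000;
#define FAKE_HZ 100UL	/* assumed tick rate for the model */

/* Absolute expiry, the form a timer takes: "now" plus the interval. */
static unsigned long timer_expires(unsigned long secs)
{
	return fake_jiffies + secs * FAKE_HZ;
}

/* Relative delay, the form delayed work takes: the interval alone. */
static unsigned long work_delay(unsigned long secs)
{
	return secs * FAKE_HZ;
}

/* Converting an absolute expiry back into a relative delay. */
static unsigned long expires_to_delay(unsigned long expires)
{
	return expires - fake_jiffies;
}
```

Passing the absolute value where a delay is expected would postpone the
work by roughly the whole uptime plus the interval, which is the kind of
mistake the commit message describes.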

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years ago NET/ixgbe: Fix powering off during shutdown
Rafael J. Wysocki [Wed, 15 Apr 2009 17:44:01 +0000 (17:44 +0000)]
NET/ixgbe: Fix powering off during shutdown

Prevent ixgbe from putting the adapter into D3 during shutdown except when
we're going to power off the system, since doing that may cause problems
with kexec (such problems were observed for igb and forcedeth).  For this
purpose, separate ixgbe_shutdown() from ixgbe_suspend() and use the
appropriate PCI PM callbacks in both of them.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years ago NET/e1000e: Fix powering off during shutdown
Rafael J. Wysocki [Wed, 15 Apr 2009 17:43:43 +0000 (17:43 +0000)]
NET/e1000e: Fix powering off during shutdown

Prevent e1000e from putting the adapter into D3 during shutdown except when
we're going to power off the system, since doing that may cause problems
with kexec (such problems were observed for igb and forcedeth).  For this
purpose, separate e1000e_shutdown() from e1000e_suspend() and use the
appropriate PCI PM callbacks in both of them.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years ago NET/e1000: Fix powering off during shutdown
Rafael J. Wysocki [Wed, 15 Apr 2009 17:43:24 +0000 (17:43 +0000)]
NET/e1000: Fix powering off during shutdown

Prevent e1000 from putting the adapter into D3 during shutdown except when
we're going to power off the system, since doing that may cause problems
with kexec (such problems were observed for igb and forcedeth).  For this
purpose, separate e1000_shutdown() from e1000_suspend() and use the
appropriate PCI PM callbacks in both of them.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years ago RCU: Don't try and predeclare inline funcs as it upsets some versions of gcc
David Howells [Wed, 15 Apr 2009 18:35:01 +0000 (19:35 +0100)]
RCU: Don't try and predeclare inline funcs as it upsets some versions of gcc

Don't try and predeclare inline funcs like this:

	static inline void wait_migrated_callbacks(void);
	...
	static void _rcu_barrier(enum rcu_barrier type)
	{
		...
		wait_migrated_callbacks();
	}
	...
	static inline void wait_migrated_callbacks(void)
	{
		wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
	}

as it upsets some versions of gcc under some circumstances:

kernel/rcupdate.c: In function `_rcu_barrier':
kernel/rcupdate.c:125: sorry, unimplemented: inlining failed in call to 'wait_migrated_callbacks': function body not available
kernel/rcupdate.c:152: sorry, unimplemented: called from here

This can be dealt with by simply putting the static variables (rcu_migrate_*)
at the top, and moving the implementation of the function up so that it
replaces its forward declaration.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago The default CONFIG_BUG=n version of BUG() should have an empty do...while
David Howells [Wed, 15 Apr 2009 18:34:56 +0000 (19:34 +0100)]
The default CONFIG_BUG=n version of BUG() should have an empty do...while

The default CONFIG_BUG=n version of BUG() should incorporate an empty
do...while statement to avoid compilation weirdness.
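
A small stand-alone sketch of why the wrapper helps, under the assumption
that the "weirdness" is of the usual statement-macro kind. MY_BUG() and
classify() are hypothetical names for illustration, not the kernel's
definitions. A macro that expands to nothing leaves a bare ";" behind, which
some compilers may flag as an empty body; do { } while (0) expands to
exactly one real statement.

```c
/* Hypothetical stand-in for a CONFIG_BUG=n style BUG(): a no-op statement. */
#define MY_BUG() do { } while (0)

static int classify(int x)
{
	int hits = 0;

	if (x < 0)
		MY_BUG();	/* one complete statement, so the else pairs cleanly */
	else
		hits = 1;

	return hits;
}
```

The macro can then be used anywhere a single statement is expected,
followed by the usual semicolon.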

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago MN10300: Stop gcc from generating uninitialised variable warnings after BUG()
David Howells [Wed, 15 Apr 2009 18:34:51 +0000 (19:34 +0100)]
MN10300: Stop gcc from generating uninitialised variable warnings after BUG()

Stop gcc from generating uninitialised variable warnings after BUG().  The
problem is that MN10300's implementation of BUG() invokes system call 15 which
doesn't return - but there's no way to tell the compiler that and also emit the
bug table element with the correct file and line data.

So instead, we make the do...while wrapper in _debug_bug_trap() an endless loop
from which there's no escape.

Also, while we're at it, (1) get rid of _debug_bug_trap() and just implement
directly as BUG(), and (2) make the implementation of BUG() contingent on
CONFIG_BUG=y.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago MN10300: Wire up missing system calls
David Howells [Wed, 15 Apr 2009 18:34:46 +0000 (19:34 +0100)]
MN10300: Wire up missing system calls

Wire up missing system calls preadv() and pwritev().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago MN10300: Discard duplicate PFN_xxx() macros
David Howells [Wed, 15 Apr 2009 18:34:41 +0000 (19:34 +0100)]
MN10300: Discard duplicate PFN_xxx() macros

Discard duplicate PFN_xxx() macros from arch code as they're now in the
general headers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
Linus Torvalds [Wed, 15 Apr 2009 20:28:27 +0000 (13:28 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] boot cputime accounting
  [S390] add read_persistent_clock
  [S390] cpu hotplug and accounting values
  [S390] fix idle time accounting
  [S390] smp: fix cpu_possible_map initialization
  [S390] dasd: fix idaw boundary checking for track based ccw
  [S390] dasd: Use the new async framework for autoonlining.
  [S390] qdio: remove dead timeout handler
  [S390] appldata: Use new mod_virt_timer_periodic() function.
  [S390] extend virtual timer interface by mod_virt_timer_periodic
  [S390] stp synchronization retry timer
  [S390] call nmi_enter/nmi_exit on machine checks
  [S390] wire up preadv/pwritev system calls
  [S390] s390: move machine flags to lowcore

15 years ago x86: use used_vectors in init_IRQ()
Yinghai Lu [Wed, 15 Apr 2009 18:57:01 +0000 (11:57 -0700)]
x86: use used_vectors in init_IRQ()

Impact: fix crash with many devices

I found this crash:

[  552.616646] general protection fault: 0403 [#1] SMP
[  552.620013] last sysfs file:
/sys/devices/pci0000:00/0000:00:02.0/usb1/1-1/1-1:1.0/host13/target13:0:0/13:0:0:0/block/sr0/size
[  552.620013] CPU 0
[  552.620013] Modules linked in:
[  552.620013] Pid: 0, comm: swapper Not tainted 2.6.30-rc1-tip-01931-g8fcafd8-dirty #28 Sun Fire X4440
[  552.620013] RIP: 0010:[<ffffffff8023bada>]  [<ffffffff8023bada>] default_idle+0x7d/0xda
[  552.620013] RSP: 0018:ffffffff81345e68  EFLAGS: 00010246
[  552.620013] RAX: 0000000000000000 RBX: ffffffff8133d870 RCX: ffffc20000000000
[  552.620013] RDX: 00000000001d0620 RSI: ffffffff8023bad8 RDI: ffffffff802a3169
[  552.620013] RBP: ffffffff81345e98 R08: 0000000000000000 R09: ffffffff812244a0
[  552.620013] R10: ffffffff81345dc8 R11: 7ebe1b6fa0bcac50 R12: 4ec4ec4ec4ec4ec5
[  552.620013] R13: ffffffff813a54d0 R14: ffffffff813a7a40 R15: 0000000000000000
[  552.620013] FS:  00000000006d1880(0000) GS:ffffc20000000000(0000) knlGS:0000000000000000
[  552.620013] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  552.620013] CR2: 00007fec9d936a50 CR3: 000000007d1a9000 CR4: 00000000000006e0
[  552.620013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  552.620013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  552.620013] Process swapper (pid: 0, threadinfo ffffffff81344000,task ffffffff812244a0)
[  552.620013] Stack:
[  552.620013]  0000000000000000 ffffc20000000000 00000000001d0620 7ebe1b6fa0bcac50
[  552.620013]  ffffffff8133d870 4ec4ec4ec4ec4ec5 ffffffff81345ec8 ffffffff8023bd84
[  552.620013]  4ec4ec4ec4ec4ec5 ffffffff813a54d0 7ebe1b6fa0bcac50 ffffffff8133d870
[  552.620013] Call Trace:
[  552.620013]  [<ffffffff8023bd84>] c1e_idle+0x109/0x124
[  552.620013]  [<ffffffff8023314b>] cpu_idle+0xb8/0x101
[  552.620013]  [<ffffffff80c16d6a>] rest_init+0x7e/0x94
[  552.620013]  [<ffffffff81357efc>] start_kernel+0x3dc/0x3fd
[  552.620013]  [<ffffffff813572a9>] x86_64_start_reservations+0xb9/0xd4
[  552.620013]  [<ffffffff813573b2>] x86_64_start_kernel+0xee/0x109
[  552.620013] Code: 48 8b 04 25 f8 b4 00 00 83 a0 3c e0 ff ff fb 0f ae f0 65 48 8b 04 25 f8 b4 00 00 f6 80 38 e0 ff ff 08 75 09 e8 71 76 06 00 fb f4 <eb> 06 e8 68 76 06 00 fb 65 48 8b 04 25 f8 b4 00 00 83 88 3c e0
[  552.620013] RIP  [<ffffffff8023bada>] default_idle+0x7d/0xda
[  552.620013]  RSP <ffffffff81345e68>
[  552.828646] ---[ end trace 4cbfc5c01382af7f ]---

Joerg Roedel said
"The 0403 error code means that there was an external interrupt with vector
0x80. Yinghai, my theory is that the kernel on this machine has no 32bit
emulation compiled in, right? In this case the selector points to a zero entry
which may cause the #gpf right after the hlt.
But I have no idea where the external int 0x80 comes from"
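
For reference, the 0403 value can be decoded by hand. The sketch below
assumes the usual x86 layout for selector-style error codes (bit 0 = EXT,
bit 1 = IDT, bit 2 = TI, bits 3-15 = index); the struct and function names
are made up for illustration and are not kernel code.

```c
#include <assert.h>

/* Fields of an x86 selector-style exception error code (illustrative). */
struct sel_error_code {
	unsigned int ext;   /* bit 0: event was external to the program */
	unsigned int idt;   /* bit 1: index refers to an IDT gate */
	unsigned int ti;    /* bit 2: LDT (1) or GDT (0), only used when idt == 0 */
	unsigned int index; /* bits 3..15: descriptor or vector index */
};

static struct sel_error_code decode_error_code(unsigned int code)
{
	struct sel_error_code ec;

	assert(code <= 0xffff);	/* only the low 16 bits are defined here */
	ec.ext = code & 1;
	ec.idt = (code >> 1) & 1;
	ec.ti = (code >> 2) & 1;
	ec.index = (code >> 3) & 0x1fff;
	return ec;
}
```

With code 0x0403 this gives ext=1, idt=1 and index=0x80, which matches the
"external interrupt with vector 0x80" reading above.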

It turns out that on 64-bit, vector 0x80 can be assigned to an external
device when 32-bit emulation is disabled.

But we forgot to set the gate for it.

So set the gate for every vector that is not marked in used_vectors.

Also call apic_intr_init() earlier to avoid setting
that gate twice.
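
A rough userspace model of the idea, not the actual patch: walk the
external vectors and install a gate on each one that is not already
claimed in used_vectors. The bitmap helpers and the gate_installed[] array
are stand-ins invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VECTORS		256
#define FIRST_EXTERNAL_VECTOR	0x20

/* One bit per vector: set when something already owns the vector. */
static unsigned char used_vectors[NR_VECTORS / 8];
/* Stand-in for the IDT: true once a device interrupt gate is installed. */
static bool gate_installed[NR_VECTORS];

static void mark_used(unsigned int v)
{
	assert(v < NR_VECTORS);
	used_vectors[v / 8] |= 1u << (v % 8);
}

static bool is_used(unsigned int v)
{
	return (used_vectors[v / 8] & (1u << (v % 8))) != 0;
}

/* Install a gate on every free external vector; returns how many were set. */
static unsigned int install_free_gates(void)
{
	unsigned int v, installed = 0;

	for (v = FIRST_EXTERNAL_VECTOR; v < NR_VECTORS; v++) {
		if (is_used(v))
			continue;	/* owned elsewhere, e.g. a syscall vector */
		gate_installed[v] = true;
		installed++;
	}
	return installed;
}
```

When nothing has claimed 0x80 (no 32-bit emulation), the loop covers it
like any other device vector; when it is marked used, it is skipped.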

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <49E62DFD.6010904@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Wed, 15 Apr 2009 16:11:11 +0000 (09:11 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix the cmd cache keys for amp verbs
  ALSA: add missing definitions(letters) to HD-Audio.txt
  ALSA: hda - Add quirk mask for Fujitsu Amilo laptops with ALC883
  [ALSA] intel8x0: add one retry to the ac97_clock measurement routine
  [ALSA] intel8x0: fix wrong conditions in ac97_clock measure routine
  ALSA: hda - Avoid call of snd_jack_report at release
  ALSA: add private_data to struct snd_jack
  ALSA: snd-usb-caiaq: rename files to remove redundant information in file pathes
  ALSA: snd-usb-caiaq: clean up header includes
  ALSA: sound/pci: use memdup_user()
  ALSA: sound/usb: use memdup_user()
  ALSA: sound/isa: use memdup_user()
  ALSA: sound/core: use memdup_user()
  [ALSA] intel8x0: do not use zero value from PICB register
  [ALSA] intel8x0: an attempt to make ac97_clock measurement more reliable
  [ALSA] pcm-midlevel: Add more strict buffer position checks based on jiffies
  [ALSA] hda_intel: fix unexpected ring buffer positions
  ASoC: Disable S3C64xx support in Kconfig
  ASoC: magician: remove un-necessary #include of pxa-regs.h and hardware.h

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
Linus Torvalds [Wed, 15 Apr 2009 16:04:12 +0000 (09:04 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Use DEFINE_SPINLOCK
  GFS2: cleanup file_operations mess
  GFS2: Move umount flush rwsem
  GFS2: Fix symlink creation race
  GFS2: Make quotad's waiting interruptible

15 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Wed, 15 Apr 2009 16:03:47 +0000 (09:03 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (28 commits)
  cfq-iosched: add close cooperator code
  cfq-iosched: log responsible 'cfqq' in idle timer arm
  cfq-iosched: tweak kick logic a bit more
  cfq-iosched: no need to save interrupts in cfq_kick_queue()
  brd: fix cacheflushing
  brd: support barriers
  swap: Remove code handling bio_alloc failure with __GFP_WAIT
  gfs2: Remove code handling bio_alloc failure with __GFP_WAIT
  ext4: Remove code handling bio_alloc failure with __GFP_WAIT
  dio: Remove code handling bio_alloc failure with __GFP_WAIT
  block: Remove code handling bio_alloc failure with __GFP_WAIT
  bio: add documentation to bio_alloc()
  splice: add helpers for locking pipe inode
  splice: remove generic_file_splice_write_nolock()
  ocfs2: fix i_mutex locking in ocfs2_splice_to_file()
  splice: fix i_mutex locking in generic_splice_write()
  splice: remove i_mutex locking in splice_from_pipe()
  splice: split up __splice_from_pipe()
  block: fix SG_IO to return a proper error value
  cfq-iosched: don't delay queue kick for a merged request
  ...

15 years agoMerge branch 'topic/hda' into for-linus
Takashi Iwai [Wed, 15 Apr 2009 15:52:32 +0000 (17:52 +0200)]
Merge branch 'topic/hda' into for-linus

* topic/hda:
  ALSA: hda - Fix the cmd cache keys for amp verbs
  ALSA: add missing definitions(letters) to HD-Audio.txt

15 years agoALSA: hda - Fix the cmd cache keys for amp verbs
Takashi Iwai [Wed, 15 Apr 2009 15:48:35 +0000 (17:48 +0200)]
ALSA: hda - Fix the cmd cache keys for amp verbs

Fix the key value generation for get/set amp verbs.  The upper bits of
the parameter have to be combined with the verb value to be unique for
each direction/index of amp access.
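
A minimal sketch of the key construction, assuming a 16-bit amp payload
whose upper byte carries the direction/channel/index bits and a key of the
form (verb << 8) | nid. The function name and the exact layout are
assumptions made for illustration, not a copy of the driver code.

```c
#include <assert.h>

/*
 * Fold the upper byte of the amp parameter into the verb so that
 * accesses to different directions/indices get different cache keys.
 */
static unsigned int amp_cache_key(unsigned int nid, unsigned int verb,
				  unsigned int parm)
{
	assert(nid <= 0xff);		/* the key keeps nid in its low byte */
	verb |= (parm >> 8) & 0xff;	/* direction/index bits */
	return (verb << 8) | nid;
}
```

Two writes to the same widget that differ only in the gain value share a
key, while writes that differ in the direction bits do not.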

This fixes the resume problem on some hardware, such as MacBooks, after
the channel mode is changed.

Tested-by: Johannes Berg <johannes@sipsolutions.net>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
Linus Torvalds [Wed, 15 Apr 2009 15:42:40 +0000 (08:42 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: pseries/dtl.c should include asm/firmware.h
  powerpc: Fix data-corrupting bug in __futex_atomic_op
  powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
  powerpc: Allow 256kB pages with SHMEM
  powerpc: Document new FSL I2C bindings and cleanup
  powerpc/mm: Fix compile warning
  powerpc/85xx: TQM8548: update defconfig
  powerpc/85xx: TQM8548: use proper phy-handles for enet2 and enet3
  powerpc/85xx: TQM85xx: correct address of LM75 I2C device nodes
  powerpc: Add support for early tlbilx opcode
  powerpc: Fix tlbilx opcode

15 years agoacpi-cpufreq: fix 'smp_call_function_many()' confusion
Linus Torvalds [Wed, 15 Apr 2009 15:05:13 +0000 (08:05 -0700)]
acpi-cpufreq: fix 'smp_call_function_many()' confusion

It turns out that 'smp_call_function_many()' doesn't work at all like
'smp_call_function_single()', and my change to Andrew's patch to use it
rather than a loop over all CPU's in acpi-cpufreq doesn't work.

My bad.

'smp_call_function_many()' has two "features" (aka "documented bugs"):

 (a) it needs to be called with preemption disabled, because it uses
     smp_processor_id() without guarding the CPU lookup with 'get_cpu()'
     and 'put_cpu()' like the 'single' variant does.

 (b) even if the current CPU is part of the CPU mask, it won't do the
     call on that CPU.

Still, we're better off trying to use 'smp_call_function_many()' than
looping over CPU's, since it at least in theory allows us to use a
broadcast IPI and do it all in parallel.  So let's just work around the
silly semantic bugs in that function.
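
The workaround can be modelled in userspace roughly as follows. Everything
here is a toy: fake_call_function_many() only imitates semantic (b), and
current_cpu/running_on stand in for the real per-CPU machinery. In the
kernel the wrapper would additionally sit between get_cpu() and put_cpu()
to cover (a).

```c
#include <assert.h>

#define NR_CPUS 8

typedef void (*cpu_fn)(void *info);

static int current_cpu;		/* stand-in for smp_processor_id() */
static int running_on;		/* which CPU the callback "runs" on */
static int calls[NR_CPUS];	/* how often the callback ran per CPU */

/* Imitates (b): calls fn for every CPU in mask except the current one. */
static void fake_call_function_many(unsigned int mask, cpu_fn fn, void *info)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == current_cpu || !(mask & (1u << cpu)))
			continue;
		running_on = cpu;
		fn(info);
	}
}

/* Workaround: do the local call by hand if we are part of the mask. */
static void call_on_mask(unsigned int mask, cpu_fn fn, void *info)
{
	assert(current_cpu >= 0 && current_cpu < NR_CPUS);
	/* kernel: preemption would be disabled from here ... */
	fake_call_function_many(mask, fn, info);
	if (mask & (1u << current_cpu)) {
		running_on = current_cpu;
		fn(info);
	}
	/* ... to here */
}

static void count_call(void *info)
{
	(void)info;
	calls[running_on]++;
}
```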

Reported-and-tested-by: Ali Gholami Rudi <ali@rudi.ir>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agopacket: avoid warnings when high-order page allocation fails
Eric Dumazet [Wed, 15 Apr 2009 10:39:52 +0000 (03:39 -0700)]
packet: avoid warnings when high-order page allocation fails

The latest tcpdump/libpcap triggers annoying messages because of high-order page
allocation failures (when lowmem is exhausted or fragmented).

These allocation errors are handled correctly, so they can be silent.

[22660.208901] tcpdump: page allocation failure. order:5, mode:0xc0d0
[22660.208921] Pid: 13866, comm: tcpdump Not tainted 2.6.30-rc2 #170
[22660.208936] Call Trace:
[22660.208950]  [<c04e2b46>] ? printk+0x18/0x1a
[22660.208965]  [<c02760f7>] __alloc_pages_internal+0x357/0x460
[22660.208980]  [<c0276251>] __get_free_pages+0x21/0x40
[22660.208995]  [<c04cc835>] packet_set_ring+0x105/0x3d0
[22660.209009]  [<c04ccd1d>] packet_setsockopt+0x21d/0x4d0
[22660.209025]  [<c0270400>] ? filemap_fault+0x0/0x450
[22660.209040]  [<c0449e34>] sys_setsockopt+0x54/0xa0
[22660.209053]  [<c044b97f>] sys_socketcall+0xef/0x270
[22660.209067]  [<c0202e34>] sysenter_do_call+0x12/0x26

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agocfq-iosched: add close cooperator code
Jens Axboe [Wed, 15 Apr 2009 10:15:11 +0000 (12:15 +0200)]
cfq-iosched: add close cooperator code

If we have processes that are working in close proximity to each
other on disk, we don't want to idle wait. Instead allow the close
process to issue a request, getting better aggregate bandwidth.
The anticipatory scheduler has similar checks, noop and deadline do
not need it since they don't care about process <-> io mappings.

The code for CFQ is a little more involved though, since we split
request queues into per-process contexts.

This fixes a performance problem with eg dump(8), since it uses
several processes in some silly attempt to speed IO up. Even if
dump(8) isn't really a valid case (it should be fixed by using
CLONE_IO), there are other cases where we see close processes
and where idling ends up hurting performance.

Credit goes to Jeff Moyer <jmoyer@redhat.com> for writing the
initial implementation.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agocfq-iosched: log responsible 'cfqq' in idle timer arm
Jens Axboe [Wed, 15 Apr 2009 10:14:13 +0000 (12:14 +0200)]
cfq-iosched: log responsible 'cfqq' in idle timer arm

Makes it easier to read the traces.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agocfq-iosched: tweak kick logic a bit more
Jens Axboe [Wed, 15 Apr 2009 10:12:46 +0000 (12:12 +0200)]
cfq-iosched: tweak kick logic a bit more

We only kick the dispatch for an idling queue, if we think it's a
(somewhat) fully merged request. Also allow a kick if we have other
busy queues in the system, since we don't want to risk waiting for
a potential merge in that case. It's better to get some work done and
proceed.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agocfq-iosched: no need to save interrupts in cfq_kick_queue()
Jens Axboe [Wed, 15 Apr 2009 10:11:10 +0000 (12:11 +0200)]
cfq-iosched: no need to save interrupts in cfq_kick_queue()

It's called from the workqueue handlers from process context, so
we always have irqs enabled when entered.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agobrd: fix cacheflushing
Nick Piggin [Wed, 15 Apr 2009 08:32:07 +0000 (10:32 +0200)]
brd: fix cacheflushing

brd is missing a flush_dcache_page. On 2nd thoughts, perhaps it is the
pagecache's responsibility to flush user virtual aliases (the driver of
course should flush kernel virtual mappings)... but anyway, there
already exists cache flushing for one direction of transfer, so we
should add the other.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agobrd: support barriers
Nick Piggin [Wed, 15 Apr 2009 08:27:07 +0000 (10:27 +0200)]
brd: support barriers

brd is always ordered (not that it matters, as it is defined not to
survive when the system goes down). So tell the block layer it is
ordered, which might be of help with testing filesystems.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoswap: Remove code handling bio_alloc failure with __GFP_WAIT
Nikanth Karthikesan [Wed, 15 Apr 2009 05:07:04 +0000 (10:37 +0530)]
swap: Remove code handling bio_alloc failure with __GFP_WAIT

Remove code handling bio_alloc failure with __GFP_WAIT.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agogfs2: Remove code handling bio_alloc failure with __GFP_WAIT
Nikanth Karthikesan [Wed, 15 Apr 2009 05:06:35 +0000 (10:36 +0530)]
gfs2: Remove code handling bio_alloc failure with __GFP_WAIT

Remove code handling bio_alloc failure with __GFP_WAIT.
GFP_NOFS implies __GFP_WAIT.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoext4: Remove code handling bio_alloc failure with __GFP_WAIT
Nikanth Karthikesan [Wed, 15 Apr 2009 05:06:16 +0000 (10:36 +0530)]
ext4: Remove code handling bio_alloc failure with __GFP_WAIT

Remove code handling bio_alloc failure with __GFP_WAIT.
GFP_NOIO implies __GFP_WAIT.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agodio: Remove code handling bio_alloc failure with __GFP_WAIT
Nikanth Karthikesan [Wed, 15 Apr 2009 05:05:52 +0000 (10:35 +0530)]
dio: Remove code handling bio_alloc failure with __GFP_WAIT

Remove code handling bio_alloc failure with __GFP_WAIT.
GFP_KERNEL implies __GFP_WAIT.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoblock: Remove code handling bio_alloc failure with __GFP_WAIT
Nikanth Karthikesan [Wed, 15 Apr 2009 05:05:31 +0000 (10:35 +0530)]
block: Remove code handling bio_alloc failure with __GFP_WAIT

Remove code handling bio_alloc failure with __GFP_WAIT.
GFP_KERNEL implies __GFP_WAIT.
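
As a quick illustration of "implies": the composite masks are ORs that
include the wait bit. The DEMO_ names and bit values below are
placeholders chosen for this sketch and should not be read as the
kernel's definitions.

```c
/* Placeholder bit values; only the relationships matter here. */
#define DEMO_GFP_WAIT	0x10u
#define DEMO_GFP_IO	0x40u
#define DEMO_GFP_FS	0x80u

#define DEMO_GFP_NOIO	(DEMO_GFP_WAIT)
#define DEMO_GFP_NOFS	(DEMO_GFP_WAIT | DEMO_GFP_IO)
#define DEMO_GFP_KERNEL	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* An allocation with the wait bit may sleep instead of failing. */
static int may_sleep(unsigned int gfp)
{
	return (gfp & DEMO_GFP_WAIT) != 0;
}
```

With any of the three composite masks the caller may sleep, which is why
the NULL checks after bio_alloc() are dead code in these paths.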

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agobio: add documentation to bio_alloc()
Jens Axboe [Wed, 15 Apr 2009 07:00:07 +0000 (09:00 +0200)]
bio: add documentation to bio_alloc()

Explain that with __GFP_WAIT set it will not fail, and that the caller
must never allocate more than 1 bio at a time.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agosplice: add helpers for locking pipe inode
Miklos Szeredi [Tue, 14 Apr 2009 17:48:41 +0000 (19:48 +0200)]
splice: add helpers for locking pipe inode

There are lots of sequences like this, especially in splice code:

	if (pipe->inode)
		mutex_lock(&pipe->inode->i_mutex);
	/* do something */
	if (pipe->inode)
		mutex_unlock(&pipe->inode->i_mutex);

so introduce helpers which do the conditional locking and unlocking.
Also replace the inode_double_lock() call with a pipe_double_lock()
helper to avoid spreading the use of this functionality beyond the
pipe code.

This patch is just a cleanup, and should cause no behavioral changes.
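
The conditional-locking helpers could look roughly like this. The structs
and the counting "mutex" are toy stand-ins so the sketch compiles on its
own, and the helper names pipe_lock()/pipe_unlock() are assumed here
rather than taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

struct mutex { int depth; };		/* toy lock: just counts holders */
struct inode { struct mutex i_mutex; };
struct pipe_inode_info { struct inode *inode; };

static void mutex_lock(struct mutex *m)   { m->depth++; }
static void mutex_unlock(struct mutex *m) { m->depth--; }

/* Take i_mutex only if the pipe is backed by an inode. */
static void pipe_lock(struct pipe_inode_info *pipe)
{
	assert(pipe != NULL);
	if (pipe->inode)
		mutex_lock(&pipe->inode->i_mutex);
}

/* Counterpart of pipe_lock(). */
static void pipe_unlock(struct pipe_inode_info *pipe)
{
	assert(pipe != NULL);
	if (pipe->inode)
		mutex_unlock(&pipe->inode->i_mutex);
}
```

Callers then write pipe_lock(pipe); /* do something */ pipe_unlock(pipe);
and no longer repeat the NULL check at each site.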

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agosplice: remove generic_file_splice_write_nolock()
Miklos Szeredi [Tue, 14 Apr 2009 17:48:40 +0000 (19:48 +0200)]
splice: remove generic_file_splice_write_nolock()

Remove the now unused generic_file_splice_write_nolock() function.
It's conceptually broken anyway, because splice may need to wait for
pipe events so holding locks across the whole operation is wrong.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoocfs2: fix i_mutex locking in ocfs2_splice_to_file()
Miklos Szeredi [Tue, 14 Apr 2009 17:48:39 +0000 (19:48 +0200)]
ocfs2: fix i_mutex locking in ocfs2_splice_to_file()

Rearrange locking of i_mutex on destination and call to
ocfs2_rw_lock() so locks are only held while buffers are copied with
the pipe_to_file() actor, and not while waiting for more data on the
pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agosplice: fix i_mutex locking in generic_splice_write()
Miklos Szeredi [Tue, 14 Apr 2009 17:48:38 +0000 (19:48 +0200)]
splice: fix i_mutex locking in generic_splice_write()

Rearrange locking of i_mutex on destination so it's only held while
buffers are copied with the pipe_to_file() actor, and not while
waiting for more data on the pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agosplice: remove i_mutex locking in splice_from_pipe()
Miklos Szeredi [Tue, 14 Apr 2009 17:48:37 +0000 (19:48 +0200)]
splice: remove i_mutex locking in splice_from_pipe()

splice_from_pipe() is only called from two places:

  - generic_splice_sendpage()
  - splice_write_null()

Neither of these require i_mutex to be taken on the destination inode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agosplice: split up __splice_from_pipe()
Miklos Szeredi [Tue, 14 Apr 2009 17:48:36 +0000 (19:48 +0200)]
splice: split up __splice_from_pipe()

Split up __splice_from_pipe() into four helper functions:

  splice_from_pipe_begin()
  splice_from_pipe_next()
  splice_from_pipe_feed()
  splice_from_pipe_end()

splice_from_pipe_next() will wait (if necessary) for more buffers to
be added to the pipe.  splice_from_pipe_feed() will feed the buffers
to the supplied actor and return when there's no more data available
(or if all of the requested data has been copied).

This is necessary so that implementations can do locking around the
non-waiting splice_from_pipe_feed().

This patch should not cause any change in behavior.
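
To show why the split matters, here is a toy model of a caller that holds
a lock only around the feed step and never while waiting. All toy_* names
and the refill mechanism are invented for this sketch; only the shape of
the loop (begin, then next/feed repeated, then end) follows the
description above.

```c
#include <assert.h>

struct toy_pipe {
	int queued;		/* bytes ready to be consumed */
	int refills[4];		/* bytes that arrive on each later wait */
	int nr_refills;
	int next_refill;
};

struct toy_desc {
	int total_len;		/* bytes still wanted */
	int copied;		/* bytes handed to the actor so far */
};

static int lock_held;
static int waited_with_lock;	/* must stay 0 */

/* "next": wait for data if needed; 1 = data ready, 0 = nothing more. */
static int toy_next(struct toy_pipe *p, struct toy_desc *sd)
{
	if (!sd->total_len)
		return 0;
	if (!p->queued) {
		if (lock_held)
			waited_with_lock++;
		if (p->next_refill == p->nr_refills)
			return 0;	/* writers are gone */
		p->queued = p->refills[p->next_refill++];
	}
	return 1;
}

/* "feed": consume what is queued without waiting; 1 = want more. */
static int toy_feed(struct toy_pipe *p, struct toy_desc *sd)
{
	int n = p->queued < sd->total_len ? p->queued : sd->total_len;

	p->queued -= n;
	sd->total_len -= n;
	sd->copied += n;
	return sd->total_len > 0;
}

static int toy_splice(struct toy_pipe *p, int len)
{
	struct toy_desc sd = { .total_len = len, .copied = 0 };
	int ret;

	assert(len >= 0);
	/* "begin" would initialise per-call state here */
	do {
		ret = toy_next(p, &sd);		/* may wait: no lock held */
		if (ret > 0) {
			lock_held = 1;		/* e.g. destination i_mutex */
			ret = toy_feed(p, &sd);
			lock_held = 0;
		}
	} while (ret > 0);
	/* "end" would wake up writers here */
	return sd.copied;
}
```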

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoblock: fix SG_IO to return a proper error value
FUJITA Tomonori [Mon, 13 Apr 2009 18:03:10 +0000 (20:03 +0200)]
block: fix SG_IO to return a proper error value

blk_rq_unmap_user() returns -EFAULT if a program passes an invalid
address to kernel. SG_IO path needs to pass the returned value to user
space instead of ignoring it.
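
In outline, the completion path should fold the unmap result into the
value it returns. The sketch below uses made-up function names and assumes
an earlier error takes precedence over the unmap error; it is not the
actual patch.

```c
#include <errno.h>

/* Stand-in for the unmap step: fails when the user address was bad. */
static int fake_unmap_user(int bad_address)
{
	return bad_address ? -EFAULT : 0;
}

/* Keep an earlier error if there is one, otherwise report the unmap error. */
static int finish_sg_io(int ret, int bad_address)
{
	int r = fake_unmap_user(bad_address);

	if (!ret)
		ret = r;
	return ret;
}
```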

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agosparc: remove some pointless conditionals before kfree()
Wei Yongjun [Wed, 15 Apr 2009 10:04:56 +0000 (03:04 -0700)]
sparc: remove some pointless conditionals before kfree()

Remove some pointless conditionals before kfree().
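
The same rule holds for free() in userspace C, which makes for an easy
illustration: freeing a NULL pointer is a no-op, so the guard adds
nothing. The struct and function below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct dev_state {
	char *name;
	int *table;
};

static void dev_state_release(struct dev_state *s)
{
	assert(s != NULL);
	free(s->name);		/* no "if (s->name)" needed */
	free(s->table);
	s->name = NULL;
	s->table = NULL;
}
```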

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoALSA: add missing definitions(letters) to HD-Audio.txt
Justin Mattock [Tue, 14 Apr 2009 21:31:21 +0000 (14:31 -0700)]
ALSA: add missing definitions(letters) to HD-Audio.txt

impact: Add missing definitions(letters).

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years agogianfar: stop send queue before resetting gianfar
Markus Brunner [Wed, 15 Apr 2009 09:35:40 +0000 (02:35 -0700)]
gianfar: stop send queue before resetting gianfar

After a transmit times out, the reset task is called, which frees the
allocated resources (stop_gfar). If gfar_poll is called before the
resources are allocated again, gfar_clean_tx_ring will call
dev_kfree_skb_any(NULL).

Example crash:

ops: Kernel access of bad area, sig: 11 [#1]
PREEMPT RSBBA100
Modules linked in:
NIP: c01a10c4 LR: c013b254 CTR: c013c038
REGS: c02e7d20 TRAP: 0300   Not tainted  (2.6.27.20)
MSR: 00001032 <ME,IR,DR>  CR: 24000082  XER: 20000000
DAR: 000000a0, DSISR: 20000000
TASK = c02ce578[0] 'swapper' THREAD: c02e6000
GPR00: 000000a0 c02e7dd0 c02ce578 00000000 00000040 00000001 c02ec1c0 00001032
GPR08: c080d1e0 df9ea800 00000000 00000000 24000082 ffffffff 0404f000 00000000
GPR16: ffffffbf ffffffff ffffffff ffdff7ff ffffffff c02d0fd4 00100100 00200200
GPR24: c031220c 00000001 00000001 00000000 00000000 df849800 ff109000 df849b80
NIP [c01a10c4] dev_kfree_skb_irq+0x18/0x70
LR [c013b254] gfar_clean_tx_ring+0x70/0x11c
Call Trace:
[c02e7dd0] [c003e978] update_wall_time+0x730/0x744 (unreliable)
[c02e7df0] [c013b254] gfar_clean_tx_ring+0x70/0x11c
[c02e7e10] [c013c07c] gfar_poll+0x44/0x150
[c02e7e30] [c01a064c] net_rx_action+0xa8/0x19c
[c02e7e70] [c00251d4] __do_softirq+0x64/0xc0
[c02e7e90] [c0006384] do_softirq+0x40/0x58
[c02e7ea0] [c00250a8] irq_exit+0x40/0x9c
[c02e7eb0] [c000642c] do_IRQ+0x90/0xac
[c02e7ec0] [c0010ab4] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0x9c/0xf8
    LR = cpu_idle+0x9c/0xf8
[c02e7f80] [c0009820] cpu_idle+0x58/0xf8 (unreliable)
[c02e7fa0] [c01fb8c8] __got2_end+0x7c/0x90
[c02e7fc0] [c026c794] start_kernel+0x2c0/0x2d4
[c02e7ff0] [00003438] 0x3438
Instruction dump:
7fa00124 80010024 bba10014 38210020 7c0803a6 4e800020 9421ffe0 7c0802a6
7c6b1b78 90010024 380300a0 bfa10014 <7d200028> 3129ffff 7d20012d 40a2fff4
Kernel panic - not syncing: Fatal exception in interrupt

This patch calls netif_stop_queue before calling stop_gfar.

Signed-off-by: Markus Brunner <super.firetwister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agomyr10ge: again fix lro_gen_skb() alignment
Stanislaw Gruszka [Wed, 15 Apr 2009 09:26:49 +0000 (02:26 -0700)]
myr10ge: again fix lro_gen_skb() alignment

Add LRO alignment initially committed in
621544eb8c3beaa859c75850f816dd9b056a00a3 ("[LRO]: fix lro_gen_skb()
alignment") and removed in 0dcffac1a329be69bab0ac604bf7283737108e68
("myri10ge: add multislices support") during conversion to
multi-slice.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15 years agoMerge branch 'topic/memdup_user' into for-linus
Takashi Iwai [Wed, 15 Apr 2009 09:24:27 +0000 (11:24 +0200)]
Merge branch 'topic/memdup_user' into for-linus

* topic/memdup_user:
  ALSA: sound/pci: use memdup_user()
  ALSA: sound/usb: use memdup_user()
  ALSA: sound/isa: use memdup_user()
  ALSA: sound/core: use memdup_user()

15 years agoMerge branch 'topic/usb-caiaq' into for-linus
Takashi Iwai [Wed, 15 Apr 2009 09:24:22 +0000 (11:24 +0200)]
Merge branch 'topic/usb-caiaq' into for-linus

* topic/usb-caiaq:
  ALSA: snd-usb-caiaq: rename files to remove redundant information in file pathes
  ALSA: snd-usb-caiaq: clean up header includes

15 years agoMerge branch 'topic/asoc' into for-linus
Takashi Iwai [Wed, 15 Apr 2009 09:24:14 +0000 (11:24 +0200)]
Merge branch 'topic/asoc' into for-linus

* topic/asoc:
  ASoC: Disable S3C64xx support in Kconfig
  ASoC: magician: remove un-necessary #include of pxa-regs.h and hardware.h

15 years agoMerge branch 'topic/hda' into for-linus
Takashi Iwai [Wed, 15 Apr 2009 09:24:09 +0000 (11:24 +0200)]
Merge branch 'topic/hda' into for-linus

* topic/hda:
  ALSA: hda - Add quirk mask for Fujitsu Amilo laptops with ALC883
  ALSA: hda - Avoid call of snd_jack_report at release
  ALSA: add private_data to struct snd_jack

15 years agoMerge branch 'topic/jack-free-fix' into topic/hda
Takashi Iwai [Wed, 15 Apr 2009 09:23:44 +0000 (11:23 +0200)]
Merge branch 'topic/jack-free-fix' into topic/hda

* topic/jack-free-fix:
  ALSA: hda - Avoid call of snd_jack_report at release
  ALSA: add private_data to struct snd_jack

15 years agoMerge branch 'master' of git://git.alsa-project.org/alsa-kernel into for-linus
Takashi Iwai [Wed, 15 Apr 2009 09:21:13 +0000 (11:21 +0200)]
Merge branch 'master' of git://git.alsa-project.org/alsa-kernel into for-linus

* 'master' of git://git.alsa-project.org/alsa-kernel:
  [ALSA] intel8x0: add one retry to the ac97_clock measurement routine
  [ALSA] intel8x0: fix wrong conditions in ac97_clock measure routine
  [ALSA] intel8x0: do not use zero value from PICB register
  [ALSA] intel8x0: an attempt to make ac97_clock measurement more reliable
  [ALSA] pcm-midlevel: Add more strict buffer position checks based on jiffies
  [ALSA] hda_intel: fix unexpected ring buffer positions

15 years agoGFS2: Use DEFINE_SPINLOCK
Xu Gang [Tue, 14 Apr 2009 06:54:14 +0000 (14:54 +0800)]
GFS2: Use DEFINE_SPINLOCK

SPIN_LOCK_UNLOCKED is deprecated, use DEFINE_SPINLOCK instead.
(as suggested in Documentation/spinlocks.txt)

Signed-off-by: Xu Gang <xug@cn.fujitsu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
15 years agoGFS2: cleanup file_operations mess
Christoph Hellwig [Tue, 7 Apr 2009 17:42:17 +0000 (19:42 +0200)]
GFS2: cleanup file_operations mess

Remove the weird pointer to file_operations mess and replace it with
straightforward defining of the locking instance names to the _nolock
variants.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
15 years agoGFS2: Move umount flush rwsem
Steven Whitehouse [Tue, 7 Apr 2009 13:01:34 +0000 (14:01 +0100)]
GFS2: Move umount flush rwsem

The rwsem, used only on umount, is in the wrong place in glock.c.
This patch moves it up a bit so that it does not get called under
a spinlock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
15 years agoGFS2: Fix symlink creation race
Steven Whitehouse [Tue, 31 Mar 2009 15:06:27 +0000 (16:06 +0100)]
GFS2: Fix symlink creation race

In certain cases symlinks can appear to have zero size if a lookup
on the inode occurs within a certain (very short) time after the
symlink has been created. The symlink is correctly created on disk
but appears to have zero size when stat()ed. This patch closes the
race and prevents incorrect sizes appearing.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
15 years agoGFS2: Make quotad's waiting interruptible
Steven Whitehouse [Tue, 31 Mar 2009 14:49:08 +0000 (15:49 +0100)]
GFS2: Make quotad's waiting interruptible

So we don't count its D state in the loadavg.

Reported-by: Nathan Straz <nstraz@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
15 years agoALSA: hda - Add quirk mask for Fujitsu Amilo laptops with ALC883
Takashi Iwai [Tue, 14 Apr 2009 12:51:04 +0000 (14:51 +0200)]
ALSA: hda - Add quirk mask for Fujitsu Amilo laptops with ALC883

Added the models for quirk bitmask 1734:110x and 1734:113x of
Fujitsu laptops.

This will fix the model detection for Amilo Xa3540.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
15 years ago[ALSA] intel8x0: add one retry to the ac97_clock measurement routine
Jaroslav Kysela [Wed, 15 Apr 2009 08:16:24 +0000 (10:16 +0200)]
[ALSA] intel8x0: add one retry to the ac97_clock measurement routine

It seems that on some hardware platforms, the first measurement is wrong.
This patch adds a second measurement for this case.

Signed-off-by: Jaroslav Kysela <perex@perex.cz>
15 years agocfq-iosched: don't delay queue kick for a merged request
Jens Axboe [Tue, 14 Apr 2009 12:18:16 +0000 (14:18 +0200)]
cfq-iosched: don't delay queue kick for a merged request

"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> reports that commit
b029195dda0129b427c6e579a3bb3ae752da3a93 introduced a regression
of about 50% with sequential threaded read workloads. The test
case is:

tiotest -k0 -k1 -k3 -f 80 -t 32

which starts 32 threads each reading an 80MB file. Twiddle the kick
queue logic so that we do start IO immediately, if it appears to be
a fully merged request. We can't really detect that, so just check
if the request is bigger than a page or not. The assumption is that
since single bio issues will first queue a single request with just
one page attached and then later do merges on that, if we already
have more than a page worth of data in the request, then the request
is most likely good to go.
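
The heuristic boils down to a size comparison. A minimal sketch, with a
placeholder page size of 4096 bytes and a made-up function name:

```c
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096u	/* placeholder; the real value is per-arch */

/* Treat a request as "probably merged" once it holds more than one page. */
static bool rq_looks_merged(unsigned int rq_bytes)
{
	return rq_bytes > TOY_PAGE_SIZE;
}
```

A request of exactly one page is still treated as a candidate for further
merging, so it does not trigger the immediate kick.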

Verified that this doesn't cause a regression with the test case that
commit b029195dda0129b427c6e579a3bb3ae752da3a93 was fixing. It does not,
we still see maximum sized requests for the queue-then-merge cases.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agobuffer: switch do_emergency_thaw() away from pdflush_operation()
Jens Axboe [Wed, 8 Apr 2009 11:44:08 +0000 (13:44 +0200)]
buffer: switch do_emergency_thaw() away from pdflush_operation()

This is (again) a preparatory patch similar to commit
a2a9537ac0b37a5da6fbe7e1e9cb06c524d2a9c4. It open codes a simple
async way of executing do_thaw_all() out of context, so we can get
rid of pdflush.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoblock: update biodoc.txt on plugging
Jens Axboe [Wed, 8 Apr 2009 09:38:50 +0000 (11:38 +0200)]
block: update biodoc.txt on plugging

We do per-device plugging, get rid of any references to tq_disk as that
has been dead since 2.6.5 or so.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoas-iosched: get rid of private REQ_SYNC/REQ_ASYNC defines
Jens Axboe [Wed, 8 Apr 2009 09:02:08 +0000 (11:02 +0200)]
as-iosched: get rid of private REQ_SYNC/REQ_ASYNC defines

We can just use the block layer BLK_RW_SYNC/ASYNC defines now.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agocfq-iosched: get rid of private SYNC/ASYNC defines
Jens Axboe [Wed, 8 Apr 2009 08:58:57 +0000 (10:58 +0200)]
cfq-iosched: get rid of private SYNC/ASYNC defines

We can just use the block layer BLK_RW_SYNC/ASYNC defines now.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agocfq-iosched: use rw_is_sync() to see if rw flags are sync or not
Jens Axboe [Wed, 8 Apr 2009 08:56:08 +0000 (10:56 +0200)]
cfq-iosched: use rw_is_sync() to see if rw flags are sync or not

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoDocument and move the various READ/WRITE types
Jens Axboe [Tue, 14 Apr 2009 06:19:27 +0000 (08:19 +0200)]
Document and move the various READ/WRITE types

It's a somewhat twisty maze of hints and behavioural modifiers, try
and clear it up a bit with some documentation.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoblock: fix bad spelling of quiesce
Jens Axboe [Wed, 8 Apr 2009 12:22:01 +0000 (14:22 +0200)]
block: fix bad spelling of quiesce

Credit goes to Andrew Morton for spotting this one.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoblock: move bio list helpers into bio.h
Christoph Hellwig [Tue, 7 Apr 2009 17:55:13 +0000 (19:55 +0200)]
block: move bio list helpers into bio.h

It's used by DM and MD and generally useful, so move the bio list
helpers into bio.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agopowerpc: pseries/dtl.c should include asm/firmware.h
Sachin Sant [Tue, 14 Apr 2009 14:35:55 +0000 (14:35 +0000)]
powerpc: pseries/dtl.c should include asm/firmware.h

A randconfig build on powerpc failed with:

dtl.c: In function 'dtl_init':
dtl.c:238: error: implicit declaration of function 'firmware_has_feature'
dtl.c:238: error: 'FW_FEATURE_SPLPAR' undeclared (first use in this function)

- We need firmware.h for these definitions.

Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
15 years agopowerpc: Fix data-corrupting bug in __futex_atomic_op
Paul Mackerras [Mon, 13 Apr 2009 14:09:09 +0000 (14:09 +0000)]
powerpc: Fix data-corrupting bug in __futex_atomic_op

Richard Henderson pointed out that the powerpc __futex_atomic_op has a
bug: it will write the wrong value if the stwcx. fails and it has to
retry the lwarx/stwcx. loop, since 'oparg' will have been overwritten
by the result from the first time around the loop.  This happens
because it uses the same register for 'oparg' (an input) as it uses
for the result.

This fixes it by using separate registers for 'oparg' and 'ret'.
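
The failure mode is easy to reproduce in a toy model. Below, a fake
store-conditional is made to fail a chosen number of times; the first
variant computes the result into the variable that holds oparg, the second
keeps them apart. This is plain C written for illustration, not the
powerpc assembly, and all names are invented.

```c
#include <assert.h>

/* Toy store-conditional: fails the first fail_count attempts. */
static int fail_count;

static int store_conditional(int *mem, int val)
{
	if (fail_count > 0) {
		fail_count--;
		return 0;
	}
	*mem = val;
	return 1;
}

/* Buggy shape: the result overwrites the input on every pass. */
static int add_shared_reg(int *mem, int oparg)
{
	int old;

	assert(mem != 0);
	do {
		old = *mem;		/* load-reserve */
		oparg = old + oparg;	/* clobbers the input */
	} while (!store_conditional(mem, oparg));
	return old;
}

/* Fixed shape: input and result live in separate variables. */
static int add_separate_reg(int *mem, int oparg)
{
	int old, ret;

	assert(mem != 0);
	do {
		old = *mem;
		ret = old + oparg;	/* oparg stays intact for a retry */
	} while (!store_conditional(mem, ret));
	return old;
}
```

With a starting value of 10, oparg 5 and one forced retry, the shared
variant stores 25 while the separate variant stores the expected 15.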

Cc: stable@kernel.org
Signed-off-by: Paul Mackerras <paulus@samba.org>
15 years agopowerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
Mike Mason [Fri, 10 Apr 2009 08:57:03 +0000 (08:57 +0000)]
powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()

While adding native EEH support to Emulex and Qlogic drivers, it was
discovered that dev->error_state was set to pci_channel_io_normal too
late in the recovery process. These drivers rely on error_state to
determine if they can access the device in their slot_reset callback,
thus error_state needs to be set to pci_channel_io_normal in
eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
Lary) as to why this is necessary.

Background:
PCI MMIO or DMA accesses to a frozen slot generate additional EEH
errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the
adapter will be shut down. To avoid triggering excessive EEH errors and
an undesirable adapter shutdown, some drivers use the
pci_channel_offline(dev) wrapper function, which returns a Boolean value
based on pci_dev->error_state, to determine if PCI MMIO or
DMA accesses are safe. If the wrapper returns TRUE, drivers must not
make PCI MMIO or DMA accesses to their hardware.

The pci_dev structure member error_state reflects one of three values,
1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
pci_channel_io_perm_failure.  Function pci_channel_offline(dev) returns
TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.
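
The check described above can be sketched in userspace as follows. This is
a model under stated assumptions, not the kernel definition: the model_*
names, the enum values and the mmio_reads counter are all hypothetical and
exist only to show how a driver gates hardware access on error_state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the three error_state values named in the text. */
enum model_channel_state {
    MODEL_IO_NORMAL,
    MODEL_IO_FROZEN,
    MODEL_IO_PERM_FAILURE
};

/* Hypothetical device: only the fields this sketch needs. */
struct model_pci_dev {
    enum model_channel_state error_state;
    unsigned int mmio_reads; /* counts accesses that reached the hardware */
};

/* Returns true when the channel is frozen or permanently failed. */
static bool model_channel_offline(const struct model_pci_dev *dev)
{
    assert(dev != NULL);
    return dev->error_state == MODEL_IO_FROZEN ||
           dev->error_state == MODEL_IO_PERM_FAILURE;
}

/* Driver-side guard: skip the MMIO access while the channel is offline. */
static bool model_read_register(struct model_pci_dev *dev)
{
    if (model_channel_offline(dev))
        return false;       /* would only trigger another EEH error */
    dev->mmio_reads++;      /* stands in for the real register read */
    return true;
}
```

A driver built this way never touches the device while error_state reads
frozen, which is why the value must reflect the hardware state accurately.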

The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
point where the PCI slot is frozen. Currently, the EEH driver restores
dev->error_state to pci_channel_io_normal in eeh_report_resume() before
calling the driver's resume callback. However, when the EEH driver calls
the driver's slot_reset() callback from eeh_report_reset(), error_state
still incorrectly indicates pci_channel_io_frozen.

Waiting until eeh_report_resume() to restore dev->error_state to
pci_channel_io_normal is too late for Emulex and QLogic FC drivers and
any other drivers which are designed to use common code paths in these
two cases: i) those called after the driver's slot_reset() callback and
ii) those called after the PCI slot is frozen but before the driver's
slot_reset() callback is called. Case i) covers all driver paths executed
to reinitialize the hardware after a reset, and case ii) covers all code
paths executed by driver kernel threads that run asynchronously to the
main driver thread, such as interrupt handlers and worker threads that
process driver work queues.

Emulex and QLogic FC drivers are designed with common code paths which
require that pci_channel_offline(dev) reflect the true state of the
hardware. The state transitions that the hardware takes from Normal
Operations to Slot Frozen to Reset to Normal Operations are documented
in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75,
PE State Control.

PAPR defines the following 3 states:

0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
     (Normal Operations)
1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
     (Slot Frozen)

An EEH error places the slot in state 2 (Frozen) and the adapter driver
is notified that an EEH error was detected. If the adapter driver
returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
eeh_reset_device() to place the slot into state 1 (Reset) and
eeh_reset_device() completes by placing the slot into state 0 (Normal
Operations). Upon return from eeh_reset_device(), the EEH driver calls
eeh_report_reset(), which then calls the adapter's slot_reset callback. At
the time the adapter's slot_reset callback is called, the true state of
the hardware is Normal Operations and should be accurately reflected by
setting dev->error_state to pci_channel_io_normal.

The current implementation of the EEH driver does not do so; this change
corrects that deficiency.
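
The ordering problem can be modelled in a few lines of userspace C. This is
a sketch under assumptions rather than the actual patch: every eeh_model_*
name is hypothetical, and the callback simply refuses to reinitialize the
device unless the channel state reads normal, as the drivers described
above would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software view of the channel, mirroring the three error_state values. */
enum eeh_model_state {
    EEH_MODEL_IO_NORMAL,
    EEH_MODEL_IO_FROZEN,
    EEH_MODEL_IO_PERM_FAILURE
};

struct eeh_model_dev {
    enum eeh_model_state error_state; /* what the driver observes */
    bool hw_reinitialized;            /* set once slot_reset touched the device */
};

/* Hypothetical driver callback: reinitializes only if the channel is online. */
static bool eeh_model_slot_reset(struct eeh_model_dev *dev)
{
    assert(dev != NULL);
    if (dev->error_state != EEH_MODEL_IO_NORMAL)
        return false;
    dev->hw_reinitialized = true;
    return true;
}

/* Old ordering: the callback runs while error_state still reads frozen. */
static bool eeh_model_report_reset_old(struct eeh_model_dev *dev)
{
    return eeh_model_slot_reset(dev);
}

/* New ordering: restore error_state first, then call the callback. */
static bool eeh_model_report_reset_new(struct eeh_model_dev *dev)
{
    assert(dev != NULL);
    dev->error_state = EEH_MODEL_IO_NORMAL;
    return eeh_model_slot_reset(dev);
}
```

With the old ordering the callback sees a frozen channel and skips
reinitialization even though the slot has already been reset; with the new
ordering it proceeds.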

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>