Len Brown [Fri, 30 Mar 2012 20:35:53 +0000 (16:35 -0400)]
Merge branch 'tboot' into release
Conflicts:
drivers/acpi/acpica/hwsleep.c
Text conflict between:
2feec47d4c5f80b05f1650f5a24865718978eea4
(ACPICA: ACPI 5: Support for new FADT SleepStatus, SleepControl registers)
which removed #include "actables.h"
and
09f98a825a821f7a3f1b162f9ed023f37213a63b
(x86, acpi, tboot: Have a ACPI os prepare sleep instead of calling tboot_sleep.)
which removed #include <linux/tboot.h>
The resolution is to remove them both.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 30 Mar 2012 20:21:26 +0000 (16:21 -0400)]
Merge branch 'd3' into release
Conflicts:
drivers/acpi/sleep.c
This was a text conflict between
a2ef5c4fd44ce3922435139393b89f2cce47f576
(ACPI: Move module parameter gts and bfs to sleep.c)
which added #include <linux/module.h>
and
b24e5098853653554baf6ec975b9e855f3d6e5c0
(ACPI, PCI: Move acpi_dev_run_wake() to ACPI core)
which added #include <linux/pm_runtime.h>
The resolution was to take them both.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 30 Mar 2012 20:12:23 +0000 (16:12 -0400)]
Merge branch 'apei' into release
Conflicts:
drivers/acpi/apei/apei-base.c
This was a conflict between
15afae604651d4e17652d2ffb56f5e36f991cfef
(CPI, APEI: Fix incorrect APEI register bit width check and usage)
and
653f4b538f66d37db560e0f56af08117136d29b7
(ACPICA: Expand OSL memory read/write interfaces to 64 bits)
The former changed a parameter in the call to acpi_os_read_memory64()
and the later replaced all calls to acpi_os_read_memory64()
with calls to acpi_os_read_memory().
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 30 Mar 2012 20:10:37 +0000 (16:10 -0400)]
Merge branches 'acpica', 'bgrt', 'bz-11533', 'cpuidle', 'ec', 'hotplug', 'misc', 'red-hat-bz-727865', 'thermal', 'throttling', 'turbostat' and 'video' into release
Signed-off-by: Len Brown <len.brown@intel.com>
Dan Carpenter [Wed, 7 Mar 2012 11:57:36 +0000 (14:57 +0300)]
ACPI throttling: fix endian bug in acpi_read_throttling_status()
Using a u64 here creates an endian bug. We store a u32 number in the
top byte which is a larger number than intended on big endian systems.
There is no reason to use a 64 bit data type here, I guess it was just
an oversight.
I removed the initialization to zero as well. It's needed with a u64
but with a u32, the variable gets initialized properly inside the call
to acpi_os_read_port().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Andi Kleen [Mon, 6 Feb 2012 16:17:12 +0000 (08:17 -0800)]
Disable MCP limit exceeded messages from Intel IPS driver
On a system on the thermal limit these are quite noisy and flood the logs.
Better would be a counter anyways. But given that we don't even have
anything for normal throttling this doesn't seem to be urgent either.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Igor Murzov [Fri, 30 Mar 2012 17:32:09 +0000 (21:32 +0400)]
ACPI video: Don't start video device until its associated input device has been allocated
Quoth Dmitry Torokhov:
In addition to bus notifier we do install device notifier explicitly
so it might fire up early. The easiest fox would be to move
acpi_video_bus_start_devices() after input_allocate_device() but
before input_register_device() - unregistered input devices can handle
input_event() calls just fine.
May fix crashes reported in:
https://bugzilla.kernel.org/show_bug.cgi?id=40672
Signed-off-by: Igor Murzov <e-mail@date.by>
Signed-off-by: Len Brown <len.brown@intel.com>
Igor Murzov [Fri, 30 Mar 2012 17:32:08 +0000 (21:32 +0400)]
ACPI video: Harden video bus adding.
It is always better to check return values, so add some new checks and
correct existing ones.
v2: Be consistent and don't mix errors from -E* and AE_* namespaces.
Signed-off-by: Igor Murzov <e-mail@date.by>
Signed-off-by: Len Brown <len.brown@intel.com>
Matthew Garrett [Tue, 31 Jan 2012 18:19:20 +0000 (13:19 -0500)]
ACPI: Add support for exposing BGRT data
ACPI 5.0 adds the BGRT, a table that contains a pointer to the firmware
boot splash and associated metadata. This simple driver exposes it via
/sys/firmware/acpi in order to allow bootsplash applications to draw their
splash around the firmware image and reduce the number of jarring graphical
transitions during boot.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Matthew Garrett [Tue, 31 Jan 2012 18:19:19 +0000 (13:19 -0500)]
ACPI: export acpi_kobj
Drivers may wish to add entries to /sys/firmware/acpi, so export acpi_kobj
in order to let them do that.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Myron Stowe [Thu, 9 Feb 2012 16:36:41 +0000 (09:36 -0700)]
ACPI: Fix logic for removing mappings in 'acpi_unmap'
Make sure the removal of mappings uses the same logic that put the
mappings in place.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Jiang Liu [Tue, 14 Feb 2012 16:01:44 +0000 (00:01 +0800)]
CPER failed to handle generic error records with multiple sections
The function apei_estatus_print() and apei_estatus_check() forget to move ahead
the gdata pointer when dealing with multiple generic error data sections.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Alex He [Tue, 21 Feb 2012 08:58:10 +0000 (16:58 +0800)]
ACPI: Clean redundant codes in scan.c
Clean the redundant codes of apci_bus_get_power_flags().
Signed-off-by: Alex He <alex.he@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Paul E. McKenney [Tue, 28 Feb 2012 21:27:44 +0000 (13:27 -0800)]
ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
The acpi_processor_cst_has_changed() function is invoked from a
CPU_ONLINE or CPU_DEAD function, which might well execute on CPU 0
even though the CPU being hotplugged is some other CPU. In addition,
acpi_processor_cst_has_changed() invokes smp_processor_id() without
protection, resulting in splats when onlining CPUs.
This commit therefore changes the smp_processor_id() to pr->id, as is
used elsewhere in the code, for example, in acpi_processor_add().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Jan Beulich [Fri, 24 Feb 2012 11:41:53 +0000 (11:41 +0000)]
ACPI: consistently use should_use_kmap()
... so that acpi_unmap()'s behavior gets in sync with acpi_map()'s.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Yinghai Lu [Sat, 3 Mar 2012 21:29:20 +0000 (13:29 -0800)]
PNPACPI: Fix device ref leaking in acpi_pnp_match
During testing pci root bus removal, found some root bus bridge is not freed.
If booting with pnpacpi=off, those hostbridge could be freed without problem.
It turns out that some devices reference are not released during acpi_pnp_match.
that match should not hold one device ref during every calling.
Add pu_device calling before returning.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Petr Vandrovec [Thu, 8 Mar 2012 21:33:24 +0000 (13:33 -0800)]
ACPI: Fix use-after-free in acpi_map_lsapic
When processor is being hot-added to the system, acpi_map_lsapic invokes
ACPI _MAT method to find APIC ID and flags, verifies that returned structure
is indeed ACPI's local APIC structure, and that flags contain MADT_ENABLED
bit. Then saves APIC ID, frees structure - and accesses structure when
computing arguments for acpi_register_lapic call. Which sometime leads
to acpi_register_lapic call being made with second argument zero, failing
to bring processor online with error 'Unable to map lapic to logical cpu
number'.
As lapic->lapic_flags & ACPI_MADT_ENABLED was already confirmed to be non-zero
few lines above, we can just pass unconditional ACPI_MADT_ENABLED to the
acpi_register_lapic.
Signed-off-by: Petr Vandrovec <petr@vmware.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Julia Lawall [Thu, 15 Mar 2012 08:32:05 +0000 (09:32 +0100)]
ACPI: processor_driver: add missing kfree
The function acpi_processor_add is stored in the ops.add field of a
acpi_driver structure. This function is then called in
acpi_bus_driver_init. On failure, this function clears the field
device->driver_data, but does not free its contents. Thus the free has to
be done by the add function. In acpi_processor_add, the corresponding
value is pr. This value is currently freed on failure before storing it in
device->driver_data, but not after. This free is added in the error
handling code at the end of the function. The per_cpu variable
processors is also cleared so that it does not refer to a dangling pointer.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Gary Hade [Wed, 21 Mar 2012 22:28:50 +0000 (15:28 -0700)]
ACPI, APEI: Fix incorrect APEI register bit width check and usage
The current code incorrectly assumes that
(1) the APEI register bit width is always 8, 16, 32, or 64 and
(2) the APEI register bit width is always equal to the APEI
register access width.
ERST serialization instructions entries such as:
[030h 0048 1] Action : 00 [Begin Write Operation]
[031h 0049 1] Instruction : 03 [Write Register Value]
[032h 0050 1] Flags (decoded below) : 01
Preserve Register Bits : 1
[033h 0051 1] Reserved : 00
[034h 0052 12] Register Region : [Generic Address Structure]
[034h 0052 1] Space ID : 00 [SystemMemory]
[035h 0053 1] Bit Width : 03
[036h 0054 1] Bit Offset : 00
[037h 0055 1] Encoded Access Width : 03 [DWord Access:32]
[038h 0056 8] Address :
000000007F2D7038
[040h 0064 8] Value :
0000000000000001
[048h 0072 8] Mask :
0000000000000007
break this assumption by yielding:
[Firmware Bug]: APEI: Invalid bit width in GAR [0x7f2d7038/3/0]
I have found no ACPI specification requirements corresponding
with the above assumptions. There is even a good example in
the Serialization Instruction Entries section (ACPI 4.0 section
17.4,1.2, ACPI 4.0a section 2.5.1.2, ACPI 5.0 section 18.5.1.2)
that mentions a serialization instruction with a bit range of
[6:2] which is 5 bits wide, _not_ 8, 16, 32, or 64 bits wide.
Compile and boot tested with 3.3.0-rc7 on a IBM HX5.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Chen Gong [Thu, 15 Mar 2012 08:53:37 +0000 (16:53 +0800)]
Update documentation for parameter *notrigger* in einj.txt
Add description of parameter notrigger in the einj.txt.
One can utilize this new parameter to do some SRAR injection
test. Pay attention, the operation is highly depended on the
BIOS implementation. If no proper BIOS supports it, even if
enabling this parameter, expected result will not happen.
v2:
Update the documentation suggested by Tony
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Chen Gong [Thu, 15 Mar 2012 08:53:36 +0000 (16:53 +0800)]
ACPI, APEI, EINJ, new parameter to control trigger action
Some APEI firmware implementation will access injected address
specified in param1 to trigger the error when injecting memory
error, which means if one SRAR error is injected, the crash
always happens because it is executed in kernel context. This
new parameter can disable trigger action and control is taken
over by the user. In this way, an SRAR error can happen in user
context instead of crashing the system. This function is highly
depended on BIOS implementation so please ensure you know the
BIOS trigger procedure before you enable this switch.
v2:
notrigger should be created together with param1/param2
Tested-by: Tony Luck <tony.luck@lintel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Chen Gong [Sat, 3 Mar 2012 03:56:43 +0000 (11:56 +0800)]
ACPI, APEI, EINJ, limit the range of einj_param
On the platforms with ACPI4.x support, parameter extension
is not always doable, which means only parameter extension
is enabled, einj_param can take effect.
v2->v1: stopping early in einj_get_parameter_address for einj_param
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Jiang Liu [Wed, 7 Mar 2012 14:15:06 +0000 (22:15 +0800)]
ACPI, APEI, Fix ERST header length check
This fixes a trivial copy & paste error in ERST header length check.
It's just for future safety because sizeof(struct acpi_table_einj)
equals to sizeof(struct acpi_table_erst) with current ACPI5.0
specification. It applies to v3.3-rc6.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Boris Ostrovsky [Tue, 13 Mar 2012 18:55:10 +0000 (19:55 +0100)]
cpuidle: power_usage should be declared signed integer
power_usage is always assigned a negative value and should be declared
a signed integer
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Boris Ostrovsky [Tue, 13 Mar 2012 18:55:09 +0000 (19:55 +0100)]
idle, x86: Allow off-lined CPU to enter deeper C states
Currently when a CPU is off-lined it enters either MWAIT-based idle or,
if MWAIT is not desired or supported, HLT-based idle (which places the
processor in C1 state). This patch allows processors without MWAIT
support to stay in states deeper than C1.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Toshi Kani [Mon, 19 Mar 2012 19:08:02 +0000 (13:08 -0600)]
ACPI: Add CPU hotplug support for processor device objects
acpi_processor_install_hotplug_notify() registers processor objects to
receive ACPI CPU hotplug event notifications. This patch additionally
registers processor device objects (ACPI0007) to receive the notifications
as well.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Tue, 14 Feb 2012 00:04:43 +0000 (17:04 -0700)]
ACPI / PM: print physical addresses consistently with other parts of kernel
Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Matthew Garrett [Wed, 1 Feb 2012 15:26:54 +0000 (10:26 -0500)]
ACPI: Evaluate thermal trip points before reading temperature
An HP laptop (Pavilion G4-1016tx) has the following code in _TMP:
Store (\_SB.PCI0.LPCB.EC0.RTMP, Local0)
If (LGreaterEqual (Local0, S4TP))
{
Store (One, HTS4)
}
S4TP is initialised at 0 and not programmed further until either _HOT or
_CRT is called. If we evaluate _TMP before the trip points then HTS4 will
always be set, causing the firmware to generate a message on boot
complaining that the system shut down because of overheating. The simplest
solution is just to reverse the checking of trip points and _TMP in thermal
init.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Lin Ming [Tue, 27 Mar 2012 07:43:25 +0000 (15:43 +0800)]
ACPI, PCI: Move acpi_dev_run_wake() to ACPI core
acpi_dev_run_wake() is a generic function which can be used by
other subsystem too. Rename it to acpi_pm_device_run_wake, to be
consistent with acpi_pm_device_sleep_wake.
Then move it to ACPI core.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Daniel Lezcano [Mon, 26 Mar 2012 12:51:28 +0000 (14:51 +0200)]
cpuidle: remove unused 'governor_data' field
As far as I can see, this field is never used in the code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Daniel Lezcano [Mon, 26 Mar 2012 12:51:27 +0000 (14:51 +0200)]
cpuidle: remove useless array definition in cpuidle_structure
All the modules name are ro-data, it is never copied to the array.
eg.
static struct cpuidle_driver intel_idle_driver = {
.name = "intel_idle",
.owner = THIS_MODULE,
};
It safe to assign the pointer of this ro-data to a const char *.
By this way we save 12 bytes.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Daniel Lezcano [Mon, 26 Mar 2012 12:51:26 +0000 (14:51 +0200)]
cpuidle: use the driver's state_count as default
If the state_count is not initialized for the device use
the driver's state count as the default. That will prevent
to add it manually in the cpuidle driver initialization
routine and will save us from duplicate line of code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Len Brown <len.brown@intel.com>
ShuoX Liu [Wed, 28 Mar 2012 22:19:11 +0000 (15:19 -0700)]
cpuidle: add a sysfs entry to disable specific C state for debug purpose.
Some C states of new CPU might be not good. One reason is BIOS might
configure them incorrectly. To help developers root cause it quickly, the
patch adds a new sysfs entry, so developers could disable specific C state
manually.
In addition, C state might have much impact on performance tuning, as it
takes much time to enter/exit C states, which might delay interrupt
processing. With the new debug option, developers could check if a deep C
state could impact performance and how much impact it could cause.
Also add this option in Documentation/cpuidle/sysfs.txt.
[akpm@linux-foundation.org: check kstrtol return value]
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
Reviewed-and-Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Lin Ming [Thu, 29 Mar 2012 06:09:39 +0000 (14:09 +0800)]
ACPI: Add interface to register/unregister device to/from power resources
Devices may share same list of power resources in _PR0, for example
Device(Dev0)
{
Name (_PR0, Package (0x01)
{
P0PR,
P1PR
})
}
Device(Dev1)
{
Name (_PR0, Package (0x01)
{
P0PR,
P1PR
}
}
Assume Dev0 and Dev1 were runtime suspended.
Then Dev0 is resumed first and it goes into D0 state.
But Dev1 is left in D0_Uninitialised state.
This is wrong. In this case, Dev1 must be resumed too.
In order to hand this case, each power resource maintains a list of
devices which relies on it.
When power resource is ON, it will check if the devices on its list
can be resumed. The device can only be resumed when all the power
resouces of its _PR0 are ON.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Thu, 29 Mar 2012 06:09:38 +0000 (14:09 +0800)]
ACPI: Introduce ACPI D3_COLD state support
If a device has _PR3, it means the device supports D3_COLD.
Add the ability to validate and enter D3_COLD state in ACPI.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Wed, 21 Mar 2012 03:02:24 +0000 (11:02 +0800)]
ACPICA: Update to version
20120320
Version
20120320.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Wed, 21 Mar 2012 01:51:39 +0000 (09:51 +0800)]
ACPICA: Object repair code: Support to add Package wrappers
Repair a common problem with objects that are defined to return
a variable-length Package of sub-objects. If there is only one
sub-object, some BIOS code mistakenly simply declares the single
object instead of a Package with one sub-object. This function
attempts to repair this error by wrapping a Package object around
the original object, creating the correct and expected Package
with one sub-object.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Andi Kleen [Mon, 6 Feb 2012 16:17:09 +0000 (08:17 -0800)]
ACPI: Make ACPI interrupt threaded
Some ACPI interrupt actions may need to wait, and it's easiest to
have a thread context for this. So turn the ACPI interrupt
into a threaded interrupt.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Andi Kleen [Mon, 6 Feb 2012 16:17:08 +0000 (08:17 -0800)]
ACPI: ec: Do request_region outside WARN()
WARN() is not supposed to have side effects, so move the request_regions
outside.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 30 Mar 2012 02:19:58 +0000 (22:19 -0400)]
tools turbostat: harden against cpu online/offline
Sometimes users have turbostat running in interval mode
when they take processors offline/online.
Previously, turbostat would survive, but not gracefully.
Tighten up the error checking so turbostat notices
changesn sooner, and print just 1 line on change:
turbostat: re-initialized with num_cpus %d
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 30 Mar 2012 01:44:40 +0000 (21:44 -0400)]
tools turbostat: reduce measurement overhead due to IPIs
turbostat uses /dev/cpu/*/msr interface to read MSRs.
For modern systems, it reads 10 MSR/CPU. This can
be observed as 10 "Function Call Interrupts"
per CPU per sample added to /proc/interrupts.
This overhead is measurable on large idle systems,
and as Yoquan Song pointed out, it can even trick
cpuidle into thinking the system is busy.
Here turbostat re-schedules itself in-turn to each
CPU so that its MSR reads will always be local.
This replaces the 10 "Function Call Interrupts"
with a single "Rescheduling interrupt" per sample
per CPU.
On an idle 32-CPU system, this shifts some residency from
the shallow c1 state to the deeper c7 state:
# ./turbostat.old -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.29 2.29 0.95 0.02 0.00 98.77 20.23 0.00 77.41 0.00
0.25 1.24 2.29 0.98 0.02 0.00 98.75 20.34 0.03 77.74 0.00
0.27 1.22 2.29 0.54 0.00 0.00 99.18 20.64 0.00 77.70 0.00
0.26 1.22 2.29 1.22 0.00 0.00 98.52 20.22 0.00 77.74 0.00
0.26 1.38 2.29 0.78 0.02 0.00 98.95 20.51 0.05 77.56 0.00
^C
i# ./turbostat.new -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.20 2.29 0.24 0.01 0.00 99.49 20.58 0.00 78.20 0.00
0.27 1.22 2.29 0.25 0.00 0.00 99.48 20.79 0.00 77.85 0.00
0.27 1.20 2.29 0.25 0.02 0.00 99.46 20.71 0.03 77.89 0.00
0.28 1.26 2.29 0.25 0.01 0.00 99.46 20.89 0.02 77.67 0.00
0.27 1.20 2.29 0.24 0.01 0.00 99.48 20.65 0.00 78.04 0.00
cc: Youquan Song <youquan.song@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Mon, 6 Feb 2012 23:37:16 +0000 (18:37 -0500)]
tools turbostat: add summary option
turbostat -s
cuts down on the amount of output, per user request.
also treak some output whitespace and the man page.
Signed-off-by: Len Brown <len.brown@intel.com>
Lin Ming [Wed, 21 Mar 2012 02:58:46 +0000 (10:58 +0800)]
ACPI: Move module parameter gts and bfs to sleep.c
Move linux specific module parameter gts and bfs out of ACPICA core
code to sleep.c.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Lin Ming [Wed, 21 Mar 2012 02:01:49 +0000 (10:01 +0800)]
ACPICA: Sleep/Wake interfaces: optionally execute _GTS and _BFS
Enhanced the sleep/wake interfaces to optionally execute the
_GTS method (Going To Sleep), and the _BFS method (Back From
Sleep). Windows apparently does not execute these methods, and
therefore these methods are often untested. It has been seen on
some systems where the execution of these methods causes errors
and also prevents the machine from entering S5. It is therefore
suggested that host operating systems do not execute these methods
by default. In the future, perhaps these methods can be optionally
executed based on the age of the system and/or what is the newest
version of Windows that the BIOS asks for via _OSI.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Andi Kleen [Mon, 6 Feb 2012 16:17:11 +0000 (08:17 -0800)]
ACPI: Do cpufreq clamping for throttling per package v2
On Intel CPUs the processor typically uses the highest frequency
set by any logical CPU. When the system overheats
Linux first forces the frequency to the lowest available one
to lower the temperature.
However this was done only per logical CPU, which means all
logical CPUs in a package would need to go through this before
the frequency is actually lowered.
Worse this delay actually prevents real throttling, because
the real throttle code only proceeds when the lowest frequency
is already reached.
So when a throttle event happens force the lowest frequency
for all CPUs in the package where it happened. The per CPU
state is now kept per package, not per logical CPU. An alternative
would be to do it per cpufreq unit, but since we want to bring
down the temperature of the complete chip it's better
to do it for all.
In principle it may even make sense to do it for all CPUs,
but I kept it on the package for now.
With this change the frequency is actually lowered, which
in terms also allows real throttling to proceed.
I also removed an unnecessary per cpu variable initialization.
v2: Fix package mapping
Cc: <stable@vger.kernel.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Wed, 21 Mar 2012 01:46:47 +0000 (09:46 +0800)]
ACPICA: Debugger: Add missing object info to namespace dump
Many namespace node types must have an attached object. For
these node types, print a message about a missing object during
a namespace dump.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Wed, 21 Mar 2012 01:44:00 +0000 (09:44 +0800)]
ACPICA: Change exception code for invalid pathname in acpi_evaluate_object
Change the returned exception code from AE_BAD_PARAMETER to the
more appropriate AE_BAD_PATHNAME, when the input pathname to
evaluate object is invalid.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Wed, 21 Mar 2012 01:42:45 +0000 (09:42 +0800)]
ACPICA: Clarify METHOD_NAME* defines for full-pathname cases
Changed the METHOD_NAME* defines that define a full pathname to
the method to METHOD_PATHNAME* in order to make it clear that
it is not a simple 4-character ACPI name. Used for the various
sleep/wake methods.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Mon, 20 Feb 2012 05:50:13 +0000 (13:50 +0800)]
ACPICA: Update to version
20120215
Version
20120215.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 10:57:13 +0000 (18:57 +0800)]
ACPICA: Add table-driven dispatch for sleep/wake functions
Simplifies the code, especially the compile-time
ACPI_REDUCED_HARDWARE option.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 10:47:42 +0000 (18:47 +0800)]
ACPICA: Split sleep/wake functions into two files
The functions for the original/legacy sleep/wake registers are in
hwsleep.c, and the functions for the new extended FADT V5 sleep
registers are in hwesleep.c
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 10:43:03 +0000 (18:43 +0800)]
ACPICA: Distill multiple sleep method functions to a single function
Adds acpi_hw_execute_sleep_method to handle the various sleep methods
such as _GTS, _BFS, _WAK, and _SST. Removes the specialized
functions previously used for each of these methods.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 10:31:56 +0000 (18:31 +0800)]
ACPICA: Add acpi_os_physical_table_override interface
This interface allows the host to override a table via a
physical address, instead of the logical address required by
acpi_os_table_override. This simplifies the host implementation.
Initial implementation by Thomas Renninger. ACPICA implementation
creates a single function for table overrides that attempts both
a logical and a physical override.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 10:23:39 +0000 (18:23 +0800)]
ACPICA: ACPI 5: Update debug output for new notify values
Add new notify values, add support for "hardware specific" notifies.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 10:29:55 +0000 (18:29 +0800)]
ACPICA: Expand OSL memory read/write interfaces to 64 bits
This change expands acpi_os_read_memory and acpi_os_write_memory to a
full 64 bits. This allows 64 bit transfers via the acpi_read and
acpi_write interfaces. Note: The internal acpi_hw_read and acpi_hw_write
interfaces remain at 32 bits, because 64 bits is not needed to
access the standard ACPI registers.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 10:14:27 +0000 (18:14 +0800)]
ACPICA: Support for custom ACPICA build for ACPI 5 reduced hardware
Add ACPI_REDUCED_HARDWARE flag that removes all hardware-related
code (about 10% code, 5% static data).
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 07:22:51 +0000 (15:22 +0800)]
ACPICA: Move ACPI timer prototypes to public acpixf file
These prototypes were incorrectly declared in achware.h.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 07:00:53 +0000 (15:00 +0800)]
ACPICA: ACPI 5: Support for new FADT SleepStatus, SleepControl registers
Adds sleep and wake support for systems with these registers.
One new file, hwxfsleep.c
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bob Moore [Tue, 14 Feb 2012 05:11:07 +0000 (13:11 +0800)]
ACPICA: Update _REV return value to 5
_REV returns the supported ACPI revision level, which is now 5.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 22 Mar 2012 05:31:09 +0000 (01:31 -0400)]
Merge branch 'stable/for-x86-for-3.4' of git://git./linux/kernel/git/konrad/xen into tboot
Robert Lee [Wed, 21 Mar 2012 16:48:25 +0000 (11:48 -0500)]
ARM: davinci: Fix for cpuidle consolidation changes
The recent cpuidle consolidation changes erroneously omitted one
critical line of code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Tested-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Amit Daniel Kachhap [Wed, 21 Mar 2012 11:10:01 +0000 (16:40 +0530)]
thermal: Fix for setting the thermal zone mode to enable/disable
Basically without this patch changing the mode of thermal zone
is not possible as wrong string size is passed to strncmp.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@linaro.org>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Dan Carpenter [Wed, 21 Mar 2012 19:55:04 +0000 (12:55 -0700)]
thermal: spear13xx: checking for NULL instead of IS_ERR()
thermal_zone_device_register() never returns NULL, on error it returns and
ERR_PTR().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Viresh Kumar <viresh.kumar@st.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Viresh Kumar [Wed, 21 Mar 2012 19:55:03 +0000 (12:55 -0700)]
thermal/spear_thermal: replace readl/writel with lighter _relaxed variants
readl/writel versions for ARM contain memory barrier instruction for
synchronizing DMA buffers. These are not required at least on this
module. So use lighter _relaxed variants.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Vincenzo Frascino [Wed, 21 Mar 2012 19:55:03 +0000 (12:55 -0700)]
thermal: add support for thermal sensor present on SPEAr13xx machines
ST's SPEAr13xx machines are based on CortexA9 ARM processors. These
machines contain a thermal sensor for junction temperature monitoring.
This patch adds support for this thermal sensor in existing thermal
framework.
[akpm@linux-foundation.org: little code cleanup]
[akpm@linux-foundation.org: print the pointer correctly]
[viresh.kumar@st.com: thermal/spear_thermal: add compilation dependency on PLAT_SPEAR]
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Joe Perches [Wed, 21 Mar 2012 19:55:02 +0000 (12:55 -0700)]
thermal_sys: convert printks to pr_<level>
Use the current logging style.
Remove PREFIX, add pr_fmt, convert the printks. All dmesg output now
prefixed with "thermal_sys: ".
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Joe Perches [Wed, 21 Mar 2012 19:55:02 +0000 (12:55 -0700)]
thermal_sys: kernel style cleanups
Just a few tidies to make it more like most kernel sources.
A couple of long lines still remain.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Joe Perches [Wed, 21 Mar 2012 19:55:02 +0000 (12:55 -0700)]
thermal_sys: remove obfuscating used-once macros
These don't add any value as they are used only once and the surrounding
code uses similar variable.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Joe Perches [Wed, 21 Mar 2012 19:55:01 +0000 (12:55 -0700)]
thermal_sys: remove unnecessary line continuations
Line continations are not necessary in function calls or statements.
Remove them.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Fabio Estevam [Wed, 21 Mar 2012 19:55:00 +0000 (12:55 -0700)]
drivers/thermal/thermal_sys.c: fix build warning
With CONFIG_NET=n:
drivers/thermal/thermal_sys.c:63: warning: 'thermal_event_seqnum' defined but not used
Move 'thermal_event_seqnum' definition inside the '#ifdef CONFIG_NET'
[akpm@linux-foundation.org: make thermal_event_seqnum local to generate_netlink_event()]
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Durgadoss R <durgadoss.r@intel.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:49 +0000 (15:22 -0500)]
SH: shmobile: Consolidate time keeping and irq enable
Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:48 +0000 (15:22 -0500)]
ARM: shmobile: Consolidate time keeping and irq enable
Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:47 +0000 (15:22 -0500)]
ARM: omap: Consolidate OMAP4 time keeping and irq enable
Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:46 +0000 (15:22 -0500)]
ARM: omap: Consolidate OMAP3 time keeping and irq enable
Use core cpuidle timekeeping and irqen wrapper and remove that
handling from this code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Jean Pihet <j-pihet@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:45 +0000 (15:22 -0500)]
ARM: davinci: Consolidate time keeping and irq enable
Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:44 +0000 (15:22 -0500)]
ARM: kirkwood: Consolidate time keeping and irq enable
Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:43 +0000 (15:22 -0500)]
ARM: at91: Consolidate time keeping and irq enable
Enable core cpuidle timekeeping and irq enabling and remove that
handling from this code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Robert Lee [Tue, 20 Mar 2012 20:22:42 +0000 (15:22 -0500)]
cpuidle: Add common time keeping and irq enabling
Make necessary changes to implement time keeping and irq enabling
in the core cpuidle code. This will allow the removal of these
functionalities from various platform cpuidle implementations whose
timekeeping and irq enabling follows the form in this common code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Tested-by: Jean Pihet <j-pihet@ti.com>
Tested-by: Amit Daniel <amit.kachhap@linaro.org>
Tested-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Julian Anastasov [Thu, 23 Feb 2012 20:40:43 +0000 (22:40 +0200)]
ACPICA: Fix regression in FADT revision checks
commit
64b3db22c04586997ab4be46dd5a5b99f8a2d390 (2.6.39),
"Remove use of unreliable FADT revision field" causes regression
for old P4 systems because now cst_control and other fields are
not reset to 0.
The effect is that acpi_processor_power_init will notice
cst_control != 0 and a write to CST_CNT register is performed
that should not happen. As result, the system oopses after the
"No _CST, giving up" message, sometimes in acpi_ns_internalize_name,
sometimes in acpi_ns_get_type, usually at random places. May be
during migration to CPU 1 in acpi_processor_get_throttling.
Every one of these settings help to avoid this problem:
- acpi=off
- processor.nocst=1
- maxcpus=1
The fix is to update acpi_gbl_FADT.header.length after
the original value is used to check for old revisions.
https://bugzilla.kernel.org/show_bug.cgi?id=42700
https://bugzilla.redhat.com/show_bug.cgi?id=727865
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 31 Jul 2011 17:23:49 +0000 (13:23 -0400)]
ACPI: ignore FADT reset-reg-sup flag
we check that the address is non-zero later anyway.
https://bugzilla.kernel.org/show_bug.cgi?id=11533
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sun, 18 Mar 2012 23:15:34 +0000 (16:15 -0700)]
Linux 3.3
Jason Baron [Fri, 16 Mar 2012 20:34:03 +0000 (16:34 -0400)]
Don't limit non-nested epoll paths
Commit
28d82dc1c4ed ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).
The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.
This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 18 Mar 2012 02:22:24 +0000 (19:22 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking changes from David Miller:
"1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
crashes, particularly during shutdown. Reported by Dave Jones and
fixed by Eric Dumazet.
2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
already freed the SKB, which causes crashes as to the caller this
means requeue the packet. Fixes from Eric Dumazet.
3) usbnet driver doesn't allocate the right amount of headroom on
fresh RX SKBs, fix from Eric Dumazet.
4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
abolutely should not take a reference to 'dev', this leads to
leaks. Fix from RonQing Li.
5) Fix netfilter ctnetlink race between delete and timeout expiration.
From Pablo Neira Ayuso.
6) Revert SFQ change which causes regressions, specifically queueing
to tail can lead to unavoidable flow starvation. From Eric
Dumazet.
7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
from Michal Schmidt."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netfilter: ctnetlink: fix race between delete and timeout expiration
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
net/hyperv: fix erroneous NETDEV_TX_BUSY use
net/usbnet: reserve headroom on rx skbs
bnx2x: fix memory leak in bnx2x_init_firmware()
bnx2x: fix a crash on corrupt firmware file
sch_sfq: revert dont put new flow at the end of flows
ipv6: fix icmp6_dst_alloc()
Linus Torvalds [Sat, 17 Mar 2012 16:54:16 +0000 (09:54 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools, x86: Build perf on older user-space as well
perf tools: Use scnprintf where applicable
perf tools: Incorrect use of snprintf results in SEGV
Pablo Neira Ayuso [Fri, 16 Mar 2012 02:00:34 +0000 (02:00 +0000)]
netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.
After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing.Li [Thu, 15 Mar 2012 22:54:14 +0000 (22:54 +0000)]
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.
[ bug introduced in
96b52e61be1 (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 17 Mar 2012 00:14:55 +0000 (17:14 -0700)]
Merge branch 'akpm' (more patches from Andrew)
Merge some more email patches from Andrew Morton:
"A couple of nilfs fixes"
* emailed from Andrew Morton <akpm@linux-foundation.org>:
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
nilfs2: clamp ns_r_segments_percentage to [1, 99]
Ryusuke Konishi [Sat, 17 Mar 2012 00:08:39 +0000 (17:08 -0700)]
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:
BUG: unable to handle kernel NULL pointer dereference at
00000048
IP: [<
d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
*pde =
00000000
Oops: 0000 [#1] PREEMPT SMP
...
Call Trace:
[<
d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
[<
d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
[<
c0226636>] mount_fs+0x36/0x180
[<
c023d961>] vfs_kern_mount+0x51/0xa0
[<
c023ddae>] do_kern_mount+0x3e/0xe0
[<
c023f189>] do_mount+0x169/0x700
[<
c023fa9b>] sys_mount+0x6b/0xa0
[<
c04abd1f>] sysenter_do_call+0x12/0x28
Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
EIP: [<
d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:
ca9bbdcc
CR2:
0000000000000048
This turned out due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.
This patch fixes it and eliminates the reported oops.
Reported-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Slicky Devil <slicky.dvl@gmail.com>
Cc: <stable@vger.kernel.org> [2.6.30+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Haogang Chen [Sat, 17 Mar 2012 00:08:38 +0000 (17:08 -0700)]
nilfs2: clamp ns_r_segments_percentage to [1, 99]
ns_r_segments_percentage is read from the disk. Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation. This patch reports error when mounting such
bogus volumes.
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 17 Mar 2012 00:04:02 +0000 (17:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull maintainer update from James Morris:
"Please pull this patch which adds Serge as maintainer of the
capabilities code, as discussed on lwn and the lsm list.
New capabilities must be signed off by the maintainer, and new uses of
any capabilities should at be cc'd to the maintainer."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
MAINTAINERS: Add Serge as maintainer of capabilities
Linus Torvalds [Sat, 17 Mar 2012 00:03:15 +0000 (17:03 -0700)]
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
Pull c6x bugfix from Mark Salter:
"Remove dead code from entry.S which causes a build failure when using
a newer assembler (v2.22 complains about it, v2.20 ignores it)."
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
C6X: remove dead code from entry.S
Anton Blanchard [Fri, 16 Mar 2012 10:28:19 +0000 (10:28 +0000)]
afs: Remote abort can cause BUG in rxrpc code
When writing files to afs I sometimes hit a BUG:
kernel BUG at fs/afs/rxrpc.c:179!
With a backtrace of:
afs_free_call
afs_make_call
afs_fs_store_data
afs_vnode_store_data
afs_write_back_from_locked_page
afs_writepages_region
afs_writepages
The cause is:
ASSERT(skb_queue_empty(&call->rx_queue));
Looking at a tcpdump of the session the abort happens because we
are exceeding our disk quota:
rx abort fs reply store-data error diskquota exceeded (32)
So the abort error is valid. We hit the BUG because we haven't
freed all the resources for the call.
By freeing any skbs in call->rx_queue before calling afs_free_call
we avoid hitting leaking memory and avoid hitting the BUG.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Blanchard [Fri, 16 Mar 2012 10:28:07 +0000 (10:28 +0000)]
afs: Read of file returns EBADMSG
A read of a large file on an afs mount failed:
# cat junk.file > /dev/null
cat: junk.file: Bad message
Looking at the trace, call->offset wrapped since it is only an
unsigned short. In afs_extract_data:
_enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count);
...
if (call->offset < count) {
if (last) {
_leave(" = -EBADMSG [%d < %zu]", call->offset, count);
return -EBADMSG;
}
Which matches the trace:
[cat ] ==> afs_extract_data({65132},{524},1,,65536)
[cat ] <== afs_extract_data() = -EBADMSG [0 < 65536]
call->offset went from 65132 to 0. Fix this by making call->offset an
unsigned int.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Salter [Fri, 16 Mar 2012 13:27:57 +0000 (09:27 -0400)]
C6X: remove dead code from entry.S
The ENDPROC() on sys_fadvise64_c6x() in arch/c6x/kernel/entry.S is
outside of the conditional block with the matching ENTRY() macro. This
leads a newer (v2.22 vs. v2.20) assembler to complain:
/tmp/ccGZBaPT.s: Assembler messages:
/tmp/ccGZBaPT.s: Error: .size expression for sys_fadvise64_c6x does not evaluate to a constant
The conditional block became dead code when c6x switched to generic
unistd.h and should be removed along with the offending ENDPROC().
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Eric Dumazet [Wed, 14 Mar 2012 09:21:44 +0000 (09:21 +0000)]
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.
In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.
In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK
Also increments tx_dropped counter
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 08:53:34 +0000 (08:53 +0000)]
net/hyperv: fix erroneous NETDEV_TX_BUSY use
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.
This is mostly a revert of commit
bf769375c (staging: hv: fix the return
status of netvsc_start_xmit())
In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.
In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 06:56:25 +0000 (06:56 +0000)]
net/usbnet: reserve headroom on rx skbs
network drivers should reserve some headroom on incoming skbs so that we
dont need expensive reallocations, eg forwarding packets in tunnels.
This NET_SKB_PAD padding is done in various helpers, like
__netdev_alloc_skb_ip_align() in this patch, combining NET_SKB_PAD and
NET_IP_ALIGN magic.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:29 +0000 (14:08 +0000)]
bnx2x: fix memory leak in bnx2x_init_firmware()
When cycling the interface down and up, bnx2x_init_firmware() knows that
the firmware is already loaded, but nevertheless it allocates certain
arrays anew (init_data, init_ops, init_ops_offsets, iro_arr). The old
arrays are leaked.
Fix the leaks by returning early if the firmware was already loaded.
Because if the firmware is loaded, so are the arrays.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:28 +0000 (14:08 +0000)]
bnx2x: fix a crash on corrupt firmware file
If the requested firmware is deemed corrupt and then released, reset the
pointer to NULL in order to avoid double-freeing it in
bnx2x_release_firmware() or dereferencing it in bnx2x_init_firmware().
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 13 Mar 2012 18:04:25 +0000 (18:04 +0000)]
sch_sfq: revert dont put new flow at the end of flows
This reverts commit
d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)
As Jesper found out, patch sounded great but has bad side effects.
In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.
It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.
A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.
Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>