Linus Torvalds [Fri, 14 Jan 2011 17:08:29 +0000 (09:08 -0800)]
Merge branch 'vfs-scale-working' of git://git./linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
kernel: fix hlist_bl again
cgroups: Fix a lockdep warning at cgroup removal
fs: namei fix ->put_link on wrong inode in do_filp_open
Linus Torvalds [Fri, 14 Jan 2011 17:08:00 +0000 (09:08 -0800)]
Merge branch 'for-next' of git://git./linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (59 commits)
mfd: ab8500-core chip version cut 2.0 support
mfd: Flag WM831x /IRQ as a wake source
mfd: Convert WM831x away from legacy I2C PM operations
regulator: Support MAX8998/LP3974 DVS-GPIO
mfd: Support LP3974 RTC
i2c: Convert SCx200 driver from using raw PCI to platform device
x86: OLPC: convert olpc-xo1 driver from pci device to platform device
mfd: MAX8998/LP3974 hibernation support
mfd/ab8500: remove spi support
mfd: Remove ARCH_U8500 dependency from AB8500
misc: Make AB8500_PWM driver depend on U8500 due to PWM breakage
mfd: Add __devexit annotation for vx855_remove
mfd: twl6030 irq_data conversion.
gpio: Fix cs5535 printk warnings
misc: Fix cs5535 printk warnings
mfd: Convert Wolfson MFD drivers to use irq_data accessor function
mfd: Convert TWL4030 to new irq_ APIs
mfd: Convert tps6586x driver to new irq_ API
mfd: Convert tc6393xb driver to new irq_ APIs
mfd: Convert t7166xb driver to new irq_ API
...
Linus Torvalds [Fri, 14 Jan 2011 16:47:26 +0000 (08:47 -0800)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] MAINTAINERS: Update zcrypt driver entry
[S390] Randomize PIEs
[S390] Randomise the brk region
[S390] Add is_32bit_task() helper function
[S390] Randomize lower bits of stack address
[S390] Randomize mmap start address
[S390] Rearrange mmap.c
[S390] Enable flexible mmap layout for 64 bit processes
[S390] vdso: dont map at mmap_base
[S390] reduce miminum gap between stack and mmap_base
[S390] mmap: consider stack address randomization
[S390] Update default configuration
[S390] cio: path_event overindication after resume
Lennert Buytenhek [Fri, 14 Jan 2011 08:44:19 +0000 (09:44 +0100)]
gpio: timbgpio: Fix up irq_data conversion breakage.
Commit
a1f5f22adc3206c47e70652c12671666c65b579f ("gpio: timbgpio:
irq_data conversion") was slightly too enthusiastic in converting
timbgpio_irq() over to take an irq_data * argument instead of an
unsigned int irq argument, as it is a flow handler, which still take
IRQ numbers for now. (And on top of that, it was using the wrong
accessors.)
This fixes it up, and seems to build without warnings.
Signed-off-by: Lennert Buytenhek <buytenh@secretlab.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Richard Röjfors <richard.rojfors@mocean-labs.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Fri, 14 Jan 2011 05:31:45 +0000 (05:31 +0000)]
cgroup_fs: fix cgroup use of simple_lookup()
cgroup can't use simple_lookup(), since that'd override its desired ->d_op.
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfram Sang [Fri, 14 Jan 2011 08:34:29 +0000 (09:34 +0100)]
include/gpio.h: remove remaining __must_check-annotiations
Commit
5f829e405ec4e96f711165a4a7b55c271d4363e2 (gpiolib: add missing functions
to generic fallback) also introduced two.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Greg KH <gregkh@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Fri, 14 Jan 2011 05:56:53 +0000 (14:56 +0900)]
Revert update for dirty_ratio for memcg.
The flags added by commit
db16d5ec1f87f17511599bc77857dd1662b5a22f
has no user now. We believe we'll use it soon but considering
patch reviewing, the change itself should be folded into incoming
set of "dirty ratio for memcg" patches.
So, it's better to drop this change from current mainline tree.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Fri, 14 Jan 2011 06:04:21 +0000 (15:04 +0900)]
revert documentaion update for memcg's dirty ratio.
Subjct: Revert memory cgroup dirty_ratio Documentation.
The commit
ece72400c2a27a3d726cb0854449f991d9fcd2da adds documentation
for memcg's dirty ratio. But the function is not implemented yet.
Remove the documentation for avoiding confusing users.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Fri, 14 Jan 2011 13:12:45 +0000 (13:12 +0000)]
kernel: fix hlist_bl again
__d_rehash is dereferencing an almost-NULL pointer on my ARM926.
CONFIG_SMP=n and CONFIG_DEBUG_SPINLOCK=y.
The faulting instruction is: strne r3, [r2, #4]
and as can be seen from the register dump below, r2 is 0x00000001, hence
the faulting 0x00000005 address.
__d_rehash is essentially:
spin_lock_bucket(b);
entry->d_flags &= ~DCACHE_UNHASHED;
hlist_bl_add_head_rcu(&entry->d_hash, &b->head);
spin_unlock_bucket(b);
which is:
bit_spin_lock(0, (unsigned long *)&b->head.first);
entry->d_flags &= ~DCACHE_UNHASHED;
hlist_bl_add_head_rcu(&entry->d_hash, &b->head);
__bit_spin_unlock(0, (unsigned long *)&b->head.first);
bit_spin_lock(0, ptr) sets bit 0 of *ptr, in this case b->head.first if
CONFIG_SMP or CONFIG_DEBUG_SPINLOCK is set:
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
while (test_bit(bitnum, addr)) {
preempt_enable();
cpu_relax();
preempt_disable();
}
}
#endif
So, b->head.first starts off NULL, and becomes a non-NULL (address 1).
hlist_bl_add_head_rcu() does this:
static inline void hlist_bl_add_head_rcu(struct hlist_bl_node *n,
struct hlist_bl_head *h)
{
first = hlist_bl_first(h);
n->next = first;
if (first)
first->pprev = &n->next;
It is the store to first->pprev which is faulting.
hlist_bl_first():
static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h)
{
return (struct hlist_bl_node *)
((unsigned long)h->first & ~LIST_BL_LOCKMASK);
}
but:
#if defined(CONFIG_SMP)
#define LIST_BL_LOCKMASK 1UL
#else
#define LIST_BL_LOCKMASK 0UL
#endif
So, we have one piece of code which sets bit 0 of addresses, and another
bit of code which doesn't clear it before dereferencing the pointer if
!CONFIG_SMP && CONFIG_DEBUG_SPINLOCK. With the patch below, I can again
sucessfully boot the kernel on my Versatile PB/926 platform.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Mattias Wallin [Tue, 7 Dec 2010 10:20:47 +0000 (11:20 +0100)]
mfd: ab8500-core chip version cut 2.0 support
This patch adds support for chip version 2.0 or cut 2.0.
One new interrupt latch register - latch 12 - is introduced.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Wed, 5 Jan 2011 17:56:01 +0000 (17:56 +0000)]
mfd: Flag WM831x /IRQ as a wake source
The WM831x can generate wake events, some unconditionally, so flag
the primary IRQ as a wake source in order to help the CPU treat the
/IRQ signal appropriately.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Wed, 5 Jan 2011 15:12:39 +0000 (15:12 +0000)]
mfd: Convert WM831x away from legacy I2C PM operations
Since the legacy bus PM operations are deprecated move the suspend
method over to dev_pm_ops.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
MyungJoo Ham [Tue, 11 Jan 2011 11:20:05 +0000 (12:20 +0100)]
regulator: Support MAX8998/LP3974 DVS-GPIO
The previous driver did not support BUCK1-DVS3, BUCK1-DVS4, and
BUCK2-DVS2 modes. This patch adds such modes and an option to block
setting buck1/2 voltages out of the preset values.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
MyungJoo Ham [Tue, 4 Jan 2011 05:17:39 +0000 (14:17 +0900)]
mfd: Support LP3974 RTC
The first releases of LP3974 have a large delay in RTC registers,
which requires 2 seconds of delay after writing to a rtc register
(recommended by National Semiconductor's engineers)
before reading it.
If "rtc_delay" field of the platform data is true, the rtc driver
assumes that such delays are required. Although we have not seen
LP3974s without requiring such delays, we assume that such LP3974s
will be released soon (or they have done so already) and they are
supported by "lp3974" without setting "rtc_delay" at the platform
data.
This patch adds delays with msleep when writing values to RTC registers
if the platform data has rtc_delay set.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Andres Salomon [Fri, 31 Dec 2010 04:27:33 +0000 (20:27 -0800)]
i2c: Convert SCx200 driver from using raw PCI to platform device
The SCx200 ACB driver supports ISA hardware as well as PCI. The PCI
hardware is CS5535/CS5536 based, and the device that it grabs is handled by
the cs5535-mfd driver. This converts the SCx200 driver to use a
platform_driver rather than the previous PCI hackery.
The driver used to manually track the iface list (via linked list); now it
only does this for ISA devices. PCI ifaces are handled through standard
driver model lists.
It's unclear what happens in case of errors in the old ISA code; rather than
pretending the code actually cares, I've dropped the (implicit) ignorance
of return values and marked it with a comment.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Andres Salomon [Mon, 29 Nov 2010 23:45:06 +0000 (15:45 -0800)]
x86: OLPC: convert olpc-xo1 driver from pci device to platform device
The cs5535-mfd driver now takes care of the PCI BAR handling; this
means the olpc-xo1 driver shouldn't be touching the PCI device at all.
This patch uses both cs5535-acpi and cs5535-pms platform devices rather
than a single platform device because the cs5535-mfd driver may be used
by other CS5535 platform-specific drivers; OLPC doesn't get to dictate
that ACPI and PMS will always be used together.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
MyungJoo Ham [Thu, 23 Dec 2010 08:53:36 +0000 (17:53 +0900)]
mfd: MAX8998/LP3974 hibernation support
This patch makes the driver to save and restore register values
for hibernation.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Sundar Iyer [Fri, 24 Dec 2010 10:52:08 +0000 (11:52 +0100)]
mfd/ab8500: remove spi support
Since the Ab8500 v1.0, the SPI support is deprecated on the HW.
Signed-off-by: Sundar Iyer <sundar.iyer@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 13:07:49 +0000 (13:07 +0000)]
mfd: Remove ARCH_U8500 dependency from AB8500
While it is vanishingly unlikely that the device will be deployed on other
architectures removing the dependency facilitates build testing when doing
generic work on both the MFD core for the device and the subsystem drivers.
There appears to be no actual code dependency.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Mon, 20 Dec 2010 12:28:11 +0000 (12:28 +0000)]
misc: Make AB8500_PWM driver depend on U8500 due to PWM breakage
Since we don't have a PWM API every PWM driver ends up exporting its
own version and we need to limit the platforms we try to build them on
in order to avoid multiple definitions. As the AB8500 is normally a
companion chip for the U8500 CPU depend on that architecture.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Axel Lin [Mon, 20 Dec 2010 02:02:06 +0000 (10:02 +0800)]
mfd: Add __devexit annotation for vx855_remove
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Lennert Buytenhek [Mon, 13 Dec 2010 12:31:18 +0000 (13:31 +0100)]
mfd: twl6030 irq_data conversion.
Signed-off-by: Lennert Buytenhek <buytenh@secretlab.ca>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Joe Perches [Mon, 20 Dec 2010 10:28:40 +0000 (11:28 +0100)]
gpio: Fix cs5535 printk warnings
drivers/gpio/cs5535-gpio.c: In function 'cs5535_gpio_probe':
drivers/gpio/cs5535-gpio.c:269: warning: format '%x' expects type 'unsigned int', but argument 3 has type 'resource_size_t'
drivers/gpio/cs5535-gpio.c:269: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'resource_size_t'
Use vsprintf extension %pR to format resource.
Original-patch-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Joe Perches [Mon, 20 Dec 2010 10:26:19 +0000 (11:26 +0100)]
misc: Fix cs5535 printk warnings
drivers/misc/cs5535-mfgpt.c: In function 'cs5535_mfgpt_probe':
drivers/misc/cs5535-mfgpt.c:320: warning: format '%x' expects type 'unsigned int', but argument 3 has type 'resource_size_t'
drivers/misc/cs5535-mfgpt.c:320: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'resource_size_t'
Use vsprintf extension %pR to format resource.
Original-patch-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 13:21:21 +0000 (13:21 +0000)]
mfd: Convert Wolfson MFD drivers to use irq_data accessor function
Actually makes the code larger rathe rthan smaller but does provide some
isolation against core API changes.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sun, 12 Dec 2010 12:51:39 +0000 (12:51 +0000)]
mfd: Convert TWL4030 to new irq_ APIs
The genirq core is being updated to pass struct irq_data to irq_chip
operations. Update the TWL4030 driver to the new APIs.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sun, 12 Dec 2010 12:39:28 +0000 (12:39 +0000)]
mfd: Convert tps6586x driver to new irq_ API
The genirq core is being updated to supply struct irq_data to irq_chip
operations rather than an irq number. Update the tps6586x driver to the
new APIs.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sun, 12 Dec 2010 12:34:04 +0000 (12:34 +0000)]
mfd: Convert tc6393xb driver to new irq_ APIs
The genirq core is being update to pass struct irq_data to irq_chip rather
than an irq number to operations. Update tc6393 to the new API.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Ian Molton <ian@mnementh.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sun, 12 Dec 2010 12:18:58 +0000 (12:18 +0000)]
mfd: Convert t7166xb driver to new irq_ API
The genirq core is being updated to pass struct irq_data rather than an
irq number to irq_chip operations. Update the t7166xb driver to the new
APIs.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Ian Molton <ian@mnementh.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sun, 12 Dec 2010 12:01:08 +0000 (12:01 +0000)]
mfd: Convert SMTPE driver to new irq_ APIs
The genirq core is being updated to supply struct irq_data to irq_chip
operations rather than an irq number. Update the SMTPE driver to the new
APIs.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 18:31:03 +0000 (18:31 +0000)]
mfd: Convert MAX8998 driver to irq_ API
The genirq core is being updated to pass struct irq_data to interrupt
operations, update the MAX8998 driver to the new API.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 18:20:55 +0000 (18:20 +0000)]
mfd: Convert max8925 to new irq_ API
The genirq infrastructure is being converted to pass struct irq_data rather
than an irq number to irq_chip operations, update max8925 to the new APIs.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sun, 12 Dec 2010 19:08:30 +0000 (20:08 +0100)]
mfd: Convert jz4740-adc to new irq_ methods
Convert the jz4740-adc driver to use the recently introduced IRQ API
variants which are passed struct irq_data rather than an IRQ number.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 16:44:11 +0000 (16:44 +0000)]
mfd: Convert HTC I2C CPLD driver to irq_ API
The genirq core is being converted to pass a struct irq_data to interrupt
operations rather than an IRQ number.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Cory Maccarrone <darkstar6262@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 16:43:59 +0000 (16:43 +0000)]
mfd: Convert HTC EGPIO driver to irq_ API
The genirq core is being converted to pass a struct irq_data to interrupt
operations rather than an IRQ number.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Lennert Buytenhek [Mon, 13 Dec 2010 12:30:09 +0000 (13:30 +0100)]
mfd: Convert ezx-pcap to new irq_ methods
Signed-off-by: Lennert Buytenhek <buytenh@secretlab.ca>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 13:16:08 +0000 (13:16 +0000)]
mfd: Convert AB8500 to new irq_ methods
The genirq core is being converted to supply struct irq_data to chips
rather than the interrupt number.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 13:14:12 +0000 (13:14 +0000)]
mfd: Convert AB3500 to new irq_ APIs
The genirq core is being updated to pass struct irq_data rather than irq
numbers into chip drivers. As part of the update assignments to NULL for
unused operations are removed, these are not needed and the genirq docs
should be good enough.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 13:08:57 +0000 (13:08 +0000)]
mfd: Convert ASIC3 to new irq_ APIs
The interrupt controller APIs are being updated to pass a struct irq_data
rather than the interrupt number.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Ian Molton <ian@mnementh.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 12:31:31 +0000 (12:31 +0000)]
mfd: Convert 88PM860x driver to new irq_ APIs
The interrupt controller APIs are being updated to pass a struct irq_data
rather than the interrupt number.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 16:44:49 +0000 (16:44 +0000)]
mfd: Staticise internal functions in HTC I2CCPLD driver
Most of these are GPIO operations, though a couple are just internal only
functions.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Cory Maccarrone <darkstar6262@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Mon, 13 Dec 2010 14:06:47 +0000 (14:06 +0000)]
mfd: Use NULL to initialise NULL pointers in ab8500-debugfs
Partly for coding style reasons, but mostly because sparse warns on it.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 13:00:34 +0000 (13:00 +0000)]
mfd: Correct ASIC3 IRQ_OWM resource setup
We should specify both a start and an end for the IRQ range rather than
initialise the start twice.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Sat, 11 Dec 2010 12:59:35 +0000 (12:59 +0000)]
mfd: Staticise unexported symbols in ASIC3
There's no use of either of these outside of the driver so they can be
declared static.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Stephen Warren [Thu, 9 Dec 2010 17:30:11 +0000 (10:30 -0700)]
mfd: Remove tps6586x device ID check
... and convert it to a dev_info print at probe time.
There are many variants of this chip with different values of VERSIONCRC.
The set of values is large, and not useful to enumerate. All are SW
compatible. The difference lies in default settings of the various power
rails, and other similar differences. The driver, or clients of the
driver, shouldn't be affected by this, since all rails should be
programmed into the desired state in all cases for correct operation.
Derived-from-code-by: Andrew Chew <achew@nvidia.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Bryan Wu [Wed, 8 Dec 2010 02:42:04 +0000 (10:42 +0800)]
mfd: Fix twl_probe section mismatch warning in mfd/twl-core.c
Fix the following section mismatch warning when building
omap2plus_defconfig:
WARNING: vmlinux.o(.data+0x47d7c): Section mismatch in reference
from the variable twl_driver to the function .init.text:twl_probe()
Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mattias Wallin [Thu, 2 Dec 2010 14:40:31 +0000 (15:40 +0100)]
mfd: ab8500-core ioresources irq for subdrivers added
This patch adds the ioresources used by subdrivers to
retrieve their interrupt.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mattias Wallin [Thu, 2 Dec 2010 14:10:27 +0000 (15:10 +0100)]
mfd: ab8500-core wake up from suspend
This patch makes the system wake up from suspend when an
ab8500 interrupt occur. This can for example be USB cable
insert or an RTC alarm.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mattias Wallin [Thu, 2 Dec 2010 14:09:36 +0000 (15:09 +0100)]
mfd: Export ab8500 chip id to sysfs
This patch adds a file into sysfs for reading out chip id.
It has been requested for modem silent reboot.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Ludovic Barre <ludovic.barre@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mattias Wallin [Thu, 2 Dec 2010 14:08:32 +0000 (15:08 +0100)]
mfd: ab8500-core improved error handling in get_chip_id
We check for dev before dereferencing it.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Andres Salomon [Thu, 2 Dec 2010 03:55:10 +0000 (19:55 -0800)]
gpio/misc: Add MODULE_ALIAS entries for CS5535 functions
This adds MODULE_ALIAS entries to the various cs5535 subdevice modules; this
allows the modules to automatically be loaded when cs5535-mfd loads.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Andres Salomon [Sat, 23 Oct 2010 07:41:14 +0000 (00:41 -0700)]
misc: Convert cs5535-mfgpt from pci device to platform device
The cs5535-mfd driver now takes care of the PCI BAR handling; this
simplifies the mfgpt driver a bunch.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Andres Salomon [Sat, 23 Oct 2010 07:41:09 +0000 (00:41 -0700)]
gpio: Convert cs5535 from pci device to platform device
The cs5535-mfd driver now takes care of the PCI BAR handling; this
simplifies the gpio driver a lot.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Andres Salomon [Tue, 30 Nov 2010 21:54:39 +0000 (13:54 -0800)]
mfd: Fix cs5535 warning on x86-64
ARRAY_SIZE() returns size_t; use %zu instead of %d so that we don't
get warnings on x86-64.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Fri, 26 Nov 2010 17:19:35 +0000 (17:19 +0000)]
mfd: Implement runtime PM for WM8994 core driver
Allow the WM8994 to completely power off, including disabling the LDOs
if they are software controlled, when it goes idle. The CODEC subdevice
controls activity for the MFD as a whole.
If the GPIOs need to be used while the device is active runtime PM
should be disabled for the device by machine specific code.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Fri, 26 Nov 2010 17:19:34 +0000 (17:19 +0000)]
mfd: Provide pm_runtime_no_callbacks flag in cell data
Allow MFD cells to have pm_runtime_no_callbacks() called on them during
registration. This causes the runtime PM framework to ignore them,
allowing use of runtime PM to suspend the device as a whole even if
not all drivers for the MFD can usefully implement runtime PM. For
example, RTCs are likely to run continuously regardless of the power
state of the system.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mattias Wallin [Fri, 26 Nov 2010 12:06:39 +0000 (13:06 +0100)]
mfd: Fix ab8500-debug indentation errors
Replace spaces with proper tabs.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Wed, 24 Nov 2010 18:01:44 +0000 (18:01 +0000)]
mfd: Convert WM8994 to new irq_ interrupt methods
Kernel 2.6.37 adds new interrupt methods which take a struct irq_data
rather than an irq number. Convert over to these as they will become
mandatory in future.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Wed, 24 Nov 2010 18:01:43 +0000 (18:01 +0000)]
mfd: Convert WM835x to new irq_ interrupt methods
Kernel 2.6.37 adds new interrupt methods which take a struct irq_data
rather than an irq number. Convert over to these as they will become
mandatory in future.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Wed, 24 Nov 2010 18:01:42 +0000 (18:01 +0000)]
mfd: Convert WM831x to new irq_ interrupt methods
Kernel 2.6.37 adds new interrupt methods which take a struct irq_data
rather than an irq number. Convert over to these as they will become
mandatory in future.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Wed, 24 Nov 2010 18:01:41 +0000 (18:01 +0000)]
mfd: Add WM8326 support
The WM8326 is a high performance variant of the WM832x series with
no software visible differences.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mark Brown [Wed, 24 Nov 2010 18:01:40 +0000 (18:01 +0000)]
mfd: Simplify WM832x subdevice instantiation
All the current WM832x devices have the same set of subdevices so can
just use multiple case statements with a single body.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Joe Perches [Fri, 12 Nov 2010 21:37:56 +0000 (13:37 -0800)]
mfd: Use printf extension %pR for struct resource
Using %pR standardizes the struct resource output.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Andres Salomon [Fri, 26 Nov 2010 10:52:35 +0000 (11:52 +0100)]
mfd: Add cs5535-mfd driver for AMD Geode's CS5535/CS5536 support
Add an MFD driver to handle the ISA device on CS5535 and CS5536
southbridges. This ISA bridge is actually multiple devices: GPIOs,
MFGPTs, etc.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Uwe Kleine-König [Thu, 11 Nov 2010 15:47:50 +0000 (16:47 +0100)]
mfd: Don't open-code mc13xxx_unlock
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Axel Lin [Wed, 10 Nov 2010 07:49:41 +0000 (15:49 +0800)]
mfd: Include <linux/gpio.h> instead of <asm/gpio.h>
As warned by checkpatch.pl, use #include <linux/gpio.h> instead
of <asm/gpio.h>.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Axel Lin [Wed, 10 Nov 2010 07:47:51 +0000 (15:47 +0800)]
mfd: Include <linux/io.h> instead of <asm/io.h>
As warned by checkpatch.pl, use #include <linux/io.h> instead of <asm/io.h>
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Joe Perches [Sat, 30 Oct 2010 21:08:32 +0000 (14:08 -0700)]
mfd: Update WARN uses
Remove KERN_<level>.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Li Zefan [Fri, 14 Jan 2011 03:34:34 +0000 (11:34 +0800)]
cgroups: Fix a lockdep warning at cgroup removal
Commit
2fd6b7f5 ("fs: dcache scale subdirs") forgot to annotate a dentry
lock, which caused a lockdep warning.
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Nick Piggin [Fri, 14 Jan 2011 08:42:43 +0000 (08:42 +0000)]
fs: namei fix ->put_link on wrong inode in do_filp_open
J. R. Okajima noticed that ->put_link is being attempted on the
wrong inode, and suggested the way to fix it. I changed it a bit
according to Al's suggestion to keep an explicit link path around.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Linus Torvalds [Fri, 14 Jan 2011 04:15:35 +0000 (20:15 -0800)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
ACPI: fix resource check message
ACPI / Battery: Update information on info notification and resume
ACPI: Drop device flag wake_capable
ACPI: Always check if _PRW is present before trying to evaluate it
ACPI / PM: Check status of power resources under mutexes
ACPI / PM: Rename acpi_power_off_device()
ACPI / PM: Drop acpi_power_nocheck
ACPI / PM: Drop acpi_bus_get_power()
Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
ACPI / Fan: Rework the handling of power resources
ACPI / PM: Register power resource devices as soon as they are needed
ACPI / PM: Register acpi_power_driver early
ACPI / PM: Add function for updating device power state consistently
ACPI / PM: Add function for device power state initialization
ACPI / PM: Introduce __acpi_bus_get_power()
ACPI / PM: Introduce function for refcounting device power resources
ACPI / PM: Add functions for manipulating lists of power resources
ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
ACPICA: Update version to
20101209
...
Linus Torvalds [Fri, 14 Jan 2011 04:15:18 +0000 (20:15 -0800)]
Merge branch 'idle-release' of git://git./linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
intel_idle: open broadcast clock event
cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific
cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle
cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions
SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOW
cpuidle: delete NOP CPUIDLE_FLAG_POLL
ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGs
cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL
ACPI, intel_idle: Cleanup idle= internal variables
cpuidle: Make cpuidle_enable_device() call poll_idle_init()
intel_idle: update Sandy Bridge core C-state residency targets
Linus Torvalds [Fri, 14 Jan 2011 04:15:02 +0000 (20:15 -0800)]
Merge branch 'sfi-release' of git://git./linux/kernel/git/lenb/linux-sfi-2.6
* 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6:
SFI: use ioremap_cache() instead of ioremap()
Linus Torvalds [Fri, 14 Jan 2011 04:14:13 +0000 (20:14 -0800)]
Merge branch 'vfs-scale-working' of git://git./linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
fs: fix do_last error case when need_reval_dot
nfs: add missing rcu-walk check
fs: hlist UP debug fixup
fs: fix dropping of rcu-walk from force_reval_path
fs: force_reval_path drop rcu-walk before d_invalidate
fs: small rcu-walk documentation fixes
Fixed up trivial conflicts in Documentation/filesystems/porting
J. R. Okajima [Fri, 14 Jan 2011 03:56:04 +0000 (03:56 +0000)]
fs: fix do_last error case when need_reval_dot
When open(2) without O_DIRECTORY opens an existing dir, it should return
EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it
is changed by d_revalidate() which returns any positive to represent
'the target dir is valid.'
Should we keep and return the initialized 'error' in this case.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Nick Piggin [Fri, 14 Jan 2011 02:48:39 +0000 (02:48 +0000)]
nfs: add missing rcu-walk check
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Linus Torvalds [Fri, 14 Jan 2011 02:46:48 +0000 (18:46 -0800)]
Merge branch 'stable/gntdev' of git://git./linux/kernel/git/konrad/xen
* 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m: Fix module linking error.
xen p2m: clear the old pte when adding a page to m2p_override
xen gntdev: use gnttab_map_refs and gnttab_unmap_refs
xen: introduce gnttab_map_refs and gnttab_unmap_refs
xen p2m: transparently change the p2m mappings in the m2p override
xen/gntdev: Fix circular locking dependency
xen/gntdev: stop using "token" argument
xen: gntdev: move use of GNTMAP_contains_pte next to the map_op
xen: add m2p override mechanism
xen: move p2m handling to separate file
xen/gntdev: add VM_PFNMAP to vma
xen/gntdev: allow usermode to map granted pages
xen: define gnttab_set_map_op/unmap_op
Fix up trivial conflict in drivers/xen/Kconfig
Linus Torvalds [Fri, 14 Jan 2011 02:44:52 +0000 (18:44 -0800)]
Merge branch 'stable/platform-pci-fixes' of git://git./linux/kernel/git/konrad/xen
* 'stable/platform-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen-platform: Fix compile errors if CONFIG_PCI is not enabled.
xen: rename platform-pci module to xen-platform-pci.
xen-platform: use PCI interfaces to request IO and MEM resources.
Nick Piggin [Fri, 14 Jan 2011 02:36:43 +0000 (02:36 +0000)]
fs: hlist UP debug fixup
Po-Yu Chuang <ratbert.chuang@gmail.com> noticed that hlist_bl_set_first could
crash on a UP system when LIST_BL_LOCKMASK is 0, because
LIST_BL_BUG_ON(!((unsigned long)h->first & LIST_BL_LOCKMASK));
always evaulates to true.
Fix the expression, and also avoid a dependency between bit spinlock
implementation and list bl code (list code shouldn't know anything
except that bit 0 is set when adding and removing elements). Eventually
if a good use case comes up, we might use this list to store 1 or more
arbitrary bits of data, so it really shouldn't be tied to locking either,
but for now they are helpful for debugging.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Nick Piggin [Fri, 14 Jan 2011 02:36:19 +0000 (02:36 +0000)]
fs: fix dropping of rcu-walk from force_reval_path
As J. R. Okajima noted, force_reval_path passes in the same dentry to
d_revalidate as the one in the nameidata structure (other callers pass in a
child), so the locking breaks. This can oops with a chrooted nfs mount, for
example. Similarly there can be other problems with revalidating a dentry
which is already in nameidata of the path walk.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Nick Piggin [Fri, 14 Jan 2011 02:35:53 +0000 (02:35 +0000)]
fs: force_reval_path drop rcu-walk before d_invalidate
d_revalidate can return in rcu-walk mode even when it returns 0. We can't just
call any old dcache function on rcu-walk dentry (the dentry is unstable, so
even through d_lock can safely be taken, the result may no longer be what we
expect -- careful re-checks would be required). So just drop rcu in this case.
(I missed this conversion when switching to the rcu-walk convention that Linus
suggested)
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Nick Piggin [Fri, 14 Jan 2011 02:26:53 +0000 (02:26 +0000)]
fs: small rcu-walk documentation fixes
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Daisuke Nishimura [Thu, 13 Jan 2011 23:47:43 +0000 (15:47 -0800)]
memcg: fix memory migration of shmem swapcache
In the current implementation mem_cgroup_end_migration() decides whether
the page migration has succeeded or not by checking "oldpage->mapping".
But if we are tring to migrate a shmem swapcache, the page->mapping of it
is NULL from the begining, so the check would be invalid. As a result,
mem_cgroup_end_migration() assumes the migration has succeeded even if
it's not, so "newpage" would be freed while it's not uncharged.
This patch fixes it by passing mem_cgroup_end_migration() the result of
the page migration.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Thu, 13 Jan 2011 23:47:42 +0000 (15:47 -0800)]
memcg: use [kv]zalloc[_node] rather than [kv]malloc+memset
In mem_cgroup_alloc() we currently do either kmalloc() or vmalloc() then
followed by memset() to zero the memory. This can be more efficiently
achieved by using kzalloc() and vzalloc(). There's also one situation
where we can use kzalloc_node() - this is what's new in this version of
the patch.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daisuke Nishimura [Thu, 13 Jan 2011 23:47:41 +0000 (15:47 -0800)]
memcg: fix deadlock between cpuset and memcg
Commit
b1dd693e ("memcg: avoid deadlock between move charge and
try_charge()") can cause another deadlock about mmap_sem on task migration
if cpuset and memcg are mounted onto the same mount point.
After the commit, cgroup_attach_task() has sequence like:
cgroup_attach_task()
ss->can_attach()
cpuset_can_attach()
mem_cgroup_can_attach()
down_read(&mmap_sem) (1)
ss->attach()
cpuset_attach()
mpol_rebind_mm()
down_write(&mmap_sem) (2)
up_write(&mmap_sem)
cpuset_migrate_mm()
do_migrate_pages()
down_read(&mmap_sem)
up_read(&mmap_sem)
mem_cgroup_move_task()
mem_cgroup_clear_mc()
up_read(&mmap_sem)
We can cause deadlock at (2) because we've already aquire the mmap_sem at (1).
But the commit itself is necessary to fix deadlocks which have existed
before the commit like:
Ex.1)
move charge | try charge
--------------------------------------+------------------------------
mem_cgroup_can_attach() | down_write(&mmap_sem)
mc.moving_task = current | ..
mem_cgroup_precharge_mc() | __mem_cgroup_try_charge()
mem_cgroup_count_precharge() | prepare_to_wait()
down_read(&mmap_sem) | if (mc.moving_task)
-> cannot aquire the lock | -> true
| schedule()
| -> move charge should wake it up
Ex.2)
move charge | try charge
--------------------------------------+------------------------------
mem_cgroup_can_attach() |
mc.moving_task = current |
mem_cgroup_precharge_mc() |
mem_cgroup_count_precharge() |
down_read(&mmap_sem) |
.. |
up_read(&mmap_sem) |
| down_write(&mmap_sem)
mem_cgroup_move_task() | ..
mem_cgroup_move_charge() | __mem_cgroup_try_charge()
down_read(&mmap_sem) | prepare_to_wait()
-> cannot aquire the lock | if (mc.moving_task)
| -> true
| schedule()
| -> move charge should wake it up
This patch fixes all of these problems by:
1. revert the commit.
2. To fix the Ex.1, we set mc.moving_task after mem_cgroup_count_precharge()
has released the mmap_sem.
3. To fix the Ex.2, we use down_read_trylock() instead of down_read() in
mem_cgroup_move_charge() and, if it has failed to aquire the lock, cancel
all extra charges, wake up all waiters, and retry trylock.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Menage <menage@google.com>
Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 13 Jan 2011 23:47:40 +0000 (15:47 -0800)]
memcg: remove unnecessary return from void-returning mem_cgroup_del_lru_list()
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 13 Jan 2011 23:47:39 +0000 (15:47 -0800)]
memcg: fix unit mismatch in memcg oom limit calculation
Adding the number of swap pages to the byte limit of a memory control
group makes no sense. Convert the pages to bytes before adding them.
The only user of this code is the OOM killer, and the way it is used means
that the error results in a higher OOM badness value. Since the cgroup
limit is the same for all tasks in the cgroup, the error should have no
practical impact at the moment.
But let's not wait for future or changing users to trip over it.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 13 Jan 2011 23:47:38 +0000 (15:47 -0800)]
memcg: add lock to synchronize page accounting and migration
Introduce a new bit spin lock, PCG_MOVE_LOCK, to synchronize the page
accounting and migration code. This reworks the locking scheme of
_update_stat() and _move_account() by adding new lock bit PCG_MOVE_LOCK,
which is always taken under IRQ disable.
1. If pages are being migrated from a memcg, then updates to that
memcg page statistics are protected by grabbing PCG_MOVE_LOCK using
move_lock_page_cgroup(). In an upcoming commit, memcg dirty page
accounting will be updating memcg page accounting (specifically: num
writeback pages) from IRQ context (softirq). Avoid a deadlocking
nested spin lock attempt by disabling irq on the local processor when
grabbing the PCG_MOVE_LOCK.
2. lock for update_page_stat is used only for avoiding race with
move_account(). So, IRQ awareness of lock_page_cgroup() itself is not
a problem. The problem is between mem_cgroup_update_page_stat() and
mem_cgroup_move_account_page().
Trade-off:
* Changing lock_page_cgroup() to always disable IRQ (or
local_bh) has some impacts on performance and I think
it's bad to disable IRQ when it's not necessary.
* adding a new lock makes move_account() slower. Score is
here.
Performance Impact: moving a 8G anon process.
Before:
real 0m0.792s
user 0m0.000s
sys 0m0.780s
After:
real 0m0.854s
user 0m0.000s
sys 0m0.842s
This score is bad but planned patches for optimization can reduce
this impact.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Andrea Righi <arighi@develer.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Thu, 13 Jan 2011 23:47:37 +0000 (15:47 -0800)]
memcg: create extensible page stat update routines
Replace usage of the mem_cgroup_update_file_mapped() memcg
statistic update routine with two new routines:
* mem_cgroup_inc_page_stat()
* mem_cgroup_dec_page_stat()
As before, only the file_mapped statistic is managed. However, these more
general interfaces allow for new statistics to be more easily added. New
statistics are added with memcg dirty page accounting.
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrea Righi <arighi@develer.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Thu, 13 Jan 2011 23:47:36 +0000 (15:47 -0800)]
memcg: document cgroup dirty memory interfaces
Document cgroup dirty memory interfaces and statistics.
[akpm@linux-foundation.org: fix use_hierarchy description]
Signed-off-by: Andrea Righi <arighi@develer.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Thu, 13 Jan 2011 23:47:35 +0000 (15:47 -0800)]
memcg: add page_cgroup flags for dirty page tracking
This patchset provides the ability for each cgroup to have independent
dirty page limits.
Limiting dirty memory is like fixing the max amount of dirty (hard to
reclaim) page cache used by a cgroup. So, in case of multiple cgroup
writers, they will not be able to consume more than their designated share
of dirty pages and will be forced to perform write-out if they cross that
limit.
The patches are based on a series proposed by Andrea Righi in Mar 2010.
Overview:
- Add page_cgroup flags to record when pages are dirty, in writeback, or nfs
unstable.
- Extend mem_cgroup to record the total number of pages in each of the
interesting dirty states (dirty, writeback, unstable_nfs).
- Add dirty parameters similar to the system-wide /proc/sys/vm/dirty_*
limits to mem_cgroup. The mem_cgroup dirty parameters are accessible
via cgroupfs control files.
- Consider both system and per-memcg dirty limits in page writeback when
deciding to queue background writeback or block for foreground writeback.
Known shortcomings:
- When a cgroup dirty limit is exceeded, then bdi writeback is employed to
writeback dirty inodes. Bdi writeback considers inodes from any cgroup, not
just inodes contributing dirty pages to the cgroup exceeding its limit.
- When memory.use_hierarchy is set, then dirty limits are disabled. This is a
implementation detail. An enhanced implementation is needed to check the
chain of parents to ensure that no dirty limit is exceeded.
Performance data:
- A page fault microbenchmark workload was used to measure performance, which
can be called in read or write mode:
f = open(foo. $cpu)
truncate(f, 4096)
alarm(60)
while (1) {
p = mmap(f, 4096)
if (write)
*p = 1
else
x = *p
munmap(p)
}
- The workload was called for several points in the patch series in different
modes:
- s_read is a single threaded reader
- s_write is a single threaded writer
- p_read is a 16 thread reader, each operating on a different file
- p_write is a 16 thread writer, each operating on a different file
- Measurements were collected on a 16 core non-numa system using "perf stat
--repeat 3". The -a option was used for parallel (p_*) runs.
- All numbers are page fault rate (M/sec). Higher is better.
- To compare the performance of a kernel without non-memcg compare the first and
last rows, neither has memcg configured. The first row does not include any
of these memcg patches.
- To compare the performance of using memcg dirty limits, compare the baseline
(2nd row titled "w/ memcg") with the the code and memcg enabled (2nd to last
row titled "all patches").
root_cgroup child_cgroup
s_read s_write p_read p_write s_read s_write p_read p_write
mmotm w/o memcg 0.428 0.390 0.429 0.388
mmotm w/ memcg 0.411 0.378 0.391 0.362 0.412 0.377 0.385 0.363
all patches 0.384 0.360 0.370 0.348 0.381 0.363 0.368 0.347
all patches 0.431 0.402 0.427 0.395
w/o memcg
This patch:
Add additional flags to page_cgroup to track dirty pages within a
mem_cgroup.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Righi <arighi@develer.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Thu, 13 Jan 2011 23:47:34 +0000 (15:47 -0800)]
mm: batch activate_page() to reduce lock contention
The zone->lru_lock is heavily contented in workload where activate_page()
is frequently used. We could do batch activate_page() to reduce the lock
contention. The batched pages will be added into zone list when the pool
is full or page reclaim is trying to drain them.
For example, in a 4 socket 64 CPU system, create a sparse file and 64
processes, processes shared map to the file. Each process read access the
whole file and then exit. The process exit will do unmap_vmas() and cause
a lot of activate_page() call. In such workload, we saw about 58% total
time reduction with below patch. Other workloads with a lot of
activate_page also benefits a lot too.
I tested some microbenchmarks:
case-anon-cow-rand-mt 0.58%
case-anon-cow-rand -3.30%
case-anon-cow-seq-mt -0.51%
case-anon-cow-seq -5.68%
case-anon-r-rand-mt 0.23%
case-anon-r-rand 0.81%
case-anon-r-seq-mt -0.71%
case-anon-r-seq -1.99%
case-anon-rx-rand-mt 2.11%
case-anon-rx-seq-mt 3.46%
case-anon-w-rand-mt -0.03%
case-anon-w-rand -0.50%
case-anon-w-seq-mt -1.08%
case-anon-w-seq -0.12%
case-anon-wx-rand-mt -5.02%
case-anon-wx-seq-mt -1.43%
case-fork 1.65%
case-fork-sleep -0.07%
case-fork-withmem 1.39%
case-hugetlb -0.59%
case-lru-file-mmap-read-mt -0.54%
case-lru-file-mmap-read 0.61%
case-lru-file-mmap-read-rand -2.24%
case-lru-file-readonce -0.64%
case-lru-file-readtwice -11.69%
case-lru-memcg -1.35%
case-mmap-pread-rand-mt 1.88%
case-mmap-pread-rand -15.26%
case-mmap-pread-seq-mt 0.89%
case-mmap-pread-seq -69.72%
case-mmap-xread-rand-mt 0.71%
case-mmap-xread-seq-mt 0.38%
The most significent are:
case-lru-file-readtwice -11.69%
case-mmap-pread-rand -15.26%
case-mmap-pread-seq -69.72%
which use activate_page a lot. others are basically variations because
each run has slightly difference.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Thu, 13 Jan 2011 23:47:33 +0000 (15:47 -0800)]
mm: simplify code of swap.c
Clean up code and remove duplicate code. Next patch will use
pagevec_lru_move_fn introduced here too.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 13 Jan 2011 23:47:32 +0000 (15:47 -0800)]
mm/page_alloc.c: don't cache `current' in a local
It's old-fashioned and unneeded.
akpm:/usr/src/25> size mm/page_alloc.o
text data bss dec hex filename
39884
1241317 18808
1300009 13d629 mm/page_alloc.o (before)
39838
1241317 18808
1299963 13d5fb mm/page_alloc.o (after)
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 13 Jan 2011 23:47:31 +0000 (15:47 -0800)]
mm: fix hugepage migration
2.6.37 added an unmap_and_move_huge_page() for memory failure recovery,
but its anon_vma handling was still based around the 2.6.35 conventions.
Update it to use page_lock_anon_vma, get_anon_vma, page_unlock_anon_vma,
drop_anon_vma in the same way as we're now changing unmap_and_move().
I don't particularly like to propose this for stable when I've not seen
its problems in practice nor tested the solution: but it's clearly out of
synch at present.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@kernel.org> [2.6.37, 2.6.36]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 13 Jan 2011 23:47:30 +0000 (15:47 -0800)]
mm: fix migration hangs on anon_vma lock
Increased usage of page migration in mmotm reveals that the anon_vma
locking in unmap_and_move() has been deficient since 2.6.36 (or even
earlier). Review at the time of
f18194275c39835cb84563500995e0d503a32d9a
("mm: fix hang on anon_vma->root->lock") missed the issue here: the
anon_vma to which we get a reference may already have been freed back to
its slab (it is in use when we check page_mapped, but that can change),
and so its anon_vma->root may be switched at any moment by reuse in
anon_vma_prepare.
Perhaps we could fix that with a get_anon_vma_unless_zero(), but let's
not: just rely on page_lock_anon_vma() to do all the hard thinking for us,
then we don't need any rcu read locking over here.
In removing the rcu_unlock label: since PageAnon is a bit in
page->mapping, it's impossible for a !page->mapping page to be anon; but
insert VM_BUG_ON in case the implementation ever changes.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@kernel.org> [2.6.37, 2.6.36]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 13 Jan 2011 23:47:29 +0000 (15:47 -0800)]
ksm: drain pagevecs to lru
It was hard to explain the page counts which were causing new LTP tests
of KSM to fail: we need to drain the per-cpu pagevecs to LRU occasionally.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: CAI Qian <caiqian@redhat.com>
Cc:Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric B Munson [Thu, 13 Jan 2011 23:47:28 +0000 (15:47 -0800)]
hugetlb: fix handling of parse errors in sysfs
When parsing changes to the huge page pool sizes made from userspace via
the sysfs interface, bogus input values are being covered up by
nr_hugepages_store_common and nr_overcommit_hugepages_store returning 0
when strict_strtoul returns an error. This can cause an infinite loop in
the nr_hugepages_store code. This patch changes the return value for
these functions to -EINVAL when strict_strtoul returns an error.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Reported-by: CAI Qian <caiqian@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric B Munson [Thu, 13 Jan 2011 23:47:27 +0000 (15:47 -0800)]
hugetlb: do not allow pagesize >= MAX_ORDER pool adjustment
Huge pages with order >= MAX_ORDER must be allocated at boot via the
kernel command line, they cannot be allocated or freed once the kernel is
up and running. Currently we allow values to be written to the sysfs and
sysctl files controling pool size for these huge page sizes. This patch
makes the store functions for nr_hugepages and nr_overcommit_hugepages
return -EINVAL when the pool for a page size >= MAX_ORDER is changed.
[akpm@linux-foundation.org: avoid multiple return paths in nr_hugepages_store_common()]
[caiqian@redhat.com: add checking in hugetlb_overcommit_handler()]
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Reported-by: CAI Qian <caiqian@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 13 Jan 2011 23:47:26 +0000 (15:47 -0800)]
hugetlb: check the return value of string conversion in sysctl handler
proc_doulongvec_minmax may fail if the given buffer doesn't represent a
valid number. If we provide something invalid we will initialize the
resulting value (nr_overcommit_huge_pages in this case) to a random value
from the stack.
The issue was introduced by
a3d0c6aa when the default handler has been
replaced by the helper function where we do not check the return value.
Reproducer:
echo "" > /proc/sys/vm/nr_overcommit_hugepages
[akpm@linux-foundation.org: correctly propagate proc_doulongvec_minmax return code]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: CAI Qian <caiqian@redhat.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>