Paul Mundt [Fri, 4 Nov 2011 14:15:29 +0000 (23:15 +0900)]
Merge branch 'master' of git://git./linux/kernel/git/torvalds/linux into rmobile-latest
Linus Torvalds [Fri, 4 Nov 2011 04:07:58 +0000 (21:07 -0700)]
Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
cifs: Assume passwords are encoded according to iocharset (try #2)
CIFS: Fix the VFS brlock cache usage in posix locking case
[CIFS] Update cifs version to 1.76
CIFS: Remove extra mutex_unlock in cifs_lock_add_if
Linus Torvalds [Fri, 4 Nov 2011 04:05:43 +0000 (21:05 -0700)]
Merge git://git./linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
be2net: Add detect UE feature for Lancer
be2net: Prevent CQ full condition for Lancer
be2net: Fix disabling multicast promiscous mode
be2net: Fix endian issue in RX filter command
af_packet: de-inline some helper functions
MAINTAINERS: Add can-gw include to maintained files
net: Add back alignment for size for __alloc_skb
net: add missing bh_unlock_sock() calls
l2tp: fix race in l2tp_recv_dequeue()
ixgbevf: Update release version
ixgbe: DCB, return max for IEEE traffic classes
ixgbe: fix reading of the buffer returned by the firmware
ixgbe: Fix compiler warnings
ixgbe: fix smatch splat due to missing NULL check
ixgbe: fix disabling of Tx laser at probe
ixgbe: Fix link issues caused by a reset while interface is down
igb: Fix for I347AT4 PHY cable length unit detection
e100: make sure vlan support isn't advertised on old adapters
e1000e: demote a debugging WARN to a debug log message
net: fix typo in drivers/net/ethernet/xilinx/ll_temac_main.c
...
Padmanabh Ratnakar [Thu, 3 Nov 2011 01:50:08 +0000 (01:50 +0000)]
be2net: Add detect UE feature for Lancer
Add code to detect UE in case of Lancer.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Padmanabh Ratnakar [Thu, 3 Nov 2011 01:49:55 +0000 (01:49 +0000)]
be2net: Prevent CQ full condition for Lancer
Indicate to HW that the CQ is cleaned up before posting new RX buffers.
This prevents the HW from going into the CQ full error condition.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Padmanabh Ratnakar [Thu, 3 Nov 2011 01:49:27 +0000 (01:49 +0000)]
be2net: Fix disabling multicast promiscous mode
If the user tries to disable multicast promiscuous mode, the adapter remains
in this mode because resetting the multicast promiscuous mode was missing
from the RX filter command. Fixed this.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Padmanabh Ratnakar [Thu, 3 Nov 2011 01:49:13 +0000 (01:49 +0000)]
be2net: Fix endian issue in RX filter command
Use cpu_to_le32() for mcast_num field in RX filter command as this
field is of type u32.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
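The endian fix above can be illustrated with a minimal user-space sketch. It is not the be2net code: `sketch_cpu_to_le32()` is a hypothetical stand-in for the kernel's `cpu_to_le32()`, and the command struct is invented for illustration, with only the `mcast_num` field name taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's cpu_to_le32(). */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
    /* Lay the value out least-significant byte first, whatever the host order. */
    uint8_t b[4] = {
        (uint8_t)(v & 0xff),
        (uint8_t)((v >> 8) & 0xff),
        (uint8_t)((v >> 16) & 0xff),
        (uint8_t)((v >> 24) & 0xff),
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Invented command layout; only the field name comes from the commit message. */
struct sketch_rx_filter_cmd {
    uint32_t mcast_num;   /* stored little-endian, as the device expects */
};

static void sketch_set_mcast_num(struct sketch_rx_filter_cmd *cmd, uint32_t n)
{
    assert(cmd != NULL);
    cmd->mcast_num = sketch_cpu_to_le32(n);   /* a 32-bit field needs the 32-bit helper */
}
```

On a little-endian host the conversion is a no-op, which is why this class of bug tends to show up only on big-endian machines.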
David S. Miller [Fri, 4 Nov 2011 01:52:51 +0000 (21:52 -0400)]
Merge git://git./linux/kernel/git/jkirsher/net
Olof Johansson [Wed, 2 Nov 2011 11:00:49 +0000 (11:00 +0000)]
af_packet: de-inline some helper functions
This popped some compiler errors due to mismatched prototypes. Just
remove most manual inlines, the compiler should be able to figure out
what makes sense to inline and not.
net/packet/af_packet.c:252: warning: 'prb_curr_blk_in_use' declared inline after being called
net/packet/af_packet.c:252: warning: previous declaration of 'prb_curr_blk_in_use' was here
net/packet/af_packet.c:258: warning: 'prb_queue_frozen' declared inline after being called
net/packet/af_packet.c:258: warning: previous declaration of 'prb_queue_frozen' was here
net/packet/af_packet.c:248: warning: 'packet_previous_frame' declared inline after being called
net/packet/af_packet.c:248: warning: previous declaration of 'packet_previous_frame' was here
net/packet/af_packet.c:251: warning: 'packet_increment_head' declared inline after being called
net/packet/af_packet.c:251: warning: previous declaration of 'packet_increment_head' was here
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Chetan Loke <loke.chetan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Hartkopp [Wed, 2 Nov 2011 10:55:13 +0000 (10:55 +0000)]
MAINTAINERS: Add can-gw include to maintained files
Commit c1aabdf379bc2feeb0df7057ed5bad96f492133e (can-gw: add netlink based
CAN routing) added a new include file that is not referenced by any of
the CAN maintainer entries.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lindgren [Wed, 2 Nov 2011 13:40:28 +0000 (13:40 +0000)]
net: Add back alignment for size for __alloc_skb
Commit 87fb4b7b533073eeeaed0b6bf7c2328995f6c075 (net: more
accurate skb truesize) changed the alignment of size. This
can cause problems at least on some machines with NFS root:
Unhandled fault: alignment exception (0x801) at 0xc183a43a
Internal error: : 801 [#1] PREEMPT
Modules linked in:
CPU: 0 Not tainted (3.1.0-08784-g5eeee4a #733)
pc : [<c02fbba0>] lr : [<c02fbb9c>] psr: 60000013
sp : c180fef8 ip : 00000000 fp : c181f580
r10: 00000000 r9 : c044b28c r8 : 00000001
r7 : c183a3a0 r6 : c1835be0 r5 : c183a412 r4 : 000001f2
r3 : 00000000 r2 : 00000000 r1 : ffffffe6 r0 : c183a43a
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 0005317f Table: 10004000 DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc180e270)
Stack: (0xc180fef8 to 0xc1810000)
fee0: 00000024 00000000
ff00: 00000000 c183b9c0 c183b8e0 c044b28c c0507ccc c019dfc4 c180ff2c c0503cf8
ff20: c180ff4c c180ff4c 00000000 c1835420 c182c740 c18349c0 c05233c0 00000000
ff40: 00000000 c00e6bb8 c180e000 00000000 c04dd82c c0507e7c c050cc18 c183b9c0
ff60: c05233c0 00000000 00000000 c01f34f4 c0430d70 c019d364 c04dd898 c04dd898
ff80: c04dd82c c0507e7c c180e000 00000000 c04c584c c01f4918 c04dd898 c04dd82c
ffa0: c04ddd28 c180e000 00000000 c0008758 c181fa60 3231d82c 00000037 00000000
ffc0: 00000000 c04dd898 c04dd82c c04ddd28 00000013 00000000 00000000 00000000
ffe0: 00000000 c04b2224 00000000 c04b21a0 c001056c c001056c 00000000 00000000
Function entered at [<c02fbba0>] from [<c019dfc4>]
Function entered at [<c019dfc4>] from [<c01f34f4>]
Function entered at [<c01f34f4>] from [<c01f4918>]
Function entered at [<c01f4918>] from [<c0008758>]
Function entered at [<c0008758>] from [<c04b2224>]
Function entered at [<c04b2224>] from [<c001056c>]
Code: e1a00005 e3a01028 ebfa7cb0 e35a0000 (e5858028)
Here PC is at __alloc_skb and &shinfo->dataref is unaligned because
skb->end can be unaligned without this patch.
As explained by Eric Dumazet <eric.dumazet@gmail.com>, this happens
only with SLOB, and not with SLAB or SLUB:
* Eric Dumazet <eric.dumazet@gmail.com> [111102 15:56]:
>
> Your patch is absolutely needed, I completely forgot about SLOB :(
>
> since, kmalloc(386) on SLOB gives exactly ksize=386 bytes, not nearest
> power of two.
>
> [ 60.305763] malloc(size=385)-> ffff880112c11e38 ksize=386 -> nsize=2
> [ 60.305921] malloc(size=385)-> ffff88007c92ce28 ksize=386 -> nsize=2
> [ 60.306898] malloc(size=656)-> ffff88007c44ad28 ksize=656 -> nsize=272
> [ 60.325385] malloc(size=656)-> ffff88007c575868 ksize=656 -> nsize=272
> [ 60.325531] malloc(size=656)-> ffff88011c777230 ksize=656 -> nsize=272
> [ 60.325701] malloc(size=656)-> ffff880114011008 ksize=656 -> nsize=272
> [ 60.346716] malloc(size=385)-> ffff880114142008 ksize=386 -> nsize=2
> [ 60.346900] malloc(size=385)-> ffff88011c777690 ksize=386 -> nsize=2
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
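The alignment problem described above comes down to rounding a requested size up to an alignment boundary before deriving the end-of-buffer pointer. A minimal sketch of that rounding follows, assuming a power-of-two alignment; `sketch_align_up()` and the 32-byte value used in the usage note are illustrative assumptions, not the kernel's own macro or cache-line size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round size up to the next multiple of align.
 * align must be a power of two; this helper is a hypothetical stand-in
 * for the kernel's own size-alignment macro.
 */
static size_t sketch_align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With an assumed 32-byte alignment, the sizes from the quoted log round as follows: 385 and 386 both become 416, and 656 becomes 672. An allocator that hands back exactly the requested number of bytes, as SLOB does in the log, therefore no longer yields an unaligned end offset.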
Eric Dumazet [Wed, 2 Nov 2011 12:42:56 +0000 (12:42 +0000)]
net: add missing bh_unlock_sock() calls
Simon Kirby reported lockdep warnings and the following messages:
[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
The problem comes from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
If inet_csk_route_child_sock() returns NULL, we should release the socket
lock before freeing the socket.
Another lock imbalance exists if __inet_inherit_port() returns an error
since commit 093d282321da (tproxy: fix hash locking issue when using
port redirection in __inet_inherit_port()); a backport is also needed for
>= 2.6.37 kernels.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Balazs Scheidler <bazsi@balabit.hu>
CC: KOVACS Krisztian <hidden@balabit.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
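The lock imbalance described above can be modelled with a toy depth counter, loosely analogous to the preempt_count values in the reported messages. Everything in this sketch is hypothetical (`sketch_lock()`, `sketch_unlock()`, `sketch_create_child()` and the counter itself); it only shows the shape of the fix, which is to drop the lock on the error path before freeing the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a lock depth counter; not the kernel's preempt_count. */
static int sketch_lock_depth;

struct sketch_sock {
    int dummy;
};

static void sketch_lock(struct sketch_sock *sk)
{
    (void)sk;
    sketch_lock_depth++;
}

static void sketch_unlock(struct sketch_sock *sk)
{
    (void)sk;
    sketch_lock_depth--;
}

/*
 * Returns a locked child on success. On failure the lock is released
 * before the child is freed, so the depth counter stays balanced.
 */
static struct sketch_sock *sketch_create_child(bool route_ok)
{
    struct sketch_sock *child = malloc(sizeof(*child));

    if (child == NULL)
        return NULL;
    sketch_lock(child);
    if (!route_ok) {
        sketch_unlock(child);   /* the step that was missing on the error path */
        free(child);
        return NULL;
    }
    return child;
}
```

Without the `sketch_unlock()` call on the failure branch, the counter would end one higher than it started, which mirrors the "entered with 00000101, exited with 00000102" symptom.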
Eric Dumazet [Wed, 2 Nov 2011 22:47:44 +0000 (22:47 +0000)]
l2tp: fix race in l2tp_recv_dequeue()
Misha Labjuk reported panics occurring in l2tp_recv_dequeue().
If we release reorder_q.lock, we must not keep a dangling pointer (tmp),
since another thread could manipulate reorder_q.
Instead we must restart the scan at the beginning of the list.
Reported-by: Misha Labjuk <spiked.yar@gmail.com>
Tested-by: Misha Labjuk <spiked.yar@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
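The restart-the-scan rule above can be sketched on a plain singly linked list. This is an illustration only: the types and functions are hypothetical, the lock is represented by comments, and the real code works on the kernel's socket buffer queue rather than this structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a reorder queue; not the l2tp data structures. */
struct sketch_node {
    int seq;
    struct sketch_node *next;
};

struct sketch_queue {
    struct sketch_node *head;
    int delivered;              /* counts nodes handed to the consumer */
};

/* Runs with the (imaginary) queue lock dropped, so the list may change. */
static void sketch_deliver(struct sketch_queue *q, struct sketch_node *n)
{
    (void)n;
    q->delivered++;
}

/* Deliver every node with seq <= max_seq; returns how many were delivered. */
static int sketch_dequeue_ready(struct sketch_queue *q, int max_seq)
{
    int count = 0;

    assert(q != NULL);
restart:
    /* lock(q) would be taken here */
    for (struct sketch_node **pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        struct sketch_node *n = *pp;

        if (n->seq > max_seq)
            continue;
        *pp = n->next;          /* unlink while still "locked" */
        n->next = NULL;
        /* unlock(q): from here on, pp and any saved next pointer are stale */
        sketch_deliver(q, n);
        count++;
        goto restart;           /* rescan from the head instead of resuming */
    }
    /* unlock(q) */
    return count;
}
```

Rescanning from the head costs extra iterations, but it avoids following a cursor that another thread may have unlinked or freed while the lock was released.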
Linus Torvalds [Thu, 3 Nov 2011 20:28:14 +0000 (13:28 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (37 commits)
MIPS: O32: Provide definition of registers ta0 .. ta3.
MIPS: perf: Add Octeon support for hardware perf.
MIPS: perf: Add support for 64-bit perf counters.
MIPS: perf: Reorganize contents of perf support files.
MIPS: perf: Cleanup formatting in arch/mips/kernel/perf_event.c
MIPS: Add accessor macros for 64-bit performance counter registers.
MIPS: Add probes for more Octeon II CPUs.
MIPS: Add more CPU identifiers for Octeon II CPUs.
MIPS: XLR, XLS: Add comment for smp setup
MIPS: JZ4740: GPIO: Check correct IRQ in demux handler
MIPS: JZ4740: GPIO: Simplify IRQ demuxer
MIPS: JZ4740: Use generic irq chip
MIPS: Alchemy: remove all CONFIG_SOC_AU1??? defines
MIPS: Alchemy: kill au1xxx.h header
MIPS: Alchemy: clean DMA code of CONFIG_SOC_AU1??? defines
MIPS, IDE: Alchem, au1xxx-ide: Remove pb1200/db1200 header dep
MIPS: Alchemy: Redo PCI as platform driver
MIPS: Alchemy: more base address cleanup
MIPS: Alchemy: rewrite USB platform setup.
MIPS: Alchemy: abstract USB block control register access
...
Fix up trivial conflicts in:
arch/mips/alchemy/devboards/db1x00/platform.c
drivers/ide/Kconfig
drivers/mmc/host/au1xmmc.c
drivers/video/Kconfig
sound/mips/Kconfig
Josh Boyer [Thu, 3 Nov 2011 18:00:11 +0000 (16:00 -0200)]
edac: Only build sb_edac on 64-bit
The sb_edac driver is only marginally useful on a 32-bit kernel, and
currently has 64-bit divide compile errors when building that config.
For now, make this build only on 64-bit kernels.
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 3 Nov 2011 16:59:39 +0000 (09:59 -0700)]
Merge branch 'next' of git://github.com/kernelslacker/cpufreq
* 'next' of git://github.com/kernelslacker/cpufreq:
[CPUFREQ] db8500: support all frequencies
[CPUFREQ] db8500: remove unneeded for loop iteration over freq_table
[CPUFREQ] ARM Exynos4210 PM/Suspend compatibility with different bootloaders
[CPUFREQ] ARM: ux500: send cpufreq notification for all cpus
[CPUFREQ] e_powersaver: Allow user to lower maximum voltage
[CPUFREQ] e_powersaver: Check BIOS limit for CPU frequency
[CPUFREQ] e_powersaver: Additional checks
[CPUFREQ] exynos4210: Show list of available frequencies
Linus Torvalds [Thu, 3 Nov 2011 16:40:51 +0000 (09:40 -0700)]
Merge branch 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6: (80 commits)
mfd: Fix missing abx500 header file updates
mfd: Add missing <linux/io.h> include to intel_msic
x86, mrst: add platform support for MSIC MFD driver
mfd: Expose TurnOnStatus in ab8500 sysfs
mfd: Remove support for early drop ab8500 chip
mfd: Add support for ab8500 v3.3
mfd: Add ab8500 interrupt disable hook
mfd: Convert db8500-prcmu panic() into pr_crit()
mfd: Refactor db8500-prcmu request_clock() function
mfd: Rename db8500-prcmu init function
mfd: Fix db5500-prcmu defines
mfd: db8500-prcmu voltage domain consumers additions
mfd: db8500-prcmu reset code retrieval
mfd: db8500-prcmu tweak for modem wakeup
mfd: Add db8500-pcmu watchdog accessor functions for watchdog
mfd: hwacc power state db8500-prcmu accessor
mfd: Add db8500-prcmu accessors for PLL and SGA clock
mfd: Move to the new db500 PRCMU API
mfd: Create a common interface for dbx500 PRCMU drivers
mfd: Initialize DB8500 PRCMU regs
...
Fix up trivial conflicts in
arch/arm/mach-imx/mach-mx31moboard.c
arch/arm/mach-omap2/board-omap3beagle.c
arch/arm/mach-u300/include/mach/irqs.h
drivers/mfd/wm831x-spi.c
Linus Torvalds [Thu, 3 Nov 2011 15:22:06 +0000 (08:22 -0700)]
Merge branch 'sh-latest' of git://github.com/pmundt/linux-sh
* 'sh-latest' of git://github.com/pmundt/linux-sh:
sh: Add default uImage rule for sh7757lcr
sh: modify the asm/sh_eth.h to linux/sh_eth.h in sh7757lcr
sh: userimask.c needs linux/stat.h
sh: pfc: Add GPIO IRQ support
sh: modify the asm/sh_eth.h to linux/sh_eth.h in some boards
sh: pfc: Remove unused gpio_in_use member
sh: add parameters for EHCI and RIIC in clock-sh7757.c
sh: kexec: Add PHYSICAL_START
SH: irq: Remove IRQF_DISABLED
sh: pfc: get_config_reg() shift clean up
sh: intc: Add IRQ trigger bit field check
sh: drop unused Kconfig symbol
sh: Fix implicit declaration of function numa_node_id
sh: kexec: Register crashk_res
sh: ecovec: add renesas_usbhs DMAEngine support
Linus Torvalds [Thu, 3 Nov 2011 15:05:35 +0000 (08:05 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ohad/hwspinlock
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock:
hwspinlock: add MAINTAINERS entries
hwspinlock/omap: omap_hwspinlock_remove should be __devexit
hwspinlock/u8500: add hwspinlock driver
hwspinlock/core: register a bank of hwspinlocks in a single API call
hwspinlock/core: remove stubs for register/unregister
hwspinlock/core: use a mutex to protect the radix tree
hwspinlock/core/omap: fix id issues on multiple hwspinlock devices
hwspinlock/omap: simplify allocation scheme
hwspinlock/core: simplify 'owner' handling
hwspinlock/core: simplify Kconfig
Fix up trivial conflicts (addition of omap_hwspinlock_pdata, removal of
omap_spinlock_latency) in arch/arm/mach-omap2/hwspinlock.c
Also, do an "evil merge" to fix a compile error in omap_hsmmc.c which
for some reason was reported in the same email thread as the "please
pull hwspinlock changes".
Linus Torvalds [Thu, 3 Nov 2011 14:53:22 +0000 (07:53 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
Revert "HID: multitouch: decide if hid-multitouch needs to handle mt devices"
HID: drivers/hid/hid-roccat.c: eliminate a null pointer dereference
HID: hid-apple: add device ID of another wireless aluminium
HID: Add device IDs for Macbook Pro 8 keyboards
Linus Torvalds [Thu, 3 Nov 2011 14:44:04 +0000 (07:44 -0700)]
Revert "perf: Add PM notifiers to fix CPU hotplug races"
This reverts commit 144060fee07e9c22e179d00819c83c86fbcbf82c.
It causes a resume regression for Andi on his Acer Aspire 1830T post
3.1. The screen just stays black after wakeup.
Also, it really looks like the wrong way to suspend and resume perf
events: I think they should be done as part of the CPU suspend and
resume, rather than as a notifier that does smp_call_function().
Reported-by: Andi Kleen <andi@firstfloor.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 3 Nov 2011 00:02:37 +0000 (17:02 -0700)]
Merge git://git./linux/kernel/git/steve/linux-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm:
dm: raid fix device status indicator when array initializing
dm log userspace: add log device dependency
dm log userspace: fix comment hyphens
dm: add thin provisioning target
dm: add persistent data library
dm: add bufio
dm: export dm get md
dm table: add immutable feature
dm table: add always writeable feature
dm table: add singleton feature
dm kcopyd: add dm_kcopyd_zero to zero an area
dm: remove superfluous smp_mb
dm: use local printk ratelimit
dm table: propagate non rotational flag
Linus Torvalds [Thu, 3 Nov 2011 00:01:01 +0000 (17:01 -0700)]
Merge branch 'for-linus' of git://git.selinuxproject.org/~jmorris/linux-security
* 'for-linus' of git://git.selinuxproject.org/~jmorris/linux-security:
TOMOYO: Fix interactive judgment functionality.
Greg Rose [Thu, 20 Oct 2011 04:14:49 +0000 (04:14 +0000)]
ixgbevf: Update release version
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
John Fastabend [Wed, 19 Oct 2011 08:48:49 +0000 (08:48 +0000)]
ixgbe: DCB, return max for IEEE traffic classes
Returning the max traffic classes on get requests simplifies
user space configurations because applications will know
explicitly how many traffic classes can be used.
Typical switch implementations use 2 or 3 traffic classes
so this is not seen often today. User space can also learn
the number of traffic classes from return codes, but this
allows user space to configure ixgbe correctly at the
start.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 22 Oct 2011 05:21:32 +0000 (05:21 +0000)]
ixgbe: fix reading of the buffer returned by the firmware
This patch fixes some issues found in the buffer read portion of
ixgbe_host_interface_command():
- use `bi` as the buffer index counter instead of `i`
- add conversion to native cpu byte ordering on register read
- fix conversion from bytes to dword
- use dword_len instead of buf_len when reading the register
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
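The four fixes listed above can be pictured with a small user-space sketch of a buffer read loop. It assumes the register contents are little-endian, which the commit message does not state, and every name here is hypothetical except `bi` and `dword_len`, which are borrowed from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a little-endian to native conversion. */
static uint32_t sketch_le32_to_cpu(uint32_t raw)
{
    uint8_t b[4];

    memcpy(b, &raw, sizeof(b));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/*
 * Copy a reply of buf_len bytes out of a fake register array.
 * Returns the number of 32-bit words read.
 */
static size_t sketch_read_reply(const uint32_t *regs, uint32_t *buf, size_t buf_len)
{
    size_t dword_len;

    assert(regs != NULL && buf != NULL && buf_len % 4 == 0);
    dword_len = buf_len / 4;                      /* bytes to dwords */
    for (size_t bi = 0; bi < dword_len; bi++)     /* bound by dword_len, not buf_len */
        buf[bi] = sketch_le32_to_cpu(regs[bi]);   /* convert to native byte order */
    return dword_len;
}
```

Looping to `buf_len` instead of `dword_len` would read four times as many registers as the reply contains, which is the kind of overrun the last bullet guards against.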
Greg Rose [Fri, 21 Oct 2011 07:55:15 +0000 (07:55 +0000)]
ixgbe: Fix compiler warnings
Wrap SR-IOV specific functions in CONFIG_PCI_IOV to avoid compiler
warnings.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
John Fastabend [Sat, 15 Oct 2011 05:00:10 +0000 (05:00 +0000)]
ixgbe: fix smatch splat due to missing NULL check
ixgbe_ieee_ets and ixgbe_ieee_pfc are initialized at
the same time. Do a check for both before configuring
IEEE802.1Qaz. Also, max_frame was causing a sparse
warning, which is resolved here as well.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 19 Oct 2011 07:59:55 +0000 (07:59 +0000)]
ixgbe: fix disabling of Tx laser at probe
register_netdev() calls ndo_set_features() which may result in HW reset
which in turn will bring the laser back up.
This patch moves ixgbe_laser_tx_disable() below register_netdev()
in ixgbe_probe() to make sure laser is shut off on load.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Wed, 2 Nov 2011 23:55:15 +0000 (16:55 -0700)]
Merge branch 'linux_next' of git://git./linux/kernel/git/mchehab/linux-edac
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits)
MAINTAINERS: add an entry for Edac Sandy Bridge driver
edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
EDAC: Fix incorrect edac mode reporting in sb_edac
edac: sb_edac: Add it to the building system
edac: Add an experimental new driver to support Sandy Bridge CPU's
i7300_edac: Fix error cleanup logic
i7core_edac: Initialize memory name with cpu, channel, bank
i7core_edac: Fix compilation on 32 bits arch
i7core_edac: scrubbing fixups
EDAC: Correct Kconfig dependencies
i7core_edac: return -ENODEV if no MC is found
i7core_edac: use edac's own way to print errors
MAINTAINERS: remove dropped edac_mce.* from the file
i7core_edac: Drop the edac_mce facility
x86, MCE: Use notifier chain only for MCE decoding
EDAC i7core: Use mce socketid for better compatibility
i7core_edac: Don't enable memory scrubbing for Xeon 35xx
i7core_edac: Add scrubbing support
edac: Move edac main structs to include/linux/edac.h
i7core_edac: Fix oops when trying to inject errors
...
Emil Tantilov [Wed, 19 Oct 2011 07:41:58 +0000 (07:41 +0000)]
ixgbe: Fix link issues caused by a reset while interface is down
Interface fails to obtain link on 82599 SFP in the following scenario:
1. Set advertised speed to GB:
ethtool -s eth0 advertise 0x20
2. Bring interface down
ip link set eth0 down
3. Issue any command that leads to a reset:
ethtool -t eth0
4. Bring link back up:
ip link set eth0 up
Following patch makes sure that the driver flaps the Tx laser every time
ixgbe_start_hw() is called, and not only when the speed is set.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kantecki, Tomasz [Mon, 17 Oct 2011 22:06:59 +0000 (22:06 +0000)]
igb: Fix for I347AT4 PHY cable length unit detection
The PHY cable length unit detection was not using
the correct PHY data variable for I347AT4.
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 22 Oct 2011 05:18:10 +0000 (05:18 +0000)]
e100: make sure vlan support isn't advertised on old adapters
e100 parts don't support vlan offload but they generally do
allow use of vlans in higher software layers via the 8021q module.
That said, there are a couple of really old revisions of e100
hardware that don't even allow the longer frame sizes
required for vlan use with standard MTU.
Use the VLAN_CHALLENGED flag to prevent vlan binding to these
devices.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: David Lamparter <equinox@diac24.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Oct 2011 04:33:47 +0000 (04:33 +0000)]
e1000e: demote a debugging WARN to a debug log message
This debugging message was recently added but it does not need to be as
alarming as a WARN.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Wed, 2 Nov 2011 23:54:36 +0000 (16:54 -0700)]
Merge branch 'for-3.2' of git://linux-nfs.org/~bfields/linux
* 'for-3.2' of git://linux-nfs.org/~bfields/linux:
nfsd4: typo logical vs bitwise negate in nfsd4_decode_share_access
Linus Torvalds [Wed, 2 Nov 2011 23:52:17 +0000 (16:52 -0700)]
Merge branch 'misc-3.2' of git://git./linux/kernel/git/aegl/linux
* 'misc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
MAINTAINERS: Update entry for IA64
[IA64] gpio: GENERIC_GPIO default must be n
[IA64[ add CONFIG_NET_VENDOR_INTEL=y to default config files where needed
[IA64] agp/hp-agp: Allow binding user memory to the AGP GART
[IA64] sn2: add missing put_cpu()
Linus Torvalds [Wed, 2 Nov 2011 23:07:27 +0000 (16:07 -0700)]
Merge branch 'akpm' (Andrew's incoming - part two)
Says Andrew:
"60 patches. That's good enough for -rc1 I guess. I have quite a lot
of detritus to be rechecked, work through maintainers, etc.
- most of the remains of MM
- rtc
- various misc
- cgroups
- memcg
- cpusets
- procfs
- ipc
- rapidio
- sysctl
- pps
- w1
- drivers/misc
- aio"
* akpm: (60 commits)
memcg: replace ss->id_lock with a rwlock
aio: allocate kiocbs in batches
drivers/misc/vmw_balloon.c: fix typo in code comment
drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
w1: disable irqs in critical section
drivers/w1/w1_int.c: multiple masters used same init_name
drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
drivers/power/ds2780_battery.c: add a nolock function to w1 interface
drivers/power/ds2780_battery.c: create central point for calling w1 interface
w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
pps gpio client: add missing dependency
pps: new client driver using GPIO
pps: default echo function
include/linux/dma-mapping.h: add dma_zalloc_coherent()
sysctl: make CONFIG_SYSCTL_SYSCALL default to n
sysctl: add support for poll()
RapidIO: documentation update
drivers/net/rionet.c: fix ethernet address macros for LE platforms
RapidIO: fix potential null deref in rio_setup_device()
RapidIO: add mport driver for Tsi721 bridge
...
Andrew Bresticker [Wed, 2 Nov 2011 20:40:29 +0000 (13:40 -0700)]
memcg: replace ss->id_lock with a rwlock
While back-porting Johannes Weiner's patch "mm: memcg-aware global
reclaim" for an internal effort, we noticed a significant performance
regression during page-reclaim heavy workloads due to high contention of
the ss->id_lock. This lock protects idr map, and serializes calls to
idr_get_next() in css_get_next() (which is used during the memcg hierarchy
walk).
Since idr_get_next() is just doing a look up, we need only serialize it
with respect to idr_remove()/idr_get_new(). By making the ss->id_lock a
rwlock, contention is greatly reduced and performance improves.
Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
each core (one file + container per core) in parallel on a NUMA machine.
Result is the time for the test to complete in 1 of the containers.
Both kernels included Johannes' memcg-aware global reclaim patches.
Before rwlock patch: 1710.778s
After rwlock patch: 152.227s
Signed-off-by: Andrew Bresticker <abrestic@google.com>
Cc: Paul Menage <menage@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
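The reasoning behind the rwlock change above is that lookups only need to exclude modifications, not each other. The sketch below is a single-threaded model of that admission rule and nothing more: it is not thread-safe, it is not the kernel's rwlock, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bookkeeping-only model of a reader/writer lock; NOT safe for real threads. */
struct sketch_rwlock {
    int readers;     /* number of lookups currently admitted */
    bool writer;     /* true while a modification holds the lock */
};

/* A lookup is admitted unless a writer is active. */
static bool sketch_read_trylock(struct sketch_rwlock *l)
{
    assert(l != NULL);
    if (l->writer)
        return false;
    l->readers++;
    return true;
}

static void sketch_read_unlock(struct sketch_rwlock *l)
{
    assert(l != NULL && l->readers > 0);
    l->readers--;
}

/* A modification is admitted only when nobody else holds the lock. */
static bool sketch_write_trylock(struct sketch_rwlock *l)
{
    assert(l != NULL);
    if (l->writer || l->readers > 0)
        return false;
    l->writer = true;
    return true;
}

static void sketch_write_unlock(struct sketch_rwlock *l)
{
    assert(l != NULL && l->writer);
    l->writer = false;
}
```

Under a plain spinlock every lookup would be refused while another lookup is in progress; here two readers are admitted together, which is where the reduced contention in the hierarchy walk would come from.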
Jeff Moyer [Wed, 2 Nov 2011 20:40:10 +0000 (13:40 -0700)]
aio: allocate kiocbs in batches
In testing aio on a fast storage device, I found that the context lock
takes up a fair amount of cpu time in the I/O submission path. The reason
is that we take it for every I/O submitted (see __aio_get_req). Since we
know how many I/Os are passed to io_submit, we can preallocate the kiocbs
in batches, reducing the number of times we take and release the lock.
In my testing, I was able to reduce the amount of time spent in
_raw_spin_lock_irq by .56% (average of 3 runs). The command I used to
test this was:
aio-stress -O -o 2 -o 3 -r 8 -d 128 -b 32 -i 32 -s 16384 <dev>
I also tested the patch with various numbers of events passed to
io_submit, and I ran the xfstests aio group of tests to ensure I didn't
break anything.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Daniel Ehrenberg <dehrenberg@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
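The batching idea above can be sketched by counting how often a lock would be taken. This is a toy model under stated assumptions: the context struct, both functions, and the batch size are hypothetical, and no real lock or kiocb allocation is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context; the counters stand in for real locking and allocation. */
struct sketch_ctx {
    unsigned int lock_acquisitions;   /* how many times the context lock was taken */
    unsigned int reqs_reserved;       /* how many request slots were accounted */
};

/* One lock round trip per request, as in the behaviour being replaced. */
static void sketch_reserve_one_by_one(struct sketch_ctx *ctx, unsigned int nr)
{
    assert(ctx != NULL);
    for (unsigned int i = 0; i < nr; i++) {
        ctx->lock_acquisitions++;
        ctx->reqs_reserved++;
    }
}

/* One lock round trip per batch of up to `batch` requests. */
static void sketch_reserve_batched(struct sketch_ctx *ctx, unsigned int nr,
                                   unsigned int batch)
{
    assert(ctx != NULL && batch > 0);
    while (nr > 0) {
        unsigned int take = nr < batch ? nr : batch;

        ctx->lock_acquisitions++;
        ctx->reqs_reserved += take;
        nr -= take;
    }
}
```

For 32 requests and an assumed batch size of 8, the first function takes the lock 32 times and the second 4 times, while both account for the same 32 requests.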
Rakib Mullick [Wed, 2 Nov 2011 20:40:07 +0000 (13:40 -0700)]
drivers/misc/vmw_balloon.c: fix typo in code comment
Fix typo in code comment.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rakib Mullick [Wed, 2 Nov 2011 20:40:04 +0000 (13:40 -0700)]
drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
In vmballoon_reserve_page(), flags is passed in from the caller
(vmballoon_inflate() here). So, we can determine can_sleep outside
the loop.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Weitzel [Wed, 2 Nov 2011 20:40:02 +0000 (13:40 -0700)]
w1: disable irqs in critical section
Interrupting w1_delay() in w1_read_bit() results in missing the low level
on the w1 line and receiving "1" instead of "0".
Add local_irq_save()/local_irq_restore() around the critical section.
Signed-off-by: Jan Weitzel <j.weitzel@phytec.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Florian Faber [Wed, 2 Nov 2011 20:39:59 +0000 (13:39 -0700)]
drivers/w1/w1_int.c: multiple masters used same init_name
When using multiple masters, w1_int.c would use the .init_name from w1.c
for all entities, which will fail when creating a corresponding sysfs
entry. This patch uses the unique name previously generated.
WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0x48/0x64()
sysfs: cannot create duplicate filename '/devices/w1 bus master'
Modules linked in:
Call trace:
[<9001a604>] warn_slowpath_common+0x34/0x44
[<9001a64c>] warn_slowpath_fmt+0x14/0x18
[<90078020>] sysfs_add_one+0x48/0x64
[<900784ec>] create_dir+0x40/0x68
[<9007857a>] sysfs_create_dir+0x66/0x78
[<900c1a8a>] kobject_add_internal+0x6e/0x104
[<900c1bc0>] kobject_add_varg+0x20/0x2c
[<900c1c1c>] kobject_add+0x30/0x3c
[<900dbd66>] device_add+0x6a/0x378
[<900dbb4a>] device_initialize+0x12/0x48
[<900dc080>] device_register+0xc/0x10
[<900f99be>] w1_add_master_device+0x162/0x274
[<90008e7a>] w1_gpio_probe+0x66/0xb4
[<9000030c>] kernel_init+0x0/0xe8
[<900dde54>] platform_drv_probe+0xc/0xe
[<9000030c>] kernel_init+0x0/0xe8
[<900dd4f8>] driver_probe_device+0x6c/0xdc
[<900dd5fc>] __driver_attach+0x34/0x48
[<900dcce8>] bus_for_each_dev+0x2c/0x48
[<900dd5c8>] __driver_attach+0x0/0x48
[<900dd38c>] driver_attach+0x10/0x14
[<900dd16a>] bus_add_driver+0x6a/0x18c
[<900dd768>] driver_register+0x60/0xb8
[<90011594>] __initcall_w1_therm_init6+0x0/0x4
[<90008e00>] w1_gpio_init+0x0/0x14
[<9000030c>] kernel_init+0x0/0xe8
[<900ddf48>] platform_driver_register+0x30/0x38
[<90011594>] __initcall_w1_therm_init6+0x0/0x4
[<90008e00>] w1_gpio_init+0x0/0x14
[<9000030c>] kernel_init+0x0/0xe8
[<900ddf5e>] platform_driver_probe+0xe/0x3c
[<90008e0c>] w1_gpio_init+0xc/0x14
[<90011594>] __initcall_w1_therm_init6+0x0/0x4
[<90008e00>] w1_gpio_init+0x0/0x14
[<900126d4>] do_one_initcall+0x34/0x130
[<90000372>] kernel_init+0x66/0xe8
[<90011594>] __initcall_w1_therm_init6+0x0/0x4
[<9001ca3e>] do_exit+0x0/0x3a6
[<9000030c>] kernel_init+0x0/0xe8
[<9001ca3e>] do_exit+0x0/0x3a6
---[ end trace 5a9233884fead918 ]---
kobject_add_internal failed for w1 bus master with -EEXIST, don't try to register things with the same name in the same directory.
Signed-off-by: Florian Faber <faber@faberman.de>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clifton Barnes [Wed, 2 Nov 2011 20:39:55 +0000 (13:39 -0700)]
drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
Fixes the deadlock when inserting and removing the ds2780.
Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clifton Barnes [Wed, 2 Nov 2011 20:39:52 +0000 (13:39 -0700)]
drivers/power/ds2780_battery.c: add a nolock function to w1 interface
Adds a nolock function to the w1 interface to avoid locking the
mutex if needed.
Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clifton Barnes [Wed, 2 Nov 2011 20:39:50 +0000 (13:39 -0700)]
drivers/power/ds2780_battery.c: create central point for calling w1 interface
Simply creates one point to call the w1 interface.
Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Cameron [Wed, 2 Nov 2011 20:39:43 +0000 (13:39 -0700)]
w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
Straightforward. As an aside, the ida_init calls are not needed as far as
I can see (DEFINE_IDA does the same already).
Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: Clifton Barnes <cabarnes@indesign-llc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Carstens [Wed, 2 Nov 2011 20:39:41 +0000 (13:39 -0700)]
pps gpio client: add missing dependency
Add "depends on GENERIC_HARDIRQS" to avoid compile breakage on s390:
drivers/built-in.o: In function `pps_gpio_remove':
linux-next/drivers/pps/clients/pps-gpio.c:189: undefined reference to `free_irq'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Nuss <jamesnuss@nanometrics.ca>
Cc: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Nuss [Wed, 2 Nov 2011 20:39:38 +0000 (13:39 -0700)]
pps: new client driver using GPIO
This client driver allows you to use a GPIO pin as a source for PPS
signals. Platform data [1] are used to specify the GPIO pin number,
label, assert event edge type, and whether clear events are captured.
This driver is based on the work by Ricardo Martins who submitted an
initial implementation [2] of a PPS IRQ client driver to the linuxpps
mailing-list on Dec 3 2010.
[1] include/linux/pps-gpio.h
[2] http://ml.enneenne.com/pipermail/linuxpps/2010-December/004155.html
[akpm@linux-foundation.org: remove unneeded cast of void*]
Signed-off-by: James Nuss <jamesnuss@nanometrics.ca>
Cc: Ricardo Martins <rasm@fe.up.pt>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Ricardo Martins <rasm@fe.up.pt>
Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Cc: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Nuss [Wed, 2 Nov 2011 20:39:34 +0000 (13:39 -0700)]
pps: default echo function
A default echo function has been provided so it is no longer an error when
you specify PPS_ECHOASSERT or PPS_ECHOCLEAR without an explicit echo
function. This allows some code re-use and also makes it easier to write
client drivers since the default echo function does not normally need to
change.
Signed-off-by: James Nuss <jamesnuss@nanometrics.ca>
Reviewed-by: Ben Gardiner <bengardiner@nanometrics.ca>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Cc: Ricardo Martins <rasm@fe.up.pt>
Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Cc: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 2 Nov 2011 20:39:33 +0000 (13:39 -0700)]
include/linux/dma-mapping.h: add dma_zalloc_coherent()
Lots of driver code does a dma_alloc_coherent() and then zeroes out the
memory with a memset. Make it easy for them.
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
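The allocate-then-zero pattern that the dma_zalloc_coherent() commit above folds into one helper can be sketched in userspace as follows. This is only a model under stated assumptions: model_alloc_coherent() is a hypothetical stand-in for dma_alloc_coherent() built on malloc(), and the device, DMA handle and gfp arguments of the real kernel API are left out.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for dma_alloc_coherent(): hands back memory whose
 * contents are not zero (filled with 0xAA here to make that visible).
 */
static void *model_alloc_coherent(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0xAA, size);
    return p;
}

/*
 * Model of the helper: allocate, then zero the buffer only if the
 * allocation succeeded, so callers no longer open-code the memset().
 */
void *model_zalloc_coherent(size_t size)
{
    void *ret = model_alloc_coherent(size);

    if (ret)
        memset(ret, 0, size);
    return ret;
}
```

A caller would replace an allocation followed by its own memset() with a single call to the zeroing helper and keep its existing NULL check.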
WANG Cong [Wed, 2 Nov 2011 20:39:25 +0000 (13:39 -0700)]
sysctl: make CONFIG_SYSCTL_SYSCALL default to n
When I tried to send a patch to remove it, Andi told me we still need to
keep compatibility with old libc, so we can't remove this completely. So
just make it default to n and remove the entry from
feature-removal-schedule.txt.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lucas De Marchi [Wed, 2 Nov 2011 20:39:22 +0000 (13:39 -0700)]
sysctl: add support for poll()
Adding support for poll() in sysctl fs allows userspace to receive
notifications of changes in sysctl entries. This adds infrastructure to
allow files in sysctl fs to be pollable and implements it for hostname and
domainname.
[akpm@linux-foundation.org: s/declare/define/ for definitions]
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Greg KH <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Wed, 2 Nov 2011 20:39:19 +0000 (13:39 -0700)]
RapidIO: documentation update
Update rapidio.txt to reflect changes from recent patch.
See http://marc.info/?l=linux-kernel&m=131285620113589&w=2 for details.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Liu Gang <Gang.Liu@freescale.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Wed, 2 Nov 2011 20:39:15 +0000 (13:39 -0700)]
drivers/net/rionet.c: fix ethernet address macros for LE platforms
Modify Ethernet address macros to be compatible with BE/LE platforms.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: <stable@kernel.org> [2.6.39+]
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Wed, 2 Nov 2011 20:39:11 +0000 (13:39 -0700)]
RapidIO: fix potential null deref in rio_setup_device()
The "goto cleanup" path can dereference "rswitch" when it is NULL.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Chul Kim <chul.kim@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Wed, 2 Nov 2011 20:39:09 +0000 (13:39 -0700)]
RapidIO: add mport driver for Tsi721 bridge
Add RapidIO mport driver for IDT TSI721 PCI Express-to-SRIO bridge device.
The driver provides full set of callback functions defined for mport
devices in RapidIO subsystem. It also is compatible with current version
of RIONET driver (Ethernet over RapidIO messaging services).
This patch is applicable to kernel versions starting from 2.6.39.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Chul Kim <chul.kim@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Liu Gang [Wed, 2 Nov 2011 20:39:07 +0000 (13:39 -0700)]
arch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if port failed to initialize
The "struct rio_mport" contains a member of master port I/O memory
resource structure "struct resource iores". This resource will be read
from device tree and be used for rapidio R/W transaction memory space.
Rapidio requests the port I/O memory resource under the root resource
"iomem_resource".
struct rio_mport *port;
port = kzalloc(sizeof(struct rio_mport), GFP_KERNEL);
request_resource(&iomem_resource, &port->iores);
When the port fails to initialize, the allocated "rio_mport" structure is
freed and the port I/O memory resource pointer "&port->iores" becomes
invalid. If another caller later requests a resource under
"iomem_resource", the stale "&port->iores" node may still be walked in the
child resources list, which can crash the system.
So the requested port I/O memory resource should be released before
freeing allocated "rio_mport" structure.
Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
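The ordering described in the fsl_rio commit above (release the embedded resource before freeing the structure that contains it) can be sketched with a small userspace model. All names here (res_model, root_model, port_model, port_setup, port_teardown) are hypothetical; the singly linked child list only approximates the kernel resource tree and is not the driver's actual code.

```c
#include <stdbool.h>
#include <stdlib.h>

struct res_model { struct res_model *sibling; };   /* models struct resource */
struct root_model { struct res_model *child; };    /* models iomem_resource */
struct port_model { struct res_model iores; };     /* models struct rio_mport */

/* Link r into the root's child list (models request_resource()). */
static void request_res(struct root_model *root, struct res_model *r)
{
    r->sibling = root->child;
    root->child = r;
}

/* Unlink r from the root's child list (models release_resource()). */
static void release_res(struct root_model *root, struct res_model *r)
{
    for (struct res_model **pp = &root->child; *pp; pp = &(*pp)->sibling) {
        if (*pp == r) {
            *pp = r->sibling;
            r->sibling = NULL;
            return;
        }
    }
}

/* Returns the port on success, NULL on failure with nothing left linked. */
struct port_model *port_setup(struct root_model *root, bool init_ok)
{
    struct port_model *port = calloc(1, sizeof(*port));

    if (!port)
        return NULL;
    request_res(root, &port->iores);
    if (!init_ok) {
        /* Unlink first; freeing alone would leave a dangling list node. */
        release_res(root, &port->iores);
        free(port);
        return NULL;
    }
    return port;
}

/* Normal teardown follows the same order: unlink, then free. */
void port_teardown(struct root_model *root, struct port_model *port)
{
    release_res(root, &port->iores);
    free(port);
}
```

If the release step were skipped on the failure path, the root's child list would keep pointing into freed memory, which is the crash scenario the commit message describes.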
Liu Gang [Wed, 2 Nov 2011 20:39:05 +0000 (13:39 -0700)]
drivers/rapidio/rio-scan.c: use discovered bit to test if enumeration is complete
The discovered bit in PGCCSR register indicates if the device has been
discovered by system host. In Rapidio systems, some agent devices can also
be master devices. They can issue requests into the system.
Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Drewry [Wed, 2 Nov 2011 20:38:59 +0000 (13:38 -0700)]
init: add root=PARTUUID=UUID/PARTNROFF=%d support
Expand root=PARTUUID=UUID syntax to support selecting a root partition by
integer offset from a known, unique partition. This approach provides
similar properties to specifying a device and partition number, but using
the UUID as the unique path prior to evaluating the offset.
For example,
root=PARTUUID=99DE9194-FC15-4223-9192-FC243948F88B/PARTNROFF=1
selects the partition with UUID 99DE.. and then selects the next
partition.
This change is motivated by a particular usecase in Chromium OS where the
bootloader can easily determine what partition it is on (by UUID) but
doesn't perform general partition table walking.
That said, support for this model provides a direct mechanism for the user
to modify the root partition to boot without specifically needing to
extract each UUID or update the bootloader explicitly when the root
partition UUID is changed (if it is recreated to be larger, for instance).
Pinning to a /boot-style partition UUID allows the arbitrary root
partition reconfiguration/modifications with slightly less ambiguity than
just [dev][partition] and less stringency than the specific root partition
UUID.
[sfr@canb.auug.org.au: fix init sections warning]
Signed-off-by: Will Drewry <wad@chromium.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
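As a rough illustration of the root=PARTUUID=UUID/PARTNROFF=%d syntax described in the commit above, the following userspace sketch splits such a value into its UUID and integer offset. The function name parse_partuuid_spec() and its return convention are assumptions made for this example; it does not reproduce the kernel's parsing code, and it only checks the UUID's length, not its contents.

```c
#include <stdio.h>
#include <string.h>

#define UUID_STR_LEN 36 /* 8-4-4-4-12 hex digits plus four dashes */

/*
 * Parse "PARTUUID=<uuid>[/PARTNROFF=<n>]".
 * On success, copies the UUID text into uuid, stores the offset
 * (0 when no PARTNROFF suffix is present) and returns 0.
 * Returns -1 on malformed input.
 */
int parse_partuuid_spec(const char *spec, char uuid[UUID_STR_LEN + 1],
                        int *offset)
{
    static const char prefix[] = "PARTUUID=";
    const char *p, *slash;
    size_t len;
    char trailing;

    if (strncmp(spec, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    p = spec + sizeof(prefix) - 1;

    slash = strchr(p, '/');
    len = slash ? (size_t)(slash - p) : strlen(p);
    if (len != UUID_STR_LEN)
        return -1;
    memcpy(uuid, p, len);
    uuid[len] = '\0';

    *offset = 0;
    /* Exactly one conversion must succeed: the integer, with no trailing text. */
    if (slash && sscanf(slash + 1, "PARTNROFF=%d%c", offset, &trailing) != 1)
        return -1;
    return 0;
}
```

With the example value from the commit message, this yields the UUID 99DE9194-FC15-4223-9192-FC243948F88B and an offset of 1, meaning the partition after the one carrying that UUID.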
Manfred Spraul [Wed, 2 Nov 2011 20:38:56 +0000 (13:38 -0700)]
include/linux/sem.h: make sysv_sem empty if SYSVIPC is disabled
For the sysvsem undo, each task struct contains a sysv_sem structure with
a pointer to the undo information.
This pointer is only necessary if sysvipc is enabled - thus the pointer
can be made conditional on CONFIG_SYSVIPC.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Wed, 2 Nov 2011 20:38:54 +0000 (13:38 -0700)]
ipc/sem.c: remove private structures from public header file
include/linux/sem.h contains several structures that are only used within
ipc/sem.c.
The patch moves them into ipc/sem.c - there is no need to expose the
structures to the whole kernel.
No functional changes, only whitespace cleanups and 80-char per line
fixes.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Wed, 2 Nov 2011 20:38:52 +0000 (13:38 -0700)]
ipc/sem.c: handle spurious wakeups
semtimedop() does not handle spurious wakeups; it returns -EINTR to user
space. Most other schedule() users would just loop and not return to user
space. The patch adds such a loop to semtimedop().
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Wed, 2 Nov 2011 20:38:50 +0000 (13:38 -0700)]
ipc/sem.c: fix return code race with semop vs. semop +semctl(IPC_RMID)
sys_semtimedop() may return -EIDRM although the semaphore operation
completed successfully:
thread 1: thread 2:
semtimedop(), sleeps
semop():
* acquires sem_lock()
semtimedop() woken up due to timeout
sem_lock() loops
* notices that thread 2 could be completed.
* performs the operations that thread 2 is sleeping on.
* marks the semaphore operation as IN_WAKEUP
* drops sem_lock(), does wakeup, sets return code to 0
* thread delayed due to interrupt, whatever
* returns to user space
* thread still delayed
semctl(IPC_RMID)
* acquires sem_lock()
* ipc_rmid(), ipcp->deleted=1
* drops sem_lock()
* thread finally continues - but sem_lock()
now fails due to ipcp->deleted == 1
* returns -EIDRM instead of 0
The fix is trivial: Always use the return code in queue.status.
In real world, the race probably doesn't matter:
If the semaphore array is destroyed, the app is probably not interested
if the last operation succeeded or was already cancelled.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 2 Nov 2011 20:38:46 +0000 (13:38 -0700)]
ida: make ida_simple_get/put() IRQ safe
It's often convenient to be able to release resource from IRQ context.
Make ida_simple_*() use irqsave/restore spin ops so that they are IRQ
safe.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Wed, 2 Nov 2011 20:38:44 +0000 (13:38 -0700)]
proc: fix races against execve() of /proc/PID/fd**
fd* files are restricted to the task's owner, and other users may not get
direct access to them. But one may open any of these files and run any
setuid program, keeping opened file descriptors. As there are permission
checks on open(), but not on readdir() and read(), operations on the kept
file descriptors will not be checked. This makes it possible to violate the
procfs permission model.
Reading fdinfo/* may disclose the current fds' positions and flags; reading
the directory contents of fdinfo/ and fd/ may disclose the number of files
opened by the target task. This information is not sensitive per se, but it
can reveal some private information (like the length of a password stored in a
file) under certain conditions.
Used existing (un)lock_trace functions to check for ptrace_may_access(),
but instead of using EPERM return code from it use EACCES to be consistent
with existing proc_pid_follow_link()/proc_pid_readlink() return code. If
they differ, attacker can guess what fds exist by analyzing stat() return
code. Patched handlers: stat() for fd/*, stat() and read() for fdinfo/*,
readdir() and lookup() for fd/ and fdinfo/.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Wed, 2 Nov 2011 20:38:42 +0000 (13:38 -0700)]
procfs: report EISDIR when reading sysctl dirs in proc
On reading sysctl dirs we should return -EISDIR instead of -EINVAL.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 2 Nov 2011 20:38:39 +0000 (13:38 -0700)]
cpusets: avoid looping when storing to mems_allowed if one node remains set
{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES > BITS_PER_LONG.
This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself. It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example. In low memory
conditions, this is problematic because it's one of the most important
times to change cpuset.mems in the first place!
The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a different cpuset.
This is done by setting all new nodes, ensuring generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes. This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.
If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes. Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Paul Menage <paul@paulmenage.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
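The two-pass store described in the cpusets commit above can be modelled in a few lines of userspace C. This is a single-threaded sketch under assumptions: nodemask_model, MASK_WORDS and change_mems_allowed() are illustrative names, and the place where the kernel would synchronise with lockless readers is marked only by a comment.

```c
#include <stdbool.h>
#include <stddef.h>

#define MASK_WORDS 4 /* illustrative: models MAX_NUMNODES > BITS_PER_LONG */

struct nodemask_model {
    unsigned long bits[MASK_WORDS];
};

/* True if at least one node is set in both masks. */
static bool masks_intersect(const struct nodemask_model *a,
                            const struct nodemask_model *b)
{
    for (size_t i = 0; i < MASK_WORDS; i++)
        if (a->bits[i] & b->bits[i])
            return true;
    return false;
}

/*
 * Store newmask into cur in two passes: first set all new nodes, then
 * clear the old ones. Returns true if the writer would still have to
 * wait for lockless readers between the passes, i.e. when no node stays
 * set across the whole update.
 */
bool change_mems_allowed(struct nodemask_model *cur,
                         const struct nodemask_model *newmask)
{
    bool need_wait = !masks_intersect(cur, newmask);

    for (size_t i = 0; i < MASK_WORDS; i++)
        cur->bits[i] |= newmask->bits[i];

    /* If need_wait is true, the real code would wait for readers here. */

    for (size_t i = 0; i < MASK_WORDS; i++)
        cur->bits[i] = newmask->bits[i];

    return need_wait;
}
```

When the old and new masks share a node, that bit is set before, during and after both passes, so a reader scanning the words one by one can never assemble an all-zero mask; only the disjoint case needs the slow path.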
H Hartley Sweeten [Wed, 2 Nov 2011 20:38:36 +0000 (13:38 -0700)]
mm/page_cgroup.c: quiet sparse noise
warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Wed, 2 Nov 2011 20:38:33 +0000 (13:38 -0700)]
memcg: Fix race condition in memcg_check_events() with this_cpu usage
Various code in memcontrol.c calls this_cpu_read() for calculations
that combine two different percpu variables, or does an open-coded
read-modify-write on a single percpu variable.
Disable preemption throughout these operations so that the writes go to
the correct places.
[hannes@cmpxchg.org: added this_cpu to __this_cpu conversion]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 2 Nov 2011 20:38:29 +0000 (13:38 -0700)]
memcg: close race between charge and putback
There is a potential race between a thread charging a page and another
thread putting it back to the LRU list:
charge: putback:
SetPageCgroupUsed SetPageLRU
PageLRU && add to memcg LRU PageCgroupUsed && add to memcg LRU
The order of setting one flag and checking the other is crucial, otherwise
the charge may observe !PageLRU while the putback observes !PageCgroupUsed
and the page is not linked to the memcg LRU at all.
Global memory pressure may fix this by trying to isolate and putback the
page for reclaim, where that putback would link it to the memcg LRU again.
Without that, the memory cgroup is undeletable due to a charge whose
physical page can not be found and moved out.
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Cc: Ying Han <yinghan@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
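The flag ordering in the charge/putback commit above is an instance of the "set my flag, then check yours" pattern. A minimal userspace sketch with C11 atomics follows; sequentially consistent atomics are used here as a stand-in for whatever barriers the actual patch adds, and page_model, link_once(), charge() and putback() are hypothetical names for this illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct page_model {
    atomic_bool cgroup_used;    /* models PageCgroupUsed */
    atomic_bool on_lru;         /* models PageLRU */
    atomic_bool linked;         /* already on the memcg LRU? */
    atomic_int memcg_lru_links; /* how many times the page was linked */
};

/* Link the page to the memcg LRU at most once, whoever gets there first. */
static void link_once(struct page_model *p)
{
    if (!atomic_exchange(&p->linked, true))
        atomic_fetch_add(&p->memcg_lru_links, 1);
}

/* Charge side: publish "used" first, only then look at the LRU flag. */
void charge(struct page_model *p)
{
    atomic_store(&p->cgroup_used, true);
    if (atomic_load(&p->on_lru))
        link_once(p);
}

/* Putback side: publish "on LRU" first, only then look at the used flag. */
void putback(struct page_model *p)
{
    atomic_store(&p->on_lru, true);
    if (atomic_load(&p->cgroup_used))
        link_once(p);
}
```

Because each side's store is ordered before its own load, at least one of the two threads must observe the other's flag, so the page cannot end up unlinked after both have run.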
Johannes Weiner [Wed, 2 Nov 2011 20:38:23 +0000 (13:38 -0700)]
memcg: skip scanning active lists based on individual size
Reclaim decides to skip scanning an active list when the corresponding
inactive list is above a certain size in comparison, to leave the assumed
working set alone while there are still enough reclaim candidates around.
The memcg implementation of comparing those lists instead reports whether
the whole memcg is low on the requested type of inactive pages,
considering all nodes and zones.
This can lead to an oversized active list not being scanned because of the
state of the other lists in the memcg, as well as an active list being
scanned while its corresponding inactive list has enough pages.
Not only is this wrong, it's also a scalability hazard, because the global
memory state over all nodes and zones has to be gathered for each memcg
and zone scanned.
Make these calculations purely based on the size of the two LRU lists
that are actually affected by the outcome of the decision.
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Igor Mammedov [Wed, 2 Nov 2011 20:38:21 +0000 (13:38 -0700)]
memcg: do not expose uninitialized mem_cgroup_per_node to world
If somebody is touching data too early, it might be easier to diagnose a
problem when dereferencing NULL at mem->info.nodeinfo[node] than trying to
understand why mem_cgroup_per_zone is [un|partly]initialized.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Wed, 2 Nov 2011 20:38:18 +0000 (13:38 -0700)]
memcg: fix oom schedule_timeout()
Before calling schedule_timeout(), task state should be changed.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Raghavendra K T [Wed, 2 Nov 2011 20:38:15 +0000 (13:38 -0700)]
memcg: rename mem variable to memcg
The memcg code sometimes uses "struct mem_cgroup *mem" and sometimes uses
"struct mem_cgroup *memcg". Rename all mem variables to memcg in source
file.
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Wed, 2 Nov 2011 20:38:11 +0000 (13:38 -0700)]
cgroup/kmemleak: Annotate alloc_page() for cgroup allocations
When the cgroup base was allocated with kmalloc, it was necessary to
annotate the variable with kmemleak_not_leak(). But it has recently been
changed to be allocated with alloc_page() (which skips kmemleak checks),
and the annotation now causes a warning on boot up.
I was triggering this output:
allocated 8388608 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
kmemleak: Trying to color unknown object at 0xf5840000 as Grey
Pid: 0, comm: swapper Not tainted 3.0.0-test #12
Call Trace:
[<c17e34e6>] ? printk+0x1d/0x1f
[<c10e2941>] paint_ptr+0x4f/0x78
[<c178ab57>] kmemleak_not_leak+0x58/0x7d
[<c108ae9f>] ? __rcu_read_unlock+0x9/0x7d
[<c1cdb462>] kmemleak_init+0x19d/0x1e9
[<c1cbf771>] start_kernel+0x346/0x3ec
[<c1cbf1b4>] ? loglevel+0x18/0x18
[<c1cbf0aa>] i386_start_kernel+0xaa/0xb0
After a bit of debugging I tracked the object 0xf5840000 (and others) down
to the cgroup code. The change from allocating base with kmalloc to
alloc_page() has the base not calling kmemleak_alloc() which adds the
pointer to the object_tree_root, but kmemleak_not_leak() adds it to the
crt_early_log[] table. On kmemleak_init(), the entry is found in the
early_log[] but not the object_tree_root, and this error message is
displayed.
If alloc_page() fails, the code falls back to vmalloc(), which still calls
kmemleak_alloc(), so the kmemleak_not_leak() call is still needed there.
The solution is to call kmemleak_alloc() directly if alloc_page()
succeeds.
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Blum [Wed, 2 Nov 2011 20:38:07 +0000 (13:38 -0700)]
cgroups: don't attach task to subsystem if migration failed
If a task has exited to the point it has called cgroup_exit() already,
then we can't migrate it to another cgroup anymore.
This can happen when we are attaching a task to a new cgroup between the
call to ->can_attach_task() on subsystems and the migration that is
eventually tried in cgroup_task_migrate().
In this case cgroup_task_migrate() returns -ESRCH and we don't want to
attach the task to the subsystems because the attachment to the new cgroup
itself failed.
Fix this by only calling ->attach_task() on the subsystems if the cgroup
migration succeeded.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Blum [Wed, 2 Nov 2011 20:38:05 +0000 (13:38 -0700)]
cgroups: more safe tasklist locking in cgroup_attach_proc
Fix unstable tasklist locking in cgroup_attach_proc.
According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is
not sufficient to guarantee the tasklist is stable w.r.t. de_thread and
exit. Taking tasklist_lock for reading, instead of rcu_read_lock, ensures
proper exclusion.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Phillip Lougher [Wed, 2 Nov 2011 20:38:01 +0000 (13:38 -0700)]
hfs: fix hfs_find_init() sb->ext_tree NULL ptr oops
Clement Lecigne reports a filesystem which causes a kernel oops in
hfs_find_init() trying to dereference sb->ext_tree which is NULL.
This proves to be because the filesystem has a corrupted MDB extent
record, where the extents file does not fit into the first three extents
in the file record (the first blocks).
In hfs_get_block() when looking up the blocks for the extent file
(HFS_EXT_CNID), it fails the first blocks special case, and falls
through to the extent code (which ultimately calls hfs_find_init())
which is in the process of being initialised.
Hfs avoids this scenario by always having the extents b-tree fitting
into the first blocks (the extents B-tree can't have overflow extents).
The fix is to check at mount time that the B-tree fits into first
blocks, i.e. fail if HFS_I(inode)->alloc_blocks >=
HFS_I(inode)->first_blocks.
Note, the existing commit 47f365eb57573 ("hfs: fix oops on mount with
corrupted btree extent records") becomes subsumed into this as a special
case, but only for the extents B-tree (HFS_EXT_CNID); it is perfectly
acceptable for the catalog B-Tree file to grow beyond three extents,
with the remaining extent descriptors in the extents overflow.
This fixes CVE-2011-2203
Reported-by: Clement LECIGNE <clement.lecigne@netasq.com>
Signed-off-by: Phillip Lougher <plougher@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namjae Jeon [Wed, 2 Nov 2011 20:38:00 +0000 (13:38 -0700)]
isofs: add readpages support
Use mpage_readpages() instead of multiple calls to isofs_readpage() to
reduce the CPU utilization and make performance higher.
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sami Kerola [Wed, 2 Nov 2011 20:37:58 +0000 (13:37 -0700)]
minix: describe usage of different magic numbers
One can get this information from minix/inode.c, but adding the
explanations at the definition sites is more appropriate.
Signed-off-by: Sami Kerola <kerolasa@iki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Wed, 2 Nov 2011 20:37:56 +0000 (13:37 -0700)]
drivers/rtc/rtc-mc13xxx.c: move probe and remove callbacks to .init.text and .exit.text
The driver is added using platform_driver_probe(), so the callbacks can be
discarded more aggressively.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Anders [Wed, 2 Nov 2011 20:37:53 +0000 (13:37 -0700)]
rtc: add initial support for mcp7941x parts
Add initial support for the Microchip mcp7941x series of real time clocks.
The mcp7941x series is generally compatible with the ds1307 and ds1337 rtc
devices from Dallas Semiconductor. Minor differences include a backup
battery enable bit and the polarity of the oscillator enable bit.
Signed-off-by: David Anders <danders.dev@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Cameron [Wed, 2 Nov 2011 20:37:49 +0000 (13:37 -0700)]
drivers/rtc/class.c: convert idr to ida and use ida_simple_get()
This is the one use of an ida that doesn't retry on receiving -EAGAIN.
I'm assuming doing so will cause no harm and may help on a rare occasion.
Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Neil Armstrong [Wed, 2 Nov 2011 20:37:47 +0000 (13:37 -0700)]
init/do_mounts_rd.c: fix ramdisk identification for padded cramfs
When a cramfs ramdisk padded with 512 bytes is given to the kernel, the
current identify_ramdisk_image function fails to identify it.
Tested with a padded cramfs image on an ARM based board.
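As a rough user-space illustration of the idea (not the kernel's identify_ramdisk_image() code), the sketch below looks for the cramfs magic at the start of a buffer and, failing that, 512 bytes in. The helper names word_at() and find_cramfs_offset() are hypothetical, and the comparison is done in host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRAMFS_MAGIC 0x28cd3d45u  /* cramfs superblock magic */
#define CRAMFS_PAD   512          /* padding size mentioned in the text */

/* Read a host-endian 32-bit word at `off`, or 0 if it would run past the end. */
static uint32_t word_at(const unsigned char *buf, size_t len, size_t off)
{
    uint32_t v = 0;

    if (off + sizeof(v) <= len)
        memcpy(&v, buf + off, sizeof(v));
    return v;
}

/*
 * Hypothetical helper: returns the byte offset of the cramfs superblock
 * (0 or 512), or -1 if neither location carries the magic.
 */
static int find_cramfs_offset(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (word_at(buf, len, 0) == CRAMFS_MAGIC)
        return 0;
    if (word_at(buf, len, CRAMFS_PAD) == CRAMFS_MAGIC)
        return CRAMFS_PAD;  /* padded image: superblock follows the padding */
    return -1;
}
```

An image that carries the magic only at offset 512 is reported as padded rather than rejected, which is the behaviour the fix is after.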
Signed-off-by: Neil Armstrong <narmstrong@neotion.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Davidlohr Bueso <dave@gnu.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Wed, 2 Nov 2011 20:37:45 +0000 (13:37 -0700)]
ramfs: remove module leftovers
Since ramfs is hard-selected to "y", the module leftovers make no sense.
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Kosina [Wed, 2 Nov 2011 20:37:41 +0000 (13:37 -0700)]
binfmt_elf: fix PIE execution with randomization disabled
The case of address space randomization being disabled in runtime through
randomize_va_space sysctl is not treated properly in load_elf_binary(),
resulting in SIGKILL coming at exec() time for certain PIE-linked binaries
in case the randomization has been disabled at runtime prior to calling
exec().
Handle the randomize_va_space == 0 case the same way as if we were not
supporting .text randomization at all.
Based on original patch by H.J. Lu and Josh Boyer.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: H.J. Lu <hongjiu.lu@intel.com>
Cc: <stable@kernel.org>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:36 +0000 (13:37 -0700)]
thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:31 +0000 (13:37 -0700)]
sparc: gup_pte_range() support THP based tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:28 +0000 (13:37 -0700)]
s390: gup_huge_pmd() return 0 if pte changes
s390 didn't return 0 in that case, if it's rolling back the *nr pointer it
should also return zero to avoid adding pages to the array at the wrong
offset.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:25 +0000 (13:37 -0700)]
s390: gup_huge_pmd() support THP tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:19 +0000 (13:37 -0700)]
powerpc: gup_huge_pmd() return 0 if pte changes
powerpc didn't return 0 in that case, if it's rolling back the *nr pointer
it should also return zero to avoid adding pages to the array at the wrong
offset.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:15 +0000 (13:37 -0700)]
powerpc: gup_hugepte() support THP based tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:11 +0000 (13:37 -0700)]
powerpc: gup_hugepte() avoid freeing the head page too many times
We only took "refs" pins on the head page, not "*nr" pins.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:08 +0000 (13:37 -0700)]
powerpc: get_hugepte() don't put_page() the wrong page
"page" may have changed to point to the next hugepage after the loop
completed. The references have been taken on the head page, so the
put_page must happen there too.
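The rule can be sketched with a small single-threaded user-space model. Everything here is hypothetical (struct model_page, model_gup_hugepte(), the pte_changed flag standing in for the pte_val(pte) != pte_val(*ptep) check); it only illustrates that the pins live on the head page, that exactly "refs" of them are dropped, and that a rollback of *nr goes together with returning 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a page; not the kernel's struct page. */
struct model_page {
    int count;  /* reference count; only used on the head page here */
};

/*
 * Record `npages` subpages of a huge page starting at index `first`,
 * pinning the head once per recorded subpage. Returns 1 on success,
 * 0 if the caller has to fall back to the slow path.
 */
static int model_gup_hugepte(struct model_page *head, size_t first,
                             size_t npages, int pte_changed,
                             struct model_page **pages, int *nr)
{
    struct model_page *page = head + first;
    int refs = 0;

    for (size_t i = 0; i < npages; i++) {
        pages[*nr] = page;
        (*nr)++;
        page++;  /* after the loop, page no longer points into this range */
        refs++;
    }

    head->count += refs;  /* the pins are taken on the head page */

    if (pte_changed) {
        assert(head->count >= refs);
        *nr -= refs;          /* roll back the array index, */
        head->count -= refs;  /* drop exactly refs pins, on head not page, */
        return 0;             /* and report failure */
    }
    return 1;
}
```

Dropping the pins through `page` instead of `head`, or dropping `*nr` of them instead of `refs`, would unbalance the counts in this model in the same way the commits describe.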
This is a longstanding issue pre-thp inclusion.
It's totally unclear why these page_cache_add_speculative and
pte_val(pte) != pte_val(*ptep) checks are necessary across all the
powerpc gup_fast code, when x86 doesn't need any of that: there's no way
the page can be freed with irq disabled so we're guaranteed the
atomic_inc will happen on a page with page_count > 0 (so not needing the
speculative check).
The pte check is also meaningless on x86: no need to rollback on x86 if
the pte changed, because the pte can still change a CPU tick after the
check succeeded and it won't be rolled back in that case. The important
thing is we got a reference on a valid page that was mapped there a CPU
tick ago. So not knowing the soft tlb refill code of ppc64 in great
detail I'm not removing the "speculative" page_count increase and the
pte checks across all the code, but unless there's a strong reason for
it they should be later cleaned up too.
If a pte can change from huge to non-huge (as could happen with
THP), passing a pte_t *ptep to gup_hugepte() would also require repeating
the is_hugepd check in gup_hugepte(), but that shouldn't happen with
hugetlbfs only, so I'm not altering that.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:03 +0000 (13:37 -0700)]
powerpc: remove superfluous PageTail checks on the pte gup_fast
This part of gup_fast doesn't seem capable of handling hugetlbfs ptes,
those should be handled by gup_hugepd only, so these checks are
superfluous.
Plus, if this weren't a noop, it would have oopsed, because the insistence
on using the speculative refcounting would trigger a VM_BUG_ON if a tail
page was encountered in page_cache_get_speculative().
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 2 Nov 2011 20:36:59 +0000 (13:36 -0700)]
mm: thp: tail page refcounting fix
Michel, while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().
He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that use get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().
So the best way to fix the problem is to keep page_tail->_count zero at
all times. This will guarantee that get_page_unless_zero() can never
succeed on any tail page. page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).
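A simplified, single-threaded user-space model of this scheme is sketched below. It is not the kernel implementation: struct model_page and the model_* helpers are hypothetical, the side counter tail_refs stands in for the references kept in page_tail->_mapcount, and mappings, locking and atomics are ignored. The model also assumes that every tail pin is mirrored on the head page's count until the split.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field names only echo the text above. */
struct model_page {
    int count;      /* stands in for _count */
    int tail_refs;  /* stands in for pins recorded in a tail's _mapcount */
};

/* Succeeds only if the count is already non-zero, like get_page_unless_zero(). */
static bool model_get_page_unless_zero(struct model_page *p)
{
    if (p->count == 0)
        return false;
    p->count++;
    return true;
}

/* Pin a tail page: the tail's own count stays zero, the pin is recorded aside. */
static void model_get_tail(struct model_page *head, struct model_page *tail)
{
    assert(head->count > 0);
    head->count++;
    tail->tail_refs++;
}

/* On split, move the recorded pins from the head into the tail's own count. */
static void model_split(struct model_page *head, struct model_page *tail)
{
    head->count -= tail->tail_refs;
    tail->count += tail->tail_refs;
    tail->tail_refs = 0;
}
```

While the huge page is intact, model_get_page_unless_zero() always fails on the tail because its count never leaves zero; only after model_split() does the tail hold its own references.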
While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages. That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic as a
pair. By contrast, other get_user_pages users, such as the secondary-MMU
page fault that establishes the shadow pagetables, would never call any
superfluous get_page after get_user_pages returns. It's safer to make
get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()). get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).
The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages. The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
fine-grained, so there's no risk of scalability issues with it. A simple
direct-io benchmark with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead. So it's worth it. Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages(). The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.
This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 2 Nov 2011 22:00:56 +0000 (15:00 -0700)]
Merge git://github.com/rustyrussell/linux
* git://github.com/rustyrussell/linux:
virtio-blk: use ida to allocate disk index
virtio: Add platform bus driver for memory mapped virtio device
virtio: Dont add "config" to list for !per_vq_vector
virtio: console: wait for first console port for early console output
virtio: console: add port stats for bytes received, sent and discarded
virtio: console: make discard_port_data() use get_inbuf()
virtio: console: rename variable
virtio: console: make get_inbuf() return port->inbuf if present
virtio: console: Fix return type for get_inbuf()
virtio: console: Use wait_event_freezable instead of _interruptible
virtio: console: Ignore port name update request if name already set
virtio: console: Fix indentation
virtio: modify vring_init and vring_size to take account of the layout containing *_event_idx
virtio.h: correct comment for struct virtio_driver
virtio-net: Use virtio_config_val() for retrieving config
virtio_config: Add virtio_config_val_len()
virtio-console: Use virtio_config_val() for retrieving config
Linus Torvalds [Wed, 2 Nov 2011 18:41:01 +0000 (11:41 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
vfs: add d_prune dentry operation
vfs: protect i_nlink
filesystems: add set_nlink()
filesystems: add missing nlink wrappers
logfs: remove unnecessary nlink setting
ocfs2: remove unnecessary nlink setting
jfs: remove unnecessary nlink setting
hypfs: remove unnecessary nlink setting
vfs: ignore error on forced remount
readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
vfs: fix dentry leak in simple_fill_super()
Linus Torvalds [Wed, 2 Nov 2011 17:06:20 +0000 (10:06 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
jbd2: Unify log messages in jbd2 code
jbd/jbd2: validate sb->s_first in journal_get_superblock()
ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
ext4: fix a typo in struct ext4_allocation_context
ext4: Don't normalize an falloc request if it can fit in 1 extent.
ext4: remove comments about extent mount option in ext4_new_inode()
ext4: let ext4_discard_partial_buffers handle unaligned range correctly
ext4: return ENOMEM if find_or_create_pages fails
ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
ext4: optimize locking for end_io extent conversion
ext4: remove unnecessary call to waitqueue_active()
ext4: Use correct locking for ext4_end_io_nolock()
ext4: fix race in xattr block allocation path
ext4: trace punch_hole correctly in ext4_ext_map_blocks
ext4: clean up AGGRESSIVE_TEST code
ext4: move variables to their scope
ext4: fix quota accounting during migration
ext4: migrate cleanup
...