Malahal Naineni [Fri, 24 Sep 2010 18:25:49 +0000 (20:25 +0200)]
block: set the bounce_pfn to the actual DMA limit rather than to max memory
The bounce_pfn of the request queue in 64 bit systems is set to the
current max_low_pfn. Adding more memory later makes this incorrect.
Memory allocated beyond this boot-time max_low_pfn appears to require
bounce buffers (bounce buffers are actually not allocated but used in
calculating segments that may result in "over max segments limit"
errors).
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
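The effect described in this message can be modeled in a few lines. The sketch below is a userspace Python illustration, not the kernel code: the 4 KiB page size and all function names are assumptions made for the example. It only shows why a limit frozen at the boot-time memory size makes hot-added pages look as if they need bouncing, while a limit derived from the device's DMA mask does not.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def bounce_pfn_from_boot_memory(max_low_pfn_at_boot):
    # Old behaviour per the commit message: the limit is whatever
    # max_low_pfn happened to be when the queue was set up.
    return max_low_pfn_at_boot


def bounce_pfn_from_dma_mask(dma_mask):
    # Behaviour per the commit title: the limit follows what the
    # device can actually address.
    return dma_mask >> PAGE_SHIFT


def appears_to_need_bounce(page_pfn, bounce_pfn):
    # A page above the queue's bounce limit is treated as needing a bounce buffer.
    return page_pfn > bounce_pfn
```

For example, with 4 GiB present at boot and a page later hot-added at the 6 GiB mark, the first limit reports that the page needs bouncing, whereas a device with a full 64-bit DMA mask yields a limit under which it does not.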
Mark Lord [Fri, 24 Sep 2010 13:51:13 +0000 (09:51 -0400)]
block: Prevent hang_check firing during long I/O
During long I/O operations, the hang_check timer may fire,
triggering stack dumps that unnecessarily alarm the user.
E.g. hdparm --security-erase NULL /dev/sdb ## can take *hours* to complete
So, if hang_check is armed, we should wake up periodically
to prevent it from triggering. This patch uses a wake-up interval
equal to half the hang_check timer period, which keeps overhead low enough.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
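The waiting scheme described in this message can be sketched in userspace Python. This is an illustration only: `threading.Event` stands in for the kernel completion, and the function names are hypothetical. A timeout of zero is assumed to mean that hang_check is not armed, in which case the wait simply blocks.

```python
import threading


def wakeup_interval(hang_check_timeout):
    # Wake up at half the hang_check period; None means "block indefinitely".
    return hang_check_timeout / 2 if hang_check_timeout else None


def wait_with_periodic_wakeups(done, hang_check_timeout):
    """Wait for `done`, waking up periodically so the waiter never looks hung.

    Returns the number of periodic wake-ups that happened before completion.
    """
    interval = wakeup_interval(hang_check_timeout)
    wakeups = 0
    # Event.wait() returns False on timeout and True once the event is set.
    while not done.wait(timeout=interval):
        wakeups += 1
    return wakeups
```

Halving the period means the waiter always shows activity before the checker's deadline, at the cost of only two wake-ups per period.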
Corrado Zoccolo [Mon, 20 Sep 2010 13:24:50 +0000 (15:24 +0200)]
cfq: improve fsync performance for small files
Fsync performance for small files achieved by cfq on high-end disks is
lower than what deadline can achieve, due to idling introduced between
the sync write happening in process context and the journal commit.
Moreover, when competing with a sequential reader, a process writing
small files and fsync-ing them is starved.
This patch fixes the two problems by:
- marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
flag set,
- force all queues that have REQ_NOIDLE requests to be put in the noidle
tree.
Having the queue associated to the fsync-ing process and the one associated
to journal commits in the noidle tree allows:
- switching between them without idling,
- fairness vs. competing idling queues, since they will be serviced only
after the noidle tree expires its slice.
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Fri, 17 Sep 2010 08:00:46 +0000 (10:00 +0200)]
do_mounts: only enable PARTUUID for CONFIG_BLOCK
When CONFIG_BLOCK is not enabled:
init/do_mounts.c:71: error: implicit declaration of function 'dev_to_part'
init/do_mounts.c:71: warning: initialization makes pointer from integer without a cast
init/do_mounts.c:73: error: dereferencing pointer to incomplete type
init/do_mounts.c:76: error: dereferencing pointer to incomplete type
init/do_mounts.c:76: error: dereferencing pointer to incomplete type
init/do_mounts.c:102: error: implicit declaration of function 'part_pack_uuid'
init/do_mounts.c:104: error: 'block_class' undeclared (first use in this function)
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jan Kara [Thu, 16 Sep 2010 18:36:36 +0000 (20:36 +0200)]
block: Fix race during disk initialization
When a new disk is being discovered, add_disk() first ties the bdev to gendisk
(via register_disk()->blkdev_get()) and only after that calls
bdi_register_bdev(). Because register_disk() also creates disk's kobject, it
can happen that userspace manages to open and modify the device's data (or
inode) before its BDI is properly initialized leading to a warning in
__mark_inode_dirty().
Fix the problem by registering BDI early enough.
This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=16312
Cc: stable@kernel.org
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Wed, 15 Sep 2010 21:06:38 +0000 (17:06 -0400)]
blkio: Documentation Update
o Documentation update
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Wed, 15 Sep 2010 21:06:37 +0000 (17:06 -0400)]
blkio: Implementation of IOPS limit logic
o core logic of implementing IOPS throttling.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Wed, 15 Sep 2010 21:06:36 +0000 (17:06 -0400)]
blk-cgroup: cgroup changes for IOPS limit support
o cgroup changes for IOPS throttling rules.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Wed, 15 Sep 2010 21:06:35 +0000 (17:06 -0400)]
blkio: Core implementation of throttle policy
o Actual implementation of throttling policy in block layer. Currently it
implements READ and WRITE bytes per second throttling logic. IOPS throttling
comes in later patches.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Wed, 15 Sep 2010 21:06:34 +0000 (17:06 -0400)]
blk-cgroup: Introduce cgroup changes for throttling policy
o cgroup changes for throttle policy.
o Introduces READ and WRITE bytes per second throttling rules.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Wed, 15 Sep 2010 21:06:33 +0000 (17:06 -0400)]
blk-cgroup: Prepare the base for supporting more than one IO control policies
o This patch prepares the base for introducing new IO control policies.
Currently all the code is written knowing there is only one policy
and that is proportional bandwidth. Creating infrastructure for newer
policies to come in.
o Also there were many functions which were generated using macro. It was
very confusing. Got rid of those.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Wed, 15 Sep 2010 21:06:32 +0000 (17:06 -0400)]
blk-cgroup: Kill the header printed at the start of blkio.weight_device file
o Kill extra "dev weight" header which is printed when somebody reads
blkio.weight_device file. This really seems to be out of convention. No other
blkio files are printing any header at the start of file. I think it is ok
to just print values and how to interpret values should be part of
documentation.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Thu, 16 Sep 2010 06:33:54 +0000 (08:33 +0200)]
core: match_dev_by_uuid() should not be marked __init
It is also called outside the scope of init functions. Stephen
reports:
WARNING: init/mounts.o(.text+0x21a): Section mismatch in reference from the function name_to_dev_t() to the function .init.text:match_dev_by_uuid()
The function name_to_dev_t() references
the function __init match_dev_by_uuid().
This is often because name_to_dev_t lacks a __init
annotation or the annotation of match_dev_by_uuid is wrong.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Namhyung Kim [Thu, 16 Sep 2010 03:55:57 +0000 (12:55 +0900)]
sg: fix a warning in blk_rq_aligned() call
2nd argument of blk_rq_aligned() has changed to 'unsigned long' by
the previous commit 'block: fix an address space warning in blk-map.c'.
That commit neglected to update a user of that function.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Will Drewry [Tue, 31 Aug 2010 20:47:07 +0000 (15:47 -0500)]
init: add support for root devices specified by partition UUID
This is the third patch in a series which adds support for
storing partition metadata, optionally, off of the hd_struct.
One major use for that data is being able to resolve partition
by other identities than just the index on a block device. Device
enumeration varies by platform and there's a benefit to being able
to use something like EFI GPT's GUIDs to determine the correct
block device and partition to mount as the root.
This change adds that support to root= by adding support for
the following syntax:
root=PARTUUID=hex-uuid
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
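To make the root=PARTUUID=hex-uuid syntax concrete, here is a small userspace Python sketch of how such a value could be recognized and turned into 16 raw bytes. It is not the kernel's parser: the function name is hypothetical, accepting dashes is an assumption, and the sketch ignores the on-disk byte order of GPT GUIDs.

```python
import string


def parse_root_partuuid(root_arg):
    """Return 16 UUID bytes for 'PARTUUID=<hex-uuid>', or None if it does not match."""
    prefix = "PARTUUID="
    if not root_arg.startswith(prefix):
        return None
    # Assumption: dashes are purely cosmetic separators.
    hex_digits = root_arg[len(prefix):].replace("-", "")
    if len(hex_digits) != 32:
        return None
    if not all(c in string.hexdigits for c in hex_digits):
        return None
    return bytes.fromhex(hex_digits)
```

A value such as `/dev/sda1` falls through with `None`, so a caller can keep trying the other root= forms.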
Will Drewry [Tue, 31 Aug 2010 20:47:06 +0000 (15:47 -0500)]
genhd, efi: add efi partition metadata to hd_structs
This change extends the partition_meta_info structure to
support EFI GPT-specific metadata and ensures that data
is copied in on partition scanning.
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Will Drewry [Tue, 31 Aug 2010 20:47:05 +0000 (15:47 -0500)]
block, partition: add partition_meta_info to hd_struct
I'm reposting this patch series as v4 since there have been no additional
comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
are against Linus's tree:
2bfc96a127bc1cc94d26bfaa40159966064f9c8c
(2.6.36-rc3).
Would this patchset be suitable for inclusion in an mm branch?
This change adds a partition_meta_info struct which itself contains a
union of structures that provide partition table specific metadata.
This change leaves the union empty. The subsequent patch includes an
implementation for CONFIG_EFI_PARTITION-based metadata.
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Namhyung Kim [Wed, 15 Sep 2010 11:08:27 +0000 (13:08 +0200)]
block: fix an address space warning in blk-map.c
Change type of 2nd parameter of blk_rq_aligned() into unsigned long
and remove unnecessary casting. Now we can call it with 'uaddr'
instead of 'ubuf' in __blk_rq_map_user() so that it can remove
following warnings from sparse:
block/blk-map.c:57:31: warning: incorrect type in argument 2 (different address spaces)
block/blk-map.c:57:31: expected void *addr
block/blk-map.c:57:31: got void [noderef] <asn:1>*ubuf
However blk_rq_map_kern() needs one more local variable to handle it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
San Mehat [Tue, 14 Sep 2010 06:48:01 +0000 (08:48 +0200)]
block: block_dump: Add number of sectors to debug output
Signed-off-by: San Mehat <san@android.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Christof Schmitt [Fri, 10 Sep 2010 18:50:40 +0000 (20:50 +0200)]
zfcp: Report scatter gather limit for DIX protection information
When sending DIX integrity segments with an I/O request, the
restriction for the maximum number of segments is still the same for
the zfcp hardware. Report the new sg_prot_tablesize for the SCSI host,
so that the number of integrity segments plus the number of data
segments is not larger than the hardware limit. This results in using
half of the hardware segments for integrity data and the other half
for regular data.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
Martin K. Petersen [Fri, 10 Sep 2010 18:50:10 +0000 (20:50 +0200)]
block/scsi: Provide a limit on the number of integrity segments
Some controllers have a hardware limit on the number of protection
information scatter-gather list segments they can handle.
Introduce a max_integrity_segments limit in the block layer and provide
a new scsi_host_template setting that allows HBA drivers to provide a
value suitable for the hardware.
Add support for honoring the integrity segment limit when merging both
bios and requests.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
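The merge rule this message describes can be sketched as follows. This is a simplified userspace Python model with hypothetical names, not the block layer code: two requests may only be merged if their combined integrity segment count stays within the queue's max_integrity_segments limit.

```python
from dataclasses import dataclass


@dataclass
class Request:
    nr_integrity_segments: int


def integrity_merge_allowed(req, nxt, max_integrity_segments):
    # The merged request would carry both sets of protection-information
    # segments, so their sum must fit within the hardware limit.
    total = req.nr_integrity_segments + nxt.nr_integrity_segments
    return total <= max_integrity_segments
```

With a limit of 8, merging requests with 5 and 3 integrity segments is allowed, while 5 and 4 is refused and the requests stay separate.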
Martin K. Petersen [Fri, 10 Sep 2010 18:07:38 +0000 (20:07 +0200)]
Consolidate min_not_zero
We have several users of min_not_zero, each of them using their own
definition. Move the define to kernel.h.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
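For readers unfamiliar with the helper, its semantics are: return the smaller of two values, but treat zero as "no value set" rather than as the minimum. The kernel version is a C macro; the following is an equivalent userspace Python sketch of the same behaviour.

```python
def min_not_zero(x, y):
    """Return the smaller of x and y, ignoring a value that is zero."""
    if x == 0:
        return y
    if y == 0:
        return x
    return min(x, y)
```

This is useful when combining limits where zero means "unlimited": `min_not_zero(0, 128)` gives 128 rather than 0.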
Linus Torvalds [Mon, 23 Aug 2010 00:43:29 +0000 (17:43 -0700)]
Linux 2.6.36-rc2
Linus Torvalds [Sun, 22 Aug 2010 18:27:36 +0000 (11:27 -0700)]
Merge branch 'kvm-updates/2.6.36' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PIT: free irq source id in handling error path
KVM: destroy workqueue on kvm_create_pit() failures
KVM: fix poison overwritten caused by using wrong xstate size
Linus Torvalds [Sun, 22 Aug 2010 18:03:27 +0000 (11:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (58 commits)
drm/i915,intel_agp: Add support for Sandybridge D0
drm/i915: fix render pipe control notify on sandybridge
agp/intel: set 40-bit dma mask on Sandybridge
drm/i915: Remove the conflicting BUG_ON()
drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/
drm/i915/suspend: Flush register writes before busy-waiting.
i915: disable DAC on Ironlake also when doing CRT load detection.
drm/i915: wait for actual vblank, not just 20ms
drm/i915: make sure eDP PLL is enabled at the right time
drm/i915: fix VGA plane disable for Ironlake+
drm/i915: eDP mode set sequence corrections
drm/i915: add panel reset workaround
drm/i915: Enable RC6 on Ironlake.
drm/i915/sdvo: Only set is_lvds if we have a valid fixed mode.
drm/i915: Set up a render context on Ironlake
drm/i915 invalidate indirect state pointers at end of ring exec
drm/i915: Wake-up wait_request() from elapsed hang-check (v2)
drm/i915: Apply i830 errata for cursor alignment
drm/i915: Only update i845/i865 CURBASE when disabled (v2)
drm/i915: FBC is updated within set_base() so remove second call in mode_set()
...
Linus Torvalds [Sun, 22 Aug 2010 17:08:52 +0000 (10:08 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slab: fix object alignment
slub: add missing __percpu markup in mm/slub_def.h
Linus Torvalds [Sun, 22 Aug 2010 16:44:47 +0000 (09:44 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: wait for discard to finish
Zhenyu Wang [Thu, 19 Aug 2010 01:46:16 +0000 (09:46 +0800)]
drm/i915,intel_agp: Add support for Sandybridge D0
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Zhenyu Wang [Thu, 19 Aug 2010 01:46:15 +0000 (09:46 +0800)]
drm/i915: fix render pipe control notify on sandybridge
This one was missed in the last pipe control fix for Sandybridge: it
actually unmasks the interrupt bit for notify in the render engine IMR.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Zhenyu Wang [Thu, 19 Aug 2010 01:46:13 +0000 (09:46 +0800)]
agp/intel: set 40-bit dma mask on Sandybridge
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Chris Wilson [Sun, 15 Aug 2010 09:52:34 +0000 (10:52 +0100)]
drm/i915: Remove the conflicting BUG_ON()
We now attempt to free "active" objects following a GPU hang as either
the GPU will be reset or the hang is permanent. In either case, the GPU
writes will not be flushed to main memory and it should be safe to
return that memory back to the system.
The BUG_ON(active) is thus overkill and can erroneously fire after an
EIO.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Chris Wilson [Sat, 14 Aug 2010 13:41:23 +0000 (14:41 +0100)]
drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/
For the shared paths on the next generation chipsets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Chris Wilson [Sat, 14 Aug 2010 13:41:22 +0000 (14:41 +0100)]
drm/i915/suspend: Flush register writes before busy-waiting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Dave Airlie [Wed, 4 Aug 2010 05:52:19 +0000 (15:52 +1000)]
i915: disable DAC on Ironlake also when doing CRT load detection.
Like on Sandybridge, disabling the DAC here when doing CRT load detect
avoids forever hangs waiting on the hardware.
test procedure on HP 2740p:
boot with no VGA plugged in, start X,
plug in VGA monitor (1280x1024)
chvt 3
machine hangs waiting forever.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Jesse Barnes [Wed, 18 Aug 2010 20:20:54 +0000 (13:20 -0700)]
drm/i915: wait for actual vblank, not just 20ms
Waiting for a hard coded 20ms isn't always enough to make sure a vblank
period has actually occurred, so add code to make sure we really have
passed through a vblank period (or that the pipe is off when disabling).
This prevents problems with mode setting and link training, and seems to
fix a bug like https://bugs.freedesktop.org/show_bug.cgi?id=29278, but
on an HP 8440p instead. Hopefully also fixes
https://bugs.freedesktop.org/show_bug.cgi?id=29141.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Arjan van de Ven [Sat, 21 Aug 2010 20:07:26 +0000 (13:07 -0700)]
workqueue: Add basic tracepoints to track workqueue execution
With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.
This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.
With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):
Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
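As an illustration of what a pair of start/end tracepoints makes possible, the Python sketch below aggregates such events into per-function time and count figures like the ones shown in this message. The event tuple layout and the 'start'/'end' labels are assumptions for the example, and counting every execution as one wakeup is a simplification; this is not PowerTOP's actual code.

```python
from collections import defaultdict


def summarize_work_items(events):
    """Aggregate (timestamp_ms, kind, function) events into per-function totals.

    kind is 'start' or 'end'; an 'end' without a matching 'start' is ignored.
    """
    started = {}
    totals = defaultdict(lambda: {"time_ms": 0.0, "wakeups": 0})
    for timestamp_ms, kind, function in events:
        if kind == "start":
            started[function] = timestamp_ms
            totals[function]["wakeups"] += 1
        elif kind == "end" and function in started:
            totals[function]["time_ms"] += timestamp_ms - started.pop(function)
    return dict(totals)
```

Because the events carry the work function rather than the worker thread, the summary is keyed by names such as do_dbs_timer instead of "kworker/a:b".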
Linus Torvalds [Sat, 21 Aug 2010 19:47:05 +0000 (12:47 -0700)]
Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
mtd: nand: Fix probe of Samsung NAND chips
mtd: nand: Fix regression in BBM detection
pxa3xx: fix ns2cycle equation
Samuel Thibault [Sat, 21 Aug 2010 19:32:41 +0000 (21:32 +0200)]
Replace Configure with Enable in description of MAXSMP
The "Configure" word tends to make users believe they have to say 'yes'
to be able to choose the number of procs/nodes. "Enable" should be
unambiguous enough.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Aug 2010 23:49:40 +0000 (16:49 -0700)]
mm: make stack guard page logic use vm_prev pointer
Like the mlock() change previously, this makes the stack guard check
code use vma->vm_prev to see what the mapping below the current stack
is, rather than have to look it up with find_vma().
Also, accept an abutting stack segment, since that happens naturally if
you split the stack with mlock or mprotect.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Aug 2010 23:39:25 +0000 (16:39 -0700)]
mm: make the mlock() stack guard page checks stricter
If we've split the stack vma, only the lowest one has the guard page.
Now that we have a doubly linked list of vma's, checking this is trivial.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
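The rule stated in this message, that only the lowest piece of a split stack carries the guard page, becomes a constant-time check once each vma can reach its predecessor. The Python sketch below models that check in userspace; the `Vma` class and function name are hypothetical, and the real kernel structures carry far more state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vma:
    start: int
    end: int
    grows_down: bool
    prev: Optional["Vma"] = None  # previous mapping in address order


def has_stack_guard_page(vma, addr):
    """True if addr is the guard page at the bottom of a stack vma."""
    if not vma.grows_down or vma.start != addr:
        return False
    below = vma.prev
    # An abutting grows-down mapping below means the stack continues there,
    # so this vma is not the lowest piece and has no guard page of its own.
    continues_below = (
        below is not None and below.grows_down and below.end == addr
    )
    return not continues_below
```

With a stack split at 0x3000 into a lower and an upper vma, the check is true only for the start of the lower one.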
Linus Torvalds [Fri, 20 Aug 2010 23:24:55 +0000 (16:24 -0700)]
mm: make the vma list be doubly linked
It's a really simple list, and several of the users want to go backwards
in it to find the previous vma. So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tilman Sauerbeck [Fri, 20 Aug 2010 21:01:47 +0000 (14:01 -0700)]
mtd: nand: Fix probe of Samsung NAND chips
Apparently, the check for a 6-byte ID string introduced by commit
426c457a3216fac74e3d44dd39729b0689f4c7ab ("mtd: nand: extend NAND flash
detection to new MLC chips") is NOT sufficient to determine whether or
not a Samsung chip uses their new MLC detection scheme or the old,
standard scheme. This adds a condition to check cell type.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
Linus Torvalds [Fri, 20 Aug 2010 21:25:08 +0000 (14:25 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, apic: Fix apic=debug boot crash
x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
x86-32: Fix dummy trampoline-related inline stubs
x86-32: Separate 1:1 pagetables from swapper_pg_dir
x86, cpu: Fix regression in AMD errata checking code
Stephen Rothwell [Fri, 20 Aug 2010 09:56:31 +0000 (19:56 +1000)]
Documentation: fix ozlabs.org mailing list address
This list moved to lists.ozlabs.org quite some time ago.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell [Fri, 20 Aug 2010 09:52:45 +0000 (19:52 +1000)]
MAINTAINERS: Fix ozlabs.org mailing list addresses
All these lists moved to lists.ozlabs.org quite a while ago.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefan Richter [Thu, 19 Aug 2010 21:13:43 +0000 (14:13 -0700)]
Documentation: kernel-locking: mutex_trylock cannot be used in interrupt context
Chapter 6 is right about mutex_trylock, but chapter 10 wasn't. This error
was introduced during semaphore-to-mutex conversion of the Unreliable
guide. :-)
If user context which performs mutex_lock() or mutex_trylock() is
preempted by interrupt context which performs mutex_trylock() on the same
mutex instance, a deadlock occurs. This is because these functions do not
disable local IRQs when they operate on mutex->wait_lock.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 19 Aug 2010 21:13:42 +0000 (14:13 -0700)]
drivers/scsi/qla4xxx: fix build
gcc-4.0.2:
drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4_8xxx_error_recovery':
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2377: sorry, unimplemented: called from here
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2393: sorry, unimplemented: called from here
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 19 Aug 2010 21:13:40 +0000 (14:13 -0700)]
uml: fix compile error in dma_get_cache_alignment()
Fix uml compile error:
include/linux/dma-mapping.h:145: error: redefinition of 'dma_get_cache_alignment'
arch/um/include/asm/dma-mapping.h:99: note: previous definition of 'dma_get_cache_alignment' was here
Introduced by commit 4565f0170dfc ("dma-mapping: unify
dma_get_cache_alignment implementations")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: __task_cred() need rcu_read_lock()
dump_tasks() needs to hold the RCU read lock around its access of the
target task's UID. To this end it should use task_uid() as it only needs
that one thing from the creds.
The fact that dump_tasks() holds tasklist_lock is insufficient to prevent the
target process replacing its credentials on another CPU.
So this patch changes the code to call rcu_read_lock() explicitly.
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
mm/oom_kill.c:410 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
4 locks held by kworker/1:2/651:
#0: (events){+.+.+.}, at: [<ffffffff8106aae7>] process_one_work+0x137/0x4a0
#1: (moom_work){+.+...}, at: [<ffffffff8106aae7>] process_one_work+0x137/0x4a0
#2: (tasklist_lock){.+.+..}, at: [<ffffffff810fafd4>] out_of_memory+0x164/0x3f0
#3: (&(&p->alloc_lock)->rlock){+.+...}, at: [<ffffffff810fa48e>] find_lock_task_mm+0x2e/0x70
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: fix tasklist_lock leak
Commit 0aad4b3124 ("oom: fold __out_of_memory into out_of_memory")
introduced a tasklist_lock leak, which caused the following warnings
and panic.
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
rsyslogd/1422 is leaving the kernel with locks still held!
1 lock held by rsyslogd/1422:
#0: (tasklist_lock){.+.+.+}, at: [<ffffffff810faf64>] out_of_memory+0x164/0x3f0
BUG: scheduling while atomic: rsyslogd/1422/0x00000002
INFO: lockdep is turned off.
This patch fixes it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:38 +0000 (14:13 -0700)]
oom: fix NULL pointer dereference
Commit b940fd7035 ("oom: remove unnecessary code and cleanup") added an
unnecessary NULL pointer dereference. Remove it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungmin Park [Thu, 19 Aug 2010 21:13:37 +0000 (14:13 -0700)]
drivers/mmc/host/sdhci-s3c.c: use the correct mutex and card detect function
There is a merge problem between the sdhci core and the sdhci-s3c host. After
the mutex was changed to a spinlock, the host driver needs to use the spin lock
functions and the correct card detection function.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungmin Park [Thu, 19 Aug 2010 21:13:35 +0000 (14:13 -0700)]
sdhci: add no hi-speed bit quirk support
Some SDHCI controllers like s5pc110 don't have an HISPD bit in the HOSTCTL
register.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungmin Park [Thu, 19 Aug 2010 21:13:35 +0000 (14:13 -0700)]
s5pc110: SDHCI-s3c support on s5pc110
s5pc110 (aka s5pv210) uses the same SDHCI IP.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungmin Park [Thu, 19 Aug 2010 21:13:34 +0000 (14:13 -0700)]
s5pc110: SDHCI-s3c can override host capabilities
Each board can override the default sdhci host capabilities.
Some boards have features broken in hardware, and some support 8-bit bandwidth.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 19 Aug 2010 21:13:33 +0000 (14:13 -0700)]
lib/radix-tree.c: fix overflow in radix_tree_range_tag_if_tagged()
When radix_tree_maxindex() is ~0UL, it can happen that scanning overflows
index and tree traversal code goes astray reading memory until it hits
unreadable memory. Check for overflow and exit in that case.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
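The overflow described in this message can be reproduced with plain arithmetic. The Python sketch below models C unsigned wraparound, assuming a 64-bit unsigned long; the function names are hypothetical and the code only illustrates the kind of check the message calls for, not the actual radix tree walk.

```python
ULONG_MAX = (1 << 64) - 1  # assumption: 64-bit unsigned long


def advance_index(index, shift):
    """Step to the next slot at the tree level given by shift.

    The mask reproduces C unsigned arithmetic, where the result wraps to 0.
    """
    return (((index >> shift) + 1) << shift) & ULONG_MAX


def scan_finished(index, last_index):
    # index == 0 after advancing means the counter wrapped around; without
    # this test, a last_index of ~0UL would never terminate the scan.
    return index > last_index or index == 0
```

When last_index is ULONG_MAX, the comparison `index > last_index` alone can never be true, which is why the wrap to zero has to be detected separately.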
Andrew Morton [Thu, 19 Aug 2010 21:13:31 +0000 (14:13 -0700)]
revert "hwmon: f71882fg: add support for the Fintek F71808E"
Revert commit 7721fea3d0fd93fb4d000eb737b444369358d6d3 ("hwmon:
f71882fg: add support for the Fintek F71808E").
Hans said:
: A second review after I've received a data sheet for this device from
: Fintek has turned up a few bugs.
:
: Unfortunately neither Giel nor I have time to fix this in time for the 2.6.36
: cycle. Therefore I would like to see this patch reverted as not having any
: support for the hwmon function of this superio chip is better than having
: unreliable support.
Cc: Giel van Schijndel <me@mortis.eu>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Thu, 19 Aug 2010 21:13:30 +0000 (14:13 -0700)]
kfifo: add explicit error checking in all the examples
Provide a check in all the kfifo examples to validate the correct
execution of each testcase.
Signed-off-by: Andrea Righi <arighi@develer.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Thu, 19 Aug 2010 21:13:30 +0000 (14:13 -0700)]
kfifo: fix a memory leak in dma example
We use a dynamically allocated kfifo in the dma example, so we need to
free it when unloading the module.
Signed-off-by: Andrea Righi <arighi@develer.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Thu, 19 Aug 2010 21:13:29 +0000 (14:13 -0700)]
kfifo: fix kernel BUG in dma example
The scatterlist is used uninitialized in kfifo_dma_in_prepare(). This
triggers the following bug if CONFIG_DEBUG_SG=y:
------------[ cut here ]------------
kernel BUG at include/linux/scatterlist.h:65!
invalid opcode: 0000 [#1] PREEMPT SMP
...
Call Trace:
[<ffffffff810a1eab>] setup_sgl+0x6b/0xe0
[<ffffffffa03d7000>] ? example_init+0x0/0x265 [dma_example]
[<ffffffff810a2021>] __kfifo_dma_in_prepare+0x21/0x30
[<ffffffffa03d7124>] example_init+0x124/0x265 [dma_example]
[<ffffffff810f9c55>] ? trace_module_notify+0x25/0x370
[<ffffffff81110c6e>] ? free_pages_prepare+0x11e/0x1e0
[<ffffffff8106f2b1>] ? get_parent_ip+0x11/0x50
[<ffffffff810f9c55>] ? trace_module_notify+0x25/0x370
[<ffffffff810b65fd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814beade>] ? mutex_unlock+0xe/0x10
[<ffffffff810f9c71>] ? trace_module_notify+0x41/0x370
[<ffffffff810a77d5>] ? __blocking_notifier_call_chain+0x45/0x80
[<ffffffff81137b7a>] ? vfree+0x2a/0x30
[<ffffffff810a6ac3>] ? up_read+0x23/0x40
[<ffffffff810a77f5>] ? __blocking_notifier_call_chain+0x65/0x80
[<ffffffff810001e3>] do_one_initcall+0x43/0x180
[<ffffffff810c577a>] sys_init_module+0xba/0x200
[<ffffffff8103819b>] system_call_fastpath+0x16/0x1b
RIP [<ffffffff810a1e31>] setup_sgl_buf+0x1a1/0x1b0
RSP <ffff88006720dc98>
---[ end trace a72b979fd3c1d3a5 ]---
Add the proper initialization to avoid the bug.
Signed-off-by: Andrea Righi <arighi@develer.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Thu, 19 Aug 2010 21:13:29 +0000 (14:13 -0700)]
kfifo: add explicit error checking in byte stream example
Provide a static array of expected items that kfifo should contain at the
end of the test to validate it.
Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Thu, 19 Aug 2010 21:13:28 +0000 (14:13 -0700)]
kfifo: add kfifo_skip() testcase
Add a testcase for kfifo_skip() to the byte stream fifo example.
Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Righi [Thu, 19 Aug 2010 21:13:27 +0000 (14:13 -0700)]
kfifo: implement missing __kfifo_skip_r()
kfifo_skip() is currently broken because the internal helper function
__kfifo_skip_r() is missing. Add it.
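As a rough illustration of what skipping means for a record-oriented fifo,
here is a toy Python model. It is not the kernel implementation: the
RecordFifo class, its 1-byte length header and its method names are
assumptions made for this sketch only. The point is that a skip has to
advance the read position past both the length header and the payload.

```python
class RecordFifo:
    """Toy record FIFO: each record is a 1-byte length header plus payload.

    The layout and names are assumptions for illustration, not kernel code.
    """

    def __init__(self):
        self.buf = bytearray()
        self.out = 0  # read offset into buf

    def put(self, payload: bytes) -> None:
        if len(payload) > 255:
            raise ValueError("payload does not fit a 1-byte length header")
        self.buf.append(len(payload))
        self.buf += payload

    def is_empty(self) -> bool:
        return self.out >= len(self.buf)

    def get(self) -> bytes:
        if self.is_empty():
            raise IndexError("fifo is empty")
        n = self.buf[self.out]          # length of the next record
        start = self.out + 1
        self.out = start + n            # consume header and payload
        return bytes(self.buf[start:start + n])

    def skip(self) -> None:
        """Drop the next record without copying it out."""
        if self.is_empty():
            return
        n = self.buf[self.out]
        self.out += 1 + n               # header size plus payload size
```

After putting three records and calling skip() once, the next get() returns
the second record rather than the first.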
Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ondrej Zary [Thu, 19 Aug 2010 21:13:25 +0000 (14:13 -0700)]
matroxfb: fix incorrect use of memcpy_toio()
Screen is completely corrupted since 2.6.34. Bisection revealed that it's
caused by commit 6175ddf06b61720 ("x86: Clean up mem*io functions.").
H. Peter Anvin explained that memcpy_toio() does not copy data in 32bit
chunks anymore on x86.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: <stable@kernel.org> [2.6.34.x, 2.6.35.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Kiper [Thu, 19 Aug 2010 22:46:16 +0000 (00:46 +0200)]
x86, apic: Fix apic=debug boot crash
Fix a boot crash when apic=debug is used and the APIC is
not properly initialized.
This issue appears during Xen Dom0 kernel boot but the
fix is generic and the crash could occur on real hardware
as well.
Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: jeremy@goop.org
Cc: <stable@kernel.org> # .35.x, .34.x, .33.x, .32.x
LKML-Reference: <20100819224616.GB9967@router-fw-old.local.net-space.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Borislav Petkov [Thu, 19 Aug 2010 18:10:29 +0000 (20:10 +0200)]
x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.
Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.
[ v2.1: fix !HOTPLUG_CPU build ]
Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Linus Torvalds [Thu, 19 Aug 2010 16:06:49 +0000 (09:06 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kprobes/x86: Fix the return address of multiple kretprobes
perf tools: Fix build error on read only source.
perf, x86: Fix Intel-nhm PMU programming errata workaround
Brian Norris [Wed, 18 Aug 2010 18:25:04 +0000 (11:25 -0700)]
mtd: nand: Fix regression in BBM detection
Commit c7b28e25cb9beb943aead770ff14551b55fa8c79 ("mtd: nand: refactor BB
marker detection") caused a regression in detection of factory-set bad
block markers, especially for certain small-page NAND. This fix removes
some unneeded constraints on using NAND_SMALL_BADBLOCK_POS, making the
detection code more correct.
This regression can be seen, for example, in Hynix HY27US081G1M and
similar.
Signed-off-by: Brian Norris <norris@broadcom.com>
Tested-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
KUMANO Syuhei [Sun, 15 Aug 2010 06:18:04 +0000 (15:18 +0900)]
kprobes/x86: Fix the return address of multiple kretprobes
Fix the return address of subsequent kretprobes when multiple
kretprobes are set on the same function.
For example:
# cd /sys/kernel/debug/tracing
# echo "r:event1 sys_symlink" > kprobe_events
# echo "r:event2 sys_symlink" >> kprobe_events
# echo 1 > events/kprobes/enable
# ln -s /tmp/foo /tmp/bar
(without this patch)
# cat trace
ln-897 [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink)
ln-897 [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)
(with this patch)
# cat trace
ln-740 [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink)
ln-740 [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)
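The behaviour shown above can be modelled with a small Python sketch. This
is a toy model, not the kernel code: the list-of-tuples layout, the
TRAMPOLINE constant and the function name are assumptions for illustration.
It assumes that when several kretprobes hit the same function, only one
saved slot holds the real return address while the others hold the
trampoline address, so every handler should be given the first
non-trampoline address found.

```python
TRAMPOLINE = "kretprobe_trampoline+0x0/0x4c"  # placeholder label for this sketch


def resolve_return_addresses(instances):
    """Return (probe, real_return_address) for the probes of one function return.

    `instances` is a list of (probe_name, saved_return_address) tuples,
    most recent first. Entries are scanned until the first saved address
    that is not the trampoline; that address is the real return address
    for every probe seen up to and including that entry.
    """
    real = None
    count = 0
    for _, addr in instances:
        count += 1
        if addr != TRAMPOLINE:
            real = addr
            break
    if real is None:
        raise ValueError("no real return address found")
    return [(name, real) for name, _ in instances[:count]]
```

With two probes on the same function, both are reported with the caller's
address instead of one of them being reported with the trampoline address.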
Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
LKML-Reference: <1281853084.3254.11.camel@camp10-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 19 Aug 2010 10:25:29 +0000 (12:25 +0200)]
Merge branch 'perf/urgent' of git://git./linux/kernel/git/acme/linux-2.6 into perf/urgent
Linus Torvalds [Wed, 18 Aug 2010 22:45:23 +0000 (15:45 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix an Oops in the NFSv4 atomic open code
NFS: Fix the selection of security flavours in Kconfig
NFS: fix the return value of nfs_file_fsync()
rpcrdma: Fix SQ size calculation when memreg is FRMR
xprtrdma: Do not truncate iova_start values in frmr registrations.
nfs: Remove redundant NULL check upon kfree()
nfs: Add "lookupcache" to displayed mount options
NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
Linus Torvalds [Wed, 18 Aug 2010 22:29:38 +0000 (15:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
USB HID: Add ID for eGalax Multitouch used in JooJoo tablet
HID: hiddev: fix memory corruption due to invalid intfdata
HID: hiddev: protect against disconnect/NULL-dereference race
HID: picolcd: correct ordering of framebuffer freeing
HID: picolcd: testing the wrong variable
Linus Torvalds [Wed, 18 Aug 2010 20:27:41 +0000 (13:27 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix build error: conflicting types for ‘sys_execve’
Jesse Barnes [Fri, 13 Aug 2010 22:43:26 +0000 (15:43 -0700)]
drm/i915: make sure eDP PLL is enabled at the right time
We need to make sure the eDP PLL is enabled before the pipes or planes,
so do it as part of the DP prepare mode set function.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse Barnes [Fri, 13 Aug 2010 22:11:26 +0000 (15:11 -0700)]
drm/i915: fix VGA plane disable for Ironlake+
We need to use I/O port instructions to access VGA registers on
Ironlake+, and it doesn't hurt on other platforms, so switch the VGA
plane disable function over to using them. Move it to init time as well
while we're at it, no need to repeatedly disable the VGA plane with
every mode set and DPMS event.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse Barnes [Wed, 11 Aug 2010 17:06:44 +0000 (10:06 -0700)]
drm/i915: eDP mode set sequence corrections
We should disable the panel first when shutting down an eDP link. And
when turning one on, the panel needs to be enabled before link training
or eDP I/O won't be enabled.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse Barnes [Wed, 11 Aug 2010 17:04:43 +0000 (10:04 -0700)]
drm/i915: add panel reset workaround
Ironlake requires that we clear the reset panel bit during power
sequences and restore it afterwards. Unconditionally add code to do that
since it should be harmless on SNB+.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
H. Peter Anvin [Wed, 18 Aug 2010 18:42:23 +0000 (11:42 -0700)]
x86-32: Fix dummy trampoline-related inline stubs
Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
David Howells [Wed, 18 Aug 2010 17:55:33 +0000 (18:55 +0100)]
Fix the declaration of sys_execve() in asm-generic/syscalls.h
Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tony Luck [Wed, 18 Aug 2010 17:17:44 +0000 (10:17 -0700)]
[IA64] Fix build error: conflicting types for ‘sys_execve’
arch/ia64/kernel/process.c:636: error: conflicting types for ‘sys_execve’
commit d7627467b7a8dd6944885290a03a07ceb28c10eb
Make do_execve() take a const filename pointer
Missed the declaration of sys_execve in the ia64 asm/unistd.h (perhaps
because there is no reason for it to be there ... it might be a left over
from the COMPAT code?). Just delete the conflicting version.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Wed, 18 Aug 2010 16:35:08 +0000 (09:35 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
Uwe Kleine-König [Wed, 18 Aug 2010 16:25:38 +0000 (09:25 -0700)]
mmc: build fix: mmc_pm_notify is only available with CONFIG_PM=y
This fixes a build breakage introduced by commit 4c2ef25fe0b8 ("mmc: fix
all hangs related to mmc/sd card insert/removal during suspend/resume")
Cc: David Brownell <david-b@pacbell.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-mmc@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Acked-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kusanagi Kouichi [Wed, 18 Aug 2010 16:32:37 +0000 (13:32 -0300)]
perf tools: Fix build error on read only source.
Parts of the build process were generating files outside the specified
O= directory, causing the build to fail on systems where the sources are
in a read only file system.
Fix it by using $(OUTPUT) on these locations.
Also check that $(OUTPUT) actually exists, just like the top level
kernel Makefile does. Otherwise the failure message emitted is
completely misleading.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20100817140841.0859362C03A@msa106.auone-net.jp>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Wed, 18 Aug 2010 16:32:13 +0000 (09:32 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Fix build on POSIX shells
latencytop: Fix kconfig dependency warnings
perf annotate tui: Fix exit and RIGHT keys handling
tracing: Sanitize value returned from write(trace_marker, "...", len)
tracing/events: Convert format output to seq_file
tracing: Extend recordmcount to better support Blackfin mcount
tracing: Fix ring_buffer_read_page reading out of page boundary
tracing: Fix an unallocated memory access in function_graph
Linus Torvalds [Wed, 18 Aug 2010 16:30:08 +0000 (09:30 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
ALSA: hda - Fix ALC680 base model capture
ASoC: Remove DSP mode support for WM8776
ALSA: hda - Add quirk for Dell Vostro 1220
ALSA: riptide - Fix detection / load of firmware files
Linus Torvalds [Wed, 18 Aug 2010 16:27:10 +0000 (09:27 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/gerg/m68knommu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: include sched.h in ColdFire/SPI driver
m68knommu: formatting of pointers in printk()
m68knommu: arch/m68k/include/asm/ide.h fix for nommu
Linus Torvalds [Wed, 18 Aug 2010 16:26:42 +0000 (09:26 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md raid-1/10 Fix bio_rw bit manipulations again
md: provide appropriate return value for spare_active functions.
md: Notify sysfs when RAID1/5/10 disk is In_sync.
Update recovery_offset even when external metadata is used.
Linus Torvalds [Wed, 18 Aug 2010 16:26:17 +0000 (09:26 -0700)]
Merge branch 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6
* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
spi.h: missing kernel-doc notation, please fix
of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
of: Fix missing includes
ata: update for of_device to platform_device replacement
microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
microblaze: Fix of/address: Merge all of the bus translation code
booting-without-of: Remove nonexistent chapters from TOC, fix numbering
Joerg Roedel [Mon, 16 Aug 2010 12:38:33 +0000 (14:38 +0200)]
x86-32: Separate 1:1 pagetables from swapper_pg_dir
This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:
1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.
2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.
3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.
4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.
5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.
6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.
7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.
There are two ways to fix this issue:
1. Always do a global TLB flush when a new cr3 is loaded and the
old page table was swapper_pg_dir. I consider this a hack hard
to understand and with performance implications
2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
does.
This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.
-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Hans Rosenfeld [Wed, 18 Aug 2010 14:19:50 +0000 (16:19 +0200)]
x86, cpu: Fix regression in AMD errata checking code
A bug in the family-model-stepping matching code caused the presence of
errata to go undetected when OSVW was not used. This causes hangs on
some K8 systems because the E400 workaround is not enabled.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Ryusuke Konishi [Wed, 18 Aug 2010 12:11:11 +0000 (21:11 +0900)]
nilfs2: wait for discard to finish
nilfs_discard_segment() doesn't wait for completion of discard
requests. This specifies BLKDEV_IFL_WAIT flag when calling
blkdev_issue_discard() in order to fix the sync failure.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
Trond Myklebust [Wed, 18 Aug 2010 13:25:42 +0000 (09:25 -0400)]
NFS: Fix an Oops in the NFSv4 atomic open code
Adam Lackorzynski reports:
with 2.6.35.2 I'm getting this reproducible Oops:
[ 110.825396] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[ 110.828638] PGD be89f067 PUD bf18f067 PMD 0
[ 110.828638] Oops: 0000 [#1] SMP
[ 110.828638] last sysfs file: /sys/class/net/lo/operstate
[ 110.828638] CPU 2
[ 110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod
i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot
sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base
[last unloaded: scsi_wait_scan]
[ 110.828638]
[ 110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1
[ 110.828638] RIP: 0010:[<ffffffff811247b7>] [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[ 110.828638] RSP: 0000:ffff88003bf5b878 EFLAGS: 00010296
[ 110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX: 0000000000000000
[ 110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI: ffff88003bf5b9f8
[ 110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09: 0000000000000004
[ 110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12: ffff8800be258800
[ 110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15: ffff8800be258800
[ 110.828638] FS: 0000000000000000(0000) GS:ffff880041e00000(0063) knlGS:00000000556bd6b0
[ 110.828638] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4: 00000000000006e0
[ 110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 110.828638] Process setchecksum (pid: 11264, threadinfo ffff88003bf5a000, task ffff88003f232210)
[ 110.828638] Stack:
[ 110.828638] 0000000000000000 ffff8800bfbcf920 0000000000000000 0000000000000ffe
[ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 110.828638] Call Trace:
[ 110.828638] [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4
[ 110.828638] [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a
[ 110.828638] [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a
[ 110.828638] [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b
[ 110.828638] [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d
[ 110.828638] [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147
[ 110.828638] [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32
[ 110.828638] [<ffffffff810ac521>] ? ifind+0x4e/0x90
[ 110.828638] [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e
[ 110.828638] [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6
[ 110.828638] [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a
[ 110.828638] [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161
[ 110.828638] [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201
[ 110.828638] [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5
[ 110.828638] [<ffffffff810a42b6>] ? do_last+0x17b/0x58e
[ 110.828638] [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e
[ 110.828638] [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48
[ 110.828638] [<ffffffff810a9b1b>] ? dput+0x37/0x152
[ 110.828638] [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a
[ 110.828638] [<ffffffff81099f39>] ? do_sys_open+0x56/0x100
[ 110.828638] [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5
[ 110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41
57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00
<8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d
[ 110.828638] RIP [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[ 110.828638] RSP <ffff88003bf5b878>
[ 110.828638] CR2: 0000000000000000
[ 112.840396] ---[ end trace 95282e83fd77358f ]---
We need to ensure that the O_EXCL flag is turned off if the user doesn't
set O_CREAT.
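The flag handling described above amounts to a small mask operation. The
following Python sketch only illustrates that rule using the standard os
flag constants; the function name sanitize_open_flags is hypothetical and
this is not the NFS client code.

```python
import os


def sanitize_open_flags(flags: int) -> int:
    """Clear O_EXCL unless O_CREAT is also requested.

    Hypothetical helper for illustration: O_EXCL is only meaningful
    together with O_CREAT, so it is dropped when O_CREAT is absent.
    """
    if not flags & os.O_CREAT:
        flags &= ~os.O_EXCL
    return flags
```

An open with O_RDWR | O_EXCL is therefore treated as a plain O_RDWR open,
while O_CREAT | O_EXCL is passed through unchanged.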
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Takashi Iwai [Wed, 18 Aug 2010 13:22:18 +0000 (15:22 +0200)]
Merge branch 'fix/asoc' into for-linus
Takashi Iwai [Wed, 18 Aug 2010 13:22:15 +0000 (15:22 +0200)]
Merge branch 'fix/hda' into for-linus
Jaroslav Kysela [Wed, 18 Aug 2010 12:08:17 +0000 (14:08 +0200)]
ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
With some hardware combinations, the PCM interrupts are acknowledged
before the period boundary from the emu10k1 chip. The midlevel PCM code
gets confused and the playback stream is interrupted.
It seems that the interrupt processing shift by 2 samples is enough
to fix this issue. This default value does not harm other,
non-affected hardware.
More information: Kernel bugzilla bug#16300
[A compile warning fixed by tiwai]
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Nick Piggin [Tue, 17 Aug 2010 18:37:39 +0000 (04:37 +1000)]
fs: brlock vfsmount_lock
Use a brlock for the vfsmount lock. It must be taken for write whenever
modifying the mount hash or associated fields, and may be taken for read when
performing mount hash lookups.
A new lock is added for the mnt-id allocator, so it doesn't need to take
the heavy vfsmount write-lock.
The number of atomics should remain the same for fastpath rlock cases, though
code would be slightly slower due to per-cpu access. Scalability will not be
much improved in common cases yet, due to other locks (ie. dcache_lock) getting
in the way. However path lookups crossing mountpoints should be one case where
scalability is improved (currently requiring the global lock).
The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
Altix system (high latency to remote nodes), a simple umount microbenchmark
(mount --bind mnt mnt2 ; umount mnt2, looped 1000 times) took 6.8s before this
patch and 7.1s afterwards, about 5% slower.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nick Piggin [Tue, 17 Aug 2010 18:37:38 +0000 (04:37 +1000)]
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).
One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from a different CPU's list.
However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.
A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU is allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.
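The bookkeeping described above can be sketched as a toy Python model. It
is not the kernel data structure: the class names, the list_cpu field and
the use of threading.Lock in place of per-CPU spinlocks are all assumptions
for illustration. The idea shown is that each file remembers which per-CPU
list it was added to, so a removal from any CPU locks only that one list.

```python
import threading


class File:
    def __init__(self, name):
        self.name = name
        self.list_cpu = None  # hypothetical field: index of the list holding this file


class PerCpuFileLists:
    """Toy model: one lock and one list per CPU."""

    def __init__(self, ncpus):
        self.locks = [threading.Lock() for _ in range(ncpus)]
        self.lists = [[] for _ in range(ncpus)]

    def add(self, f, cpu):
        # Fast path: only the adding CPU's lock is taken.
        with self.locks[cpu]:
            f.list_cpu = cpu
            self.lists[cpu].append(f)

    def remove(self, f, cpu):
        """Remove f while running on `cpu`; return True for a same-CPU removal."""
        home = f.list_cpu  # the list the file lives on, not the current CPU
        with self.locks[home]:
            self.lists[home].remove(f)
            f.list_cpu = None
        return home == cpu
```

A removal on the CPU that added the file touches only that CPU's lock; a
cross-CPU removal contends on a single remote lock rather than a global one.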
Testing results:
On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.
Booting:     locks=   25049 cpu-hits=   23174 (92.5%) node-hits=   23945 (95.6%)
kbuild -j16  locks= 2281913 cpu-hits= 2208126 (96.8%) node-hits= 2252674 (98.7%)
dbench 64    locks= 4306582 cpu-hits= 4287247 (99.6%) node-hits= 4299527 (99.8%)
So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.
Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
             throughput
2.6.34-rc2   24.5
+patch       24.9
             us     sys    idle   IO wait (in %)
2.6.34-rc2   51.25  28.25  17.25  3.25
+patch       53.75  18.5   19     8.75
So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.
Single threaded performance difference was within the noise of microbenchmarks.
That is not to say a penalty does not exist: the code is larger and more memory
accesses are required, so it will be slightly slower.
Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nick Piggin [Tue, 17 Aug 2010 18:37:37 +0000 (04:37 +1000)]
lglock: introduce special lglock and brlock spin locks
This patch introduces "local-global" locks (lglocks). These can be used to:
- Provide fast exclusive access to per-CPU data, with exclusive access to
another CPU's data allowed but possibly subject to contention, and to provide
very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
very slow exclusive serialisation of data (not necessarily per-CPU data).
Brlocks are also implemented as a short-hand notation for the latter use
case.
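The two usage patterns above can be sketched with a toy Python model. It
is an illustration only and not the kernel API: the LGLock and BRLock
classes and their method names are assumptions, and threading.Lock stands
in for per-CPU spinlocks. The local operation takes one per-CPU lock, and
the global operation takes every per-CPU lock in a fixed order, which is
what makes it slow.

```python
import threading


class LGLock:
    """Toy local/global lock: one lock per CPU."""

    def __init__(self, ncpus):
        self._locks = [threading.Lock() for _ in range(ncpus)]

    def local_lock(self, cpu):
        self._locks[cpu].acquire()      # fast: a single per-CPU lock

    def local_unlock(self, cpu):
        self._locks[cpu].release()

    def global_lock(self):
        for lock in self._locks:        # slow: every lock, in index order
            lock.acquire()

    def global_unlock(self):
        for lock in reversed(self._locks):
            lock.release()


class BRLock(LGLock):
    """Shorthand for the reader/writer use case: read is local, write is global."""

    def read_lock(self, cpu):
        self.local_lock(cpu)

    def read_unlock(self, cpu):
        self.local_unlock(cpu)

    def write_lock(self):
        self.global_lock()

    def write_unlock(self):
        self.global_unlock()
```

Readers on different CPUs never touch the same lock, while a writer
excludes all readers by holding every per-CPU lock at once.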
Thanks to Paul for local/global naming convention.
Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nick Piggin [Tue, 17 Aug 2010 18:37:36 +0000 (04:37 +1000)]
tty: fix fu_list abuse
tty code abuses fu_list, which causes a bug in remount,ro handling.
If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".
Taking the idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes,
avoids meddling with the vfs, and avoids this bug.
The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.
[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nick Piggin [Tue, 17 Aug 2010 18:37:35 +0000 (04:37 +1000)]
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>