Lars Ellenberg [Thu, 27 May 2010 09:51:56 +0000 (11:51 +0200)]
drbd: use drbd specific ratelimit instead of global printk_ratelimit
using the global printk_ratelimit() may mask other messages.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Lars Ellenberg [Thu, 27 May 2010 07:45:45 +0000 (09:45 +0200)]
drbd: fix hang on local read errors while disconnected
"canceled" w_read_retry_remote never completed, if they have been
canceled after drbd_disconnect connection teardown cleanup has already
run (or we are currently not connected anyways).
Fixed by not queueing a remote retry if we already know it won't work
(pdsk not uptodate), and cleanup ourselves on "cancel", in case we hit a
race with drbd_disconnect.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Philipp Reisner [Wed, 26 May 2010 15:13:18 +0000 (17:13 +0200)]
drbd: Removed the now empty w_io_error() function
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Andrea Gelmini [Sun, 23 May 2010 19:48:13 +0000 (21:48 +0200)]
drbd: removed duplicated #includes
drbd/drbd_receiver.c: linux/mm.h is included more than once.
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Lars Ellenberg [Tue, 25 May 2010 14:26:16 +0000 (16:26 +0200)]
drbd: improve usage of MSG_MORE
It seems to improve performance if we allow the "p_data" header in its
own frame (no MSG_MORE), but sendpage all but the last page with MSG_MORE.
This is also in preparation of a later zero copy receive implementation.
Suggested by Eduard.Guzovsky@stratus.com on drbd-dev.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Lars Ellenberg [Tue, 25 May 2010 14:18:01 +0000 (16:18 +0200)]
drbd: need to set socket bufsize early to take effect
quoting tcp(7):
On individual connections, the socket buffer size must be set prior to the
listen(2) or connect(2) calls in order to have it take effect.
This adds a wrapper to do so, and uses it appropriately.
Improves performance in certain situations.
Note that because we cannot easily determine which socket will be
"meta" and wich "data" (bulk) socket, we adjust both sockets.
Previously, DRBD only adjusted the bufsizes of the "data" socket.
Thanks again to Eduard.Guzovsky@stratus.com.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Lars Ellenberg [Tue, 25 May 2010 12:23:57 +0000 (14:23 +0200)]
drbd: improve network latency, TCP_QUICKACK
On Thu, Apr 29, 2010 at 04:00:50PM -0400, Eduard.Guzovsky@stratus.com
wrote on drbd-dev@lists.linbit.com
Subject: [Drbd-dev] DRBD small synchronous writes performance improvements
> 1. TCP_QUICKACK option is set incorrectly. The goal was force TCP to
> send and ACK as a "one time" event. Instead the code permanently sets
> connection in the QUICKACK mode.
He is right, we actually want to use an even val with TCP_QUICKACK.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Philipp Reisner [Tue, 25 May 2010 12:32:03 +0000 (14:32 +0200)]
drbd: Revert "drbd: Create new current UUID as late as possible"
The late-UUID writing is delayed until the next release.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Nick Piggin [Wed, 26 May 2010 13:41:14 +0000 (15:41 +0200)]
brd: support discard
Support discard requests in brd by zeroing or deleting the underlying backing
pages. This is simply to help with testing and documentation nature of
brd code.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 1 Jun 2010 09:08:43 +0000 (11:08 +0200)]
Revert "writeback: fix WB_SYNC_NONE writeback from umount"
This reverts commit
e913fc825dc685a444cb4c1d0f9d32f372f59861.
We are investigating a hang associated with the WB_SYNC_NONE changes,
so revert them for now.
Conflicts:
fs/fs-writeback.c
mm/page-writeback.c
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Tue, 1 Jun 2010 09:05:22 +0000 (11:05 +0200)]
Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync"
This reverts commit
7c8a3554c683f512dbcee26faedb42e4c05f12fa.
We are investigating a hang associated with the WB_SYNC_NONE changes,
so revert them for now.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Nick Piggin [Tue, 25 May 2010 08:25:26 +0000 (10:25 +0200)]
fs/splice.c: fix mapping_gfp_mask usage
mapping_gfp_mask() is not supposed to store allocation contex details,
only page location details. So mapping_gfp_mask should be applied to the
pagecache page allocation, wheras normal (kernel mapped) memory should be
used for surrounding allocations such as radix-tree nodes allocated by
add_to_page_cache. Context modifiers should be applied on a per-callsite
basis.
So change splice to follow this convention (which is followed in similar
code patterns in core code).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Shaohua Li [Tue, 25 May 2010 08:16:53 +0000 (10:16 +0200)]
cfq-iosched: fix an oops caused by slab leak
I got below oops when unloading cfq-iosched. Considering scenario:
queue A merge to B, C merge to D and B will be merged to D. Before B is merged
to D, we do split B. We should put B's reference for D.
[ 807.768536] =============================================================================
[ 807.768539] BUG cfq_queue: Objects remaining on kmem_cache_close()
[ 807.768541] -----------------------------------------------------------------------------
[ 807.768543]
[ 807.768546] INFO: Slab 0xffffea0003e6b4e0 objects=26 used=1 fp=0xffff88011d584fd8 flags=0x200000000004082
[ 807.768550] Pid: 5946, comm: rmmod Tainted: G W
2.6.34-07097-gf4b87de-dirty #724
[ 807.768552] Call Trace:
[ 807.768560] [<
ffffffff81104e8d>] slab_err+0x8f/0x9d
[ 807.768564] [<
ffffffff811059e1>] ? flush_cpu_slab+0x0/0x93
[ 807.768569] [<
ffffffff8164be52>] ? add_preempt_count+0xe/0xca
[ 807.768572] [<
ffffffff8164bd9c>] ? sub_preempt_count+0xe/0xb6
[ 807.768577] [<
ffffffff81648871>] ? _raw_spin_unlock+0x15/0x30
[ 807.768580] [<
ffffffff8164bd9c>] ? sub_preempt_count+0xe/0xb6
[ 807.768584] [<
ffffffff811061bc>] list_slab_objects+0x9b/0x19f
[ 807.768588] [<
ffffffff8164bf0a>] ? add_preempt_count+0xc6/0xca
[ 807.768591] [<
ffffffff81109e27>] kmem_cache_destroy+0x13f/0x21d
[ 807.768597] [<
ffffffffa000ff13>] cfq_slab_kill+0x1a/0x43 [cfq_iosched]
[ 807.768601] [<
ffffffffa000ffcf>] cfq_exit+0x93/0x9e [cfq_iosched]
[ 807.768606] [<
ffffffff810973a2>] sys_delete_module+0x1b1/0x219
[ 807.768612] [<
ffffffff8102fb5b>] system_call_fastpath+0x16/0x1b
[ 807.768618] INFO: Object 0xffff88011d584618 @offset=1560
[ 807.768622] INFO: Allocated in cfq_get_queue+0x11e/0x274 [cfq_iosched] age=7173 cpu=1 pid=5496
[ 807.768626] =============================================================================
Cc: stable@kernel.org
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 24 May 2010 17:34:43 +0000 (19:34 +0200)]
pipe: make F_{GET,SET}PIPE_SZ deal with byte sizes
Instead of requiring an exact number of pages as the argument and
return value, change the API to deal with number of bytes instead.
This also relaxes the requirement that the passed in size must
result in a power-of-2 page array size. Round up to the nearest
power-of-2 automatically and return the resulting size of the pipe
on success.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 24 May 2010 17:15:57 +0000 (19:15 +0200)]
pipe: F_SETPIPE_SZ should return -EPERM for non-root
If the passed in size is larger than what has been set as the
system wide limit and the user is not root, we want to return
permission denied (not invalid value).
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Mike Snitzer [Mon, 24 May 2010 07:07:32 +0000 (09:07 +0200)]
block: Adjust elv_iosched_show to return "none" for bio-based DM
Bio-based DM doesn't use an elevator (queue is !blk_queue_stackable()).
Longer-term DM will not allocate an elevator for bio-based DM. But even
then there will be small potential for an elevator to be allocated for
a request-based DM table only to have a bio-based table be loaded in the
end.
Displaying "none" for bio-based DM will help avoid user confusion.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Konstantin Khlebnikov [Thu, 20 May 2010 19:21:41 +0000 (23:21 +0400)]
cfq-iosched: compact io_context radix_tree
Use small consequent indexes as radix tree keys instead of sparse cfqd address.
This change will reduce radix tree depth from 11 (6 for 32-bit hosts)
to 1 if host have <=64 disks under cfq control, or to 0 if there only one disk.
So, this patch save 10*560 bytes for each process (5*296 for 32-bit hosts)
For each cfqd allocate cic index from ida.
To unlink dead cic from tree without cfqd access store index into ->key.
(bit 0 -- dead mark, bits 1..30 -- index: ida produce id in range 0..2^31-1)
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Konstantin Khlebnikov [Thu, 20 May 2010 19:21:34 +0000 (23:21 +0400)]
cfq-iosched: remove dead_key from cfq_io_context
Remove ->dead_key field from cfq_io_context to shrink its size to 128 bytes.
(64 bytes for 32-bit hosts)
Use lower bit in ->key as dead-mark, instead of moving key to separate field.
After this for dead cfq_io_context we got cic->key != cfqd automatically.
Thus, io_context's last-hit cache should work without changing.
Now to check ->key for non-dead state compare it with cfqd,
instead of checking ->key for non-null value as it was before.
Plus remove obsolete race protection in cfq_cic_lookup.
This race gone after
v2.6.24-1728-g4ac845a
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Randy Dunlap [Fri, 21 May 2010 19:44:47 +0000 (12:44 -0700)]
fbmem: avoid printk format warning with 32-bit resources
Fix printk formats:
drivers/video/fbmem.c: In function 'fb_do_apertures_overlap':
drivers/video/fbmem.c:1494: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t'
drivers/video/fbmem.c:1494: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'resource_size_t'
drivers/video/fbmem.c:1494: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/video/fbmem.c:1494: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'resource_size_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Sat, 22 May 2010 03:16:50 +0000 (20:16 -0700)]
linux/elfcore.h: hide kernel functions
The declarations for elf_core_extra_phdrs() et al got added on the
wrong side of #ifdef __KERNEL__ in linux/elfcore.h so they leak into
the user header copy and we get a warning at build time about it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 22 May 2010 02:37:45 +0000 (19:37 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
get rid of home-grown mutex in cris eeprom.c
switch ecryptfs_write() to struct inode *, kill on-stack fake files
switch ecryptfs_get_locked_page() to struct inode *
simplify access to ecryptfs inodes in ->readpage() and friends
AFS: Don't put struct file on the stack
Ban ecryptfs over ecryptfs
logfs: replace inode uid,gid,mode initialization with helper function
ufs: replace inode uid,gid,mode initialization with helper function
udf: replace inode uid,gid,mode init with helper
ubifs: replace inode uid,gid,mode initialization with helper function
sysv: replace inode uid,gid,mode initialization with helper function
reiserfs: replace inode uid,gid,mode initialization with helper function
ramfs: replace inode uid,gid,mode initialization with helper function
omfs: replace inode uid,gid,mode initialization with helper function
bfs: replace inode uid,gid,mode initialization with helper function
ocfs2: replace inode uid,gid,mode initialization with helper function
nilfs2: replace inode uid,gid,mode initialization with helper function
minix: replace inode uid,gid,mode init with helper
ext4: replace inode uid,gid,mode init with helper
...
Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)
Linus Torvalds [Sat, 22 May 2010 01:58:52 +0000 (18:58 -0700)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits)
PCI: hotplug: pciehp: Removed check for hotplug of display devices
PCI: read memory ranges out of Broadcom CNB20LE host bridge
PCI: Allow manual resource allocation for PCI hotplug bridges
x86/PCI: make ACPI MCFG reserved error messages ACPI specific
PCI hotplug: Use kmemdup
PM/PCI: Update PCI power management documentation
PCI: output FW warning in pci_read/write_vpd
PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments
PCI quirks: disable msi on AMD rs4xx internal gfx bridges
PCI: Disable MSI for MCP55 on P5N32-E SLI
x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs
PCI: aerdrv: trivial cleanup for aerdrv_core.c
PCI: aerdrv: trivial cleanup for aerdrv.c
PCI: aerdrv: introduce default_downstream_reset_link
PCI: aerdrv: rework find_aer_service
PCI: aerdrv: remove is_downstream
PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS
PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC
PCI: aerdrv: rework do_recovery
PCI: aerdrv: rework get_e_source()
...
Linus Torvalds [Sat, 22 May 2010 00:25:01 +0000 (17:25 -0700)]
Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables
intel-iommu: Combine the BIOS DMAR table warning messages
panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I')
panic: Allow warnings to set different taint flags
intel-iommu: intel_iommu_map_range failed at very end of address space
intel-iommu: errors with smaller iommu widths
intel-iommu: Fix boot inside 64bit virtualbox with io-apic disabled
intel-iommu: use physfn to search drhd for VF
intel-iommu: Print out iommu seq_id
intel-iommu: Don't complain that ACPI_DMAR_SCOPE_TYPE_IOAPIC is not supported
intel-iommu: Avoid global flushes with caching mode.
intel-iommu: Use correct domain ID when caching mode is enabled
intel-iommu mistakenly uses offset_pfn when caching mode is enabled
intel-iommu: use for_each_set_bit()
intel-iommu: Fix section mismatch dmar_ir_support() uses dmar_tbl.
Linus Torvalds [Sat, 22 May 2010 00:22:52 +0000 (17:22 -0700)]
Merge branch 'virtio' of git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
drivers/char: Eliminate use after free
virtio: console: Accept console size along with resize control message
virtio: console: Store each console's size in the console structure
virtio: console: Resize console port 0 on config intr only if multiport is off
virtio: console: Add support for nonblocking write()s
virtio: console: Rename wait_is_over() to will_read_block()
virtio: console: Don't always create a port 0 if using multiport
virtio: console: Use a control message to add ports
virtio: console: Move code around for future patches
virtio: console: Remove config work handler
virtio: console: Don't call hvc_remove() on unplugging console ports
virtio: console: Return -EPIPE to hvc_console if we lost the connection
virtio: console: Let host know of port or device add failures
virtio: console: Add a __send_control_msg() that can send messages without a valid port
virtio: Revert "virtio: disable multiport console support."
virtio: add_buf_gfp
trans_virtio: use virtqueue_xxx wrappers
virtio-rng: use virtqueue_xxx wrappers
virtio_ring: remove a level of indirection
virtio_net: use virtqueue_xxx wrappers
...
Fix up conflicts in drivers/net/virtio_net.c due to new virtqueue_xxx
wrappers changes conflicting with some other cleanups.
Linus Torvalds [Sat, 22 May 2010 00:16:21 +0000 (17:16 -0700)]
Merge branch 'kvm-updates/2.6.35' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
KVM: x86: Add missing locking to arch specific vcpu ioctls
KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
KVM: MMU: Segregate shadow pages with different cr0.wp
KVM: x86: Check LMA bit before set_efer
KVM: Don't allow lmsw to clear cr0.pe
KVM: Add cpuid.txt file
KVM: x86: Tell the guest we'll warn it about tsc stability
x86, paravirt: don't compute pvclock adjustments if we trust the tsc
x86: KVM guest: Try using new kvm clock msrs
KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
KVM: x86: add new KVMCLOCK cpuid feature
KVM: x86: change msr numbers for kvmclock
x86, paravirt: Add a global synchronization point for pvclock
x86, paravirt: Enable pvclock flags in vcpu_time_info structure
KVM: x86: Inject #GP with the right rip on efer writes
KVM: SVM: Don't allow nested guest to VMMCALL into host
KVM: x86: Fix exception reinjection forced to true
KVM: Fix wallclock version writing race
KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
...
Linus Torvalds [Sat, 22 May 2010 00:15:44 +0000 (17:15 -0700)]
Merge branch 'modules' of git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* 'modules' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
module: drop the lock while waiting for module to complete initialization.
MODULE_DEVICE_TABLE(isapnp, ...) does nothing
hisax_fcpcipnp: fix broken isapnp device table.
isapnp: move definitions to mod_devicetable.h so file2alias can reach them.
Linus Torvalds [Sat, 22 May 2010 00:13:24 +0000 (17:13 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/mjg59/platform-drivers-x86
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (32 commits)
Move N014, N051 and CR620 dmi information to load scm dmi table
drivers/platform/x86/eeepc-wmi.c: fix build warning
X86 platfrom wmi: Add debug facility to dump WMI data in a readable way
X86 platform wmi: Also log GUID string when an event happens and debug is set
X86 platform wmi: Introduce debug param to log all WMI events
Clean up all objects used by scm model when driver initial fail or exit
msi-laptop: fix up some coding style issues found by checkpatch
msi-laptop: Add i8042 filter to sync sw state with BIOS when function key pressed
msi-laptop: Set rfkill init state when msi-laptop intiial
msi-laptop: Add MSI CR620 notebook dmi information to scm models table
msi-laptop: Add N014 N051 dmi information to scm models table
drivers/platform/x86: Use kmemdup
drivers/platform/x86: Use kzalloc
drivers/platform/x86: Clarify the MRST IPC driver description slightly
eeepc-wmi: depends on BACKLIGHT_CLASS_DEVICE
IPC driver for Intel Mobile Internet Device (MID) platforms
classmate-laptop: Add RFKILL support.
thinkpad-acpi: document backlight level writeback at driver init
thinkpad-acpi: clean up ACPI handles handling
thinkpad-acpi: don't depend on led_path for led firmware type (v2)
...
Linus Torvalds [Sat, 22 May 2010 00:05:46 +0000 (17:05 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
DMAENGINE: extend the control command to include an arg
async_tx: trim dma_async_tx_descriptor in 'no channel switch' case
DMAENGINE: DMA40 fix for allocation of logical channel 0
DMAENGINE: DMA40 support paused channel status
dmaengine: mpc512x: Use resource_size
DMA ENGINE: Do not reset 'private' of channel
ioat: Remove duplicated devm_kzalloc() calls for ioatdma_device
ioat3: disable cacheline-unaligned transfers for raid operations
ioat2,3: convert to producer/consumer locking
ioat: convert to circ_buf
DMAENGINE: Support for ST-Ericssons DMA40 block v3
async_tx: use of kzalloc/kfree requires the include of slab.h
dmaengine: provide helper for setting txstate
DMAENGINE: generic channel status v2
DMAENGINE: generic slave control v2
dma: timb-dma: Update comment and fix compiler warning
dma: Add timb-dma
DMAENGINE: COH 901 318 fix bytesleft
DMAENGINE: COH 901 318 rename confusing vars
Linus Torvalds [Fri, 21 May 2010 22:49:14 +0000 (15:49 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (45 commits)
md: don't insist on valid event count for spare devices.
md: simplify updating of event count to sometimes avoid updating spares.
md/raid6: Fix raid-6 read-error correction in degraded state
md: restore ability of spare drives to spin down.
md: Fix read balancing in RAID1 and RAID10 on drives > 2TB
md/linear: standardise all printk messages
md/raid0: tidy up printk messages.
md/raid10: tidy up printk messages.
md/raid1: improve printk messages
md/raid5: improve consistency of error messages.
md: remove EXPERIMENTAL designation from RAID10
md: allow integers to be passed to md/level
md: notify mdstat waiters of level change
md/raid4: permit raid0 takeover
md/raid1: delay reads that could overtake behind-writes.
md/raid1: fix confusing 'redirect sector' message.
md: don't unregister the thread in mddev_suspend
md: factor out init code for an mddev
md: pass mddev to make_request functions rather than request_queue
md: call md_stop_writes from md_stop
...
NeilBrown [Fri, 21 May 2010 22:31:36 +0000 (08:31 +1000)]
Merge commit '
3ff195b011d7decf501a4d55aeed312731094796' into for-linus
Conflicts:
drivers/md/md.c
- Resolved conflict in md_update_sb
- Added extra 'NULL' arg to new instance of sysfs_get_dirent.
Signed-off-by: NeilBrown <neilb@suse.de>
Al Viro [Fri, 21 May 2010 15:26:35 +0000 (11:26 -0400)]
fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 21 May 2010 15:19:18 +0000 (11:19 -0400)]
get rid of home-grown mutex in cris eeprom.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 21 May 2010 15:09:58 +0000 (11:09 -0400)]
switch ecryptfs_write() to struct inode *, kill on-stack fake files
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 21 May 2010 15:02:14 +0000 (11:02 -0400)]
switch ecryptfs_get_locked_page() to struct inode *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 21 May 2010 14:56:12 +0000 (10:56 -0400)]
simplify access to ecryptfs inodes in ->readpage() and friends
we can get to them from page->mapping->host, no need to mess with
file.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 21 May 2010 14:27:09 +0000 (15:27 +0100)]
AFS: Don't put struct file on the stack
Don't put struct file on the stack as it takes up quite a lot of space
and violates lifetime rules for struct file.
Rather than calling afs_readpage() indirectly from the directory routines by
way of read_mapping_page(), split afs_readpage() to have afs_page_filler()
that's given a key instead of a file and call read_cache_page(), specifying the
new function directly. Use it in afs_readpages() as well.
Also make use of this in afs_mntpt_check_symlink() too for the same reason.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Al Viro [Mon, 17 May 2010 04:59:46 +0000 (00:59 -0400)]
Ban ecryptfs over ecryptfs
This is a seriously simplified patch from Eric Sandeen; copy of
rationale follows:
===
mounting stacked ecryptfs on ecryptfs has been shown to lead to bugs
in testing. For crypto info in xattr, there is no mechanism for handling
this at all, and for normal file headers, we run into other trouble:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
IP: [<
ffffffffa015b0b3>] ecryptfs_d_revalidate+0x43/0xa0 [ecryptfs]
...
There doesn't seem to be any good usecase for this, so I'd suggest just
disallowing the configuration.
Based on a patch originally, I believe, from Mike Halcrow.
===
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 15 May 2010 08:02:54 +0000 (04:02 -0400)]
logfs: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:23 +0000 (17:32 +0300)]
ufs: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:22 +0000 (17:32 +0300)]
udf: replace inode uid,gid,mode init with helper
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:21 +0000 (17:32 +0300)]
ubifs: replace inode uid,gid,mode initialization with helper function
Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:20 +0000 (17:32 +0300)]
sysv: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:19 +0000 (17:32 +0300)]
reiserfs: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:18 +0000 (17:32 +0300)]
ramfs: replace inode uid,gid,mode initialization with helper function
- seems what ramfs_get_inode is only locally, make it static.
[AV: the hell it is; it's used by shmem, so shmem needed conversion too
and no, that function can't be made static]
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:17 +0000 (17:32 +0300)]
omfs: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:31:46 +0000 (17:31 +0300)]
bfs: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:16 +0000 (17:32 +0300)]
ocfs2: replace inode uid,gid,mode initialization with helper function
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:15 +0000 (17:32 +0300)]
nilfs2: replace inode uid,gid,mode initialization with helper function
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:32:14 +0000 (17:32 +0300)]
minix: replace inode uid,gid,mode init with helper
- also redesign minix_new_inode interface
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:31:51 +0000 (17:31 +0300)]
ext4: replace inode uid,gid,mode init with helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:31:50 +0000 (17:31 +0300)]
ext3: replace inode uid,gid,mode init with helper
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:31:49 +0000 (17:31 +0300)]
ext2: replace inode uid,gid,mode init with helper
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:31:48 +0000 (17:31 +0300)]
exofs: replace inode uid,gid,mode initialization with helper function
Ack-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:31:47 +0000 (17:31 +0300)]
btrfs: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:30:58 +0000 (17:30 +0300)]
jfs: replace inode uid,gid,mode init with helper
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:30:57 +0000 (17:30 +0300)]
9p: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Monakhov [Thu, 4 Mar 2010 14:29:14 +0000 (17:29 +0300)]
vfs: Add inode uid,gid,mode init helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
H Hartley Sweeten [Fri, 2 Apr 2010 01:36:30 +0000 (20:36 -0500)]
fs-writeback.c: bitfields should be unsigned
This fixes sparse noise:
error: dubious one-bit signed bitfield
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Huang Shijie [Fri, 2 Apr 2010 09:37:13 +0000 (17:37 +0800)]
namei.c : update mnt when it needed
update the mnt of the path when it is not equal to the new one.
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Roland Dreier [Tue, 27 Apr 2010 21:23:57 +0000 (14:23 -0700)]
vfs: add lockdep annotation to s_vfs_rename_key for ecryptfs
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.31-2-generic #14~rbd3
> ---------------------------------------------
> firefox-3.5/4162 is trying to acquire lock:
> (&s->s_vfs_rename_mutex){+.+.+.}, at: [<
ffffffff81139d31>] lock_rename+0x41/0xf0
>
> but task is already holding lock:
> (&s->s_vfs_rename_mutex){+.+.+.}, at: [<
ffffffff81139d31>] lock_rename+0x41/0xf0
>
> other info that might help us debug this:
> 3 locks held by firefox-3.5/4162:
> #0: (&s->s_vfs_rename_mutex){+.+.+.}, at: [<
ffffffff81139d31>] lock_rename+0x41/0xf0
> #1: (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<
ffffffff81139d5a>] lock_rename+0x6a/0xf0
> #2: (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<
ffffffff81139d6f>] lock_rename+0x7f/0xf0
>
> stack backtrace:
> Pid: 4162, comm: firefox-3.5 Tainted: G C 2.6.31-2-generic #14~rbd3
> Call Trace:
> [<
ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100
> [<
ffffffff8108ce26>] validate_chain+0x4c6/0x750
> [<
ffffffff8108d2e7>] __lock_acquire+0x237/0x430
> [<
ffffffff8108d585>] lock_acquire+0xa5/0x150
> [<
ffffffff81139d31>] ? lock_rename+0x41/0xf0
> [<
ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0
> [<
ffffffff81139d31>] ? lock_rename+0x41/0xf0
> [<
ffffffff81139d31>] ? lock_rename+0x41/0xf0
> [<
ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170
> [<
ffffffff81552b36>] mutex_lock_nested+0x46/0x60
> [<
ffffffff81139d31>] lock_rename+0x41/0xf0
> [<
ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170
> [<
ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160
> [<
ffffffff8113ac7e>] vfs_rename+0xee/0x290
> [<
ffffffff8113c212>] ? __lookup_hash+0x102/0x160
> [<
ffffffff8113d512>] sys_renameat+0x252/0x280
> [<
ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100
> [<
ffffffff8101316a>] ? sysret_check+0x2e/0x69
> [<
ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190
> [<
ffffffff8113d55b>] sys_rename+0x1b/0x20
> [<
ffffffff81013132>] system_call_fastpath+0x16/0x1b
The trace above is totally reproducible by doing a cross-directory
rename on an ecryptfs directory.
The issue seems to be that sys_renameat() does lock_rename() then calls
into the filesystem; if the filesystem is ecryptfs, then
ecryptfs_rename() again does lock_rename() on the lower filesystem, and
lockdep can't tell that the two s_vfs_rename_mutexes are different. It
seems an annotation like the following is sufficient to fix this (it
does get rid of the lockdep trace in my simple tests); however I would
like to make sure I'm not misunderstanding the locking, hence the CC
list...
Signed-off-by: Roland Dreier <rdreier@cisco.com>
Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cesar Eduardo Barros [Sat, 17 Apr 2010 22:28:09 +0000 (19:28 -0300)]
fs/partitions: use ADDPART_FLAG_RAID instead of magic number
ADDPART_FLAG_RAID was introduced in commit
d18d768, and most places were
converted to use it instead of a hardcoded value. However, some places seem
to have been missed.
Change all of them to the symbolic names via the following semantic patch:
@@
struct parsed_partitions *state;
expression E;
@@
(
- state->parts[E].flags = 1
+ state->parts[E].flags = ADDPART_FLAG_RAID
|
- state->parts[E].flags |= 1
+ state->parts[E].flags |= ADDPART_FLAG_RAID
|
- state->parts[E].flags = 2
+ state->parts[E].flags = ADDPART_FLAG_WHOLEDISK
|
- state->parts[E].flags |= 2
+ state->parts[E].flags |= ADDPART_FLAG_WHOLEDISK
)
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Mon, 22 Mar 2010 16:32:25 +0000 (17:32 +0100)]
sanitize vfs_fsync calling conventions
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.
The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Mon, 22 Mar 2010 16:32:14 +0000 (17:32 +0100)]
nfsd: open a file descriptor for fsync in nfs4 recovery
Instead of just looking up a path use do_filp_open to get us a file
structure for the nfs4 recovery directory. This allows us to get
rid of the last non-standard vfs_fsync caller with a NULL file
pointer.
[AV: should be using fput(), not filp_close()]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Richard Kennedy [Fri, 14 May 2010 09:49:22 +0000 (10:49 +0100)]
fs: inode.c use atomic_inc_return in __iget
Using atomic_inc_return in __iget(struct inode *inode) makes the intent
of this code clearer and generates less code on processors that have
this operation.
On x86_64 this patch reduces the text size of inode.o by 12 bytes.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
----
patch against 2.6.34-rc7
compiled & tested on x86_64 AMD X2
I've been running with this patch applied for several weeks with no
obvious problems.
regards
Richard
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Eric Paris [Fri, 14 May 2010 15:44:10 +0000 (11:44 -0400)]
anon_inode: set S_IFREG on the anon_inode
anon_inode_mkinode() sets inode->i_mode = S_IRUSR | S_IWUSR; This means
that (inode->i_mode & S_IFMT) == 0. This trips up some SELinux code that
needs to determine if a given inode is a regular file, a directory, etc.
The easiest solution is to just make sure that the anon_inode also sets
S_IFREG.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:23 +0000 (17:53 -0700)]
gfs: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:22 +0000 (17:53 -0700)]
ocfs: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:21 +0000 (17:53 -0700)]
jffs2: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:20 +0000 (17:53 -0700)]
xfs: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:19 +0000 (17:53 -0700)]
reiserfs: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:18 +0000 (17:53 -0700)]
ext4: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:17 +0000 (17:53 -0700)]
ext3: constify xattr handlers
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:16 +0000 (17:53 -0700)]
ext2: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:15 +0000 (17:53 -0700)]
btrfs: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stephen Hemminger [Fri, 14 May 2010 00:53:14 +0000 (17:53 -0700)]
fs: xattr_handler table should be const
The entries in xattr handler table should be immutable (ie const)
like other operation tables.
Later patches convert common filesystems. Uncoverted filesystems
will still work, but will generate a compiler warning.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Josef Bacik [Tue, 23 Mar 2010 14:34:56 +0000 (10:34 -0400)]
Introduce freeze_super and thaw_super for the fsfreeze ioctl
Currently the way we do freezing is by passing sb>s_bdev to freeze_bdev and then
letting it do all the work. But freezing is more of an fs thing, and doesn't
really have much to do with the bdev at all, all the work gets done with the
super. In btrfs we do not populate s_bdev, since we can have multiple bdev's
for one fs and setting s_bdev makes removing devices from a pool kind of tricky.
This means that freezing a btrfs filesystem fails, which causes us to corrupt
with things like tux-on-ice which use the fsfreeze mechanism. So instead of
populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs
freezing stuff into freeze_super and thaw_super. These just take the
super_block that we're freezing and does the appropriate work. It's basically
just copy and pasted from freeze_bdev. I've then converted freeze_bdev over to
use the new super helpers. I've tested this with ext4 and btrfs and verified
everything continues to work the same as before.
The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if
the fs is already frozen. I thought this was a better solution than adding a
freeze counter to the super_block, but if everybody hates this idea I'm open to
suggestions. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 17:56:07 +0000 (13:56 -0400)]
Trim includes in fs/super.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 15:11:05 +0000 (11:11 -0400)]
Move grabbing s_umount to callers of grab_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 14:37:36 +0000 (10:37 -0400)]
Take statfs variants to fs/statfs.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 10:36:54 +0000 (06:36 -0400)]
switch selinux delayed superblock handling to iterate_supers()
... kill their private list, while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 10:06:58 +0000 (06:06 -0400)]
new helper: iterate_supers()
... and switch the simple "loop over superblocks and do something"
loops to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 01:13:53 +0000 (21:13 -0400)]
Bury __put_super_and_need_restart()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 00:27:55 +0000 (20:27 -0400)]
fix prune_dcache()/umount() race
... and get rid of the last __put_super_and_need_restart() caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 00:23:25 +0000 (20:23 -0400)]
In get_super() and user_get_super() restarts are unconditional
If superblock had been still alive, we would've returned it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 00:15:33 +0000 (20:15 -0400)]
fix get_active_super()/umount() race
This one needs restarts...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 00:11:53 +0000 (20:11 -0400)]
fix do_emergency_remount()/umount() races
need list_for_each_entry_safe() here. Original didn't even
have restart logics, so if you race with umount() it blew up.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 23 Mar 2010 00:09:33 +0000 (20:09 -0400)]
Convert simple loops over superblocks to list_for_each_entry_safe
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 22 Mar 2010 23:56:42 +0000 (19:56 -0400)]
get rid of restarts in sync_filesystems()
At the same time we can kill s_need_restart and local mutex in there.
__put_super() made public for a while; will be gone later.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 22 Mar 2010 23:36:35 +0000 (19:36 -0400)]
Leave superblocks on s_list until the end
We used to remove from s_list and s_instances at the same
time. So let's *not* do the former and skip superblocks
that have empty s_instances in the loops over s_list.
The next step, of course, will be to get rid of rescan logics
in those loops.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 22 Mar 2010 19:22:31 +0000 (15:22 -0400)]
Saner locking around deactivate_super()
Make sure that s_umount is acquired *before* we drop the final
active reference; we still have the fast path (atomic_dec_unless)
and we have gotten rid of the window between the moment when
s_active hits zero and s_umount is acquired. Which simplifies
the living hell out of grab_super() and inotify pin_to_kill()
stuff.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 22 Mar 2010 12:53:19 +0000 (08:53 -0400)]
get rid of S_BIAS
use atomic_inc_not_zero(&sb->s_active) instead of playing games with
checking ->s_count > S_BIAS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 22 Mar 2010 02:34:11 +0000 (22:34 -0400)]
get rid of open-coded grab_super() in get_active_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 21 Mar 2010 23:24:23 +0000 (19:24 -0400)]
sb_entry() has been killed a couple of years ago and resurrected on mismerge
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 21 Mar 2010 23:22:29 +0000 (19:22 -0400)]
ceph: should use deactivate_locked_super() on failure exits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 21 Mar 2010 16:24:29 +0000 (12:24 -0400)]
Clean ecryptfs ->get_sb() up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 21 Mar 2010 02:32:26 +0000 (22:32 -0400)]
fix a couple of ecryptfs leaks
First of all, get_sb_nodev() grabs anon dev minor and we
never free it in ecryptfs ->kill_sb(). Moreover, on one
of the failure exits in ecryptfs_get_sb() we leak things -
it happens before we set ->s_root and ->put_super() won't
be called in that case. Solution: kill ->put_super(), do
all that stuff in ->kill_sb(). And use kill_anon_sb() instead
of generic_shutdown_super() to deal with anon dev leak.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 21 Mar 2010 01:57:43 +0000 (21:57 -0400)]
Simplify devpts_get_sb() failure exits
postpone simple_set_mnt() until we know we won't fail.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Mon, 1 Feb 2010 20:55:52 +0000 (21:55 +0100)]
remove incorrect comment in do_emergency_remount
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 21 May 2010 20:11:04 +0000 (16:11 -0400)]
clean DCACHE_CANT_MOUNT in d_delete()
We set the "it's dead, don't mount on it" flag _and_ do not remove it if
we turn the damn thing negative and leave it around. And if it goes
positive afterwards, well...
Fortunately, there's only one place where that needs to be caught:
only d_delete() can turn the sucker negative without immediately freeing
it; all other places that can lead to ->d_iput() call are followed by
unconditionally freeing struct dentry in question. So the fix is obvious:
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16014
Reported-by: Adam Tkac <vonsch@gmail.com>
Tested-by: Adam Tkac <vonsch@gmail.com>
Cc: <stable@kernel.org> [2.6.34.x]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Fri, 21 May 2010 22:26:46 +0000 (15:26 -0700)]
Merge git://git./linux/kernel/git/gregkh/staging-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (577 commits)
Staging: ramzswap: Handler for swap slot free callback
swap: Add swap slot free callback to block_device_operations
swap: Add flag to identify block swap devices
Staging: vt6655: use ETH_FRAME_LEN macro instead of custom one
Staging: vt6655: use ETH_DATA_LEN macro instead of custom one
Staging: vt6655: use ETH_FCS_LEN macro instead of custom one
Staging: vt6656: use ETH_HLEN macro instead of custom one
Staging: comedi: quatech_daqp_cs.c Replace eos semaphore with a completion.
Staging: dt3155v4l: remove private memory allocator
Staging: crystalhd: Remove typedefs from driver
Staging: winbond: Fix for pointer name format issue in mds.c
Staging: vt6656: removed custom UCHAR/USHORT/UINT/ULONG/ULONGLONG typedefs
Staging: vt6656: removed custom CHAR/SHORT/INT/LONG typedefs
Staging: comedi: Altered the way printk is used in 8255.c
staging: iio: adis16350 and similar IMU driver
Staging: iio: max1363 Fix two bugs in single_channel_from_ring
Staging: iio: adis16220 extract bin_attribute structures from state
Staging: iio: adis16220 vibration sensor driver
Staging: comedi: Kconfig dependancy fixes
Staging: comedi: fix up build error from last Kconfig changes
...