GitHub/exynos8895/android_kernel_samsung_universal8895.git
16 years agoblock: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Mon, 21 Apr 2008 07:51:04 +0000 (09:51 +0200)]
block: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoKconfig: clean up block/Kconfig help descriptions
Nick Andrew [Mon, 21 Apr 2008 07:51:04 +0000 (09:51 +0200)]
Kconfig: clean up block/Kconfig help descriptions

Modify the help descriptions of block/Kconfig for clarity, accuracy and
consistency.

Refactor the BLOCK description a bit.  The wording "This permits ...  to be
removed" isn't quite right; the block layer is removed when the option is
disabled, whereas most descriptions talk about what happens when the option is
enabled.  Reformat the list of what is affected by disabling the block layer.

Add more examples of large block devices to LBD and strive for technical
accuracy; block devices of size _exactly_ 2TB require CONFIG_LBD, not only
"bigger than 2TB".  Also try to say (perhaps not very clearly) that the config
option is only needed when you want to have individual block devices of size
>= 2TB, for example if you had 3 x 1TB disks in your computer you'd have a
total storage size of 3TB but you wouldn't need the option unless you want to
aggregate those disks into a RAID or LVM.

Improve terminology and grammar on BLK_DEV_IO_TRACE.

I also added the boilerplate "If unsure, say N" to most options.

Precisely say "2TB and larger" for LSF.

Indent the help text for BLK_DEV_BSG by 2 spaces in accordance with the
standard.

Signed-off-by: Nick Andrew <nick@nick-andrew.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocciss: fix warning oops on rmmod of driver
scameron@beardog.cca.cpqcorp.net [Thu, 17 Apr 2008 11:19:04 +0000 (13:19 +0200)]
cciss: fix warning oops on rmmod of driver

* Fix oops on cciss rmmod due to calling pci_free_consistent with
  irqs disabled.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocciss: Fix race between disk-adding code and interrupt handler
scameron@beardog.cca.cpqcorp.net [Thu, 17 Apr 2008 11:19:03 +0000 (13:19 +0200)]
cciss: Fix race between disk-adding code and interrupt handler

Fix race condition between cciss_init_one(), cciss_update_drive_info(),
and cciss_check_queues().

Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: move the padding adjustment to blk_rq_map_sg
FUJITA Tomonori [Fri, 11 Apr 2008 10:56:52 +0000 (12:56 +0200)]
block: move the padding adjustment to blk_rq_map_sg

blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule
that req->data_len (the true data length) is equal to sum(bio). It
broke the scsi command completion code.

commit e97a294ef6938512b655b1abf17656cf2b26f709 was introduced to fix
the above issue. However, the partial completion code doesn't work
with it. The commit is also a layer violation (scsi mid-layer should
not know about the block layer's padding).

This patch moves the padding adjustment to blk_rq_map_sg (suggested by
James). The padding works like the drain buffer. This patch breaks the
rule that req->data_len is equal to sum(sg), however, the drain buffer
already broke it. So this patch just restores the rule that
req->data_len is equal to sub(bio) without breaking anything new.

Now when a low level driver needs padding, blk_rq_map_user and
blk_rq_map_user_iov guarantee there's enough room for padding.
blk_rq_map_sg can safely extend the last entry of a scatter list.

blk_rq_map_sg must extend the last entry of a scatter list only for a
request that got through bio_copy_user_iov. This patches introduces
new REQ_COPY_USER flag.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: add bio_copy_user_iov support to blk_rq_map_user_iov
FUJITA Tomonori [Fri, 11 Apr 2008 10:56:51 +0000 (12:56 +0200)]
block: add bio_copy_user_iov support to blk_rq_map_user_iov

With this patch, blk_rq_map_user_iov uses bio_copy_user_iov when a low
level driver needs padding or a buffer in sg_iovec isn't aligned. That
is, it uses temporary kernel buffers instead of mapping user pages
directly.

When a LLD needs padding, later blk_rq_map_sg needs to extend the last
entry of a scatter list. bio_copy_user_iov guarantees that there is
enough space for padding by using temporary kernel buffers instead of
user pages.

blk_rq_map_user_iov needs buffers in sg_iovec to be aligned. The
comment in blk_rq_map_user_iov indicates that drivers/scsi/sg.c also
needs buffers in sg_iovec to be aligned. Actually, drivers/scsi/sg.c
works with unaligned buffers in sg_iovec (it always uses temporary
kernel buffers).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoblock: convert bio_copy_user to bio_copy_user_iov
FUJITA Tomonori [Fri, 11 Apr 2008 10:56:49 +0000 (12:56 +0200)]
block: convert bio_copy_user to bio_copy_user_iov

This patch enables bio_copy_user to take struct sg_iovec (renamed
bio_copy_user_iov). bio_copy_user uses bio_copy_user_iov internally as
bio_map_user uses bio_map_user_iov.

The major changes are:

- adds sg_iovec array to struct bio_map_data

- adds __bio_copy_iov that copy data between bio and
sg_iovec. bio_copy_user_iov and bio_uncopy_user use it.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoloop: manage partitions in disk image
Laurent Vivier [Wed, 26 Mar 2008 11:11:53 +0000 (12:11 +0100)]
loop: manage partitions in disk image

This patch allows to use loop device with partitionned disk image.

Original behavior of loop is not modified.

A new parameter is introduced to define how many partition we want to be
able to manage per loop device. This parameter is "max_part".

For instance, to manage 63 partitions / loop device, we will do:
# modprobe loop max_part=63
# ls -l /dev/loop?*
brw-rw---- 1 root disk 7,   0 2008-03-05 14:55 /dev/loop0
brw-rw---- 1 root disk 7,  64 2008-03-05 14:55 /dev/loop1
brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7

And to attach a raw partitionned disk image, the original losetup is used:

# losetup -f etch.img
# ls -l /dev/loop?*
brw-rw---- 1 root disk 7,   0 2008-03-05 14:55 /dev/loop0
brw-rw---- 1 root disk 7,   1 2008-03-05 14:57 /dev/loop0p1
brw-rw---- 1 root disk 7,   2 2008-03-05 14:57 /dev/loop0p2
brw-rw---- 1 root disk 7,   5 2008-03-05 14:57 /dev/loop0p5
brw-rw---- 1 root disk 7,  64 2008-03-05 14:55 /dev/loop1
brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7
# mount /dev/loop0p1 /mnt
# ls /mnt
bench  cdrom  home        lib         mnt   root     srv  usr
bin    dev    initrd      lost+found  opt   sbin     sys  var
boot   etc    initrd.img  media       proc  selinux  tmp  vmlinuz
# umount /mnt
# losetup -d /dev/loop0

Of course, the same behavior can be done using kpartx on a loop device,
but modifying loop avoids to stack several layers of block device (loop +
device mapper), this is a very light modification (40% of modifications
are to manage the new parameter).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocdrom: use kmalloced buffers instead of buffers on stack
Thomas Bogendoerfer [Wed, 26 Mar 2008 11:09:38 +0000 (12:09 +0100)]
cdrom: use kmalloced buffers instead of buffers on stack

If cdrom commands are issued to a scsi drive in most cases the buffer will be
filled via dma.  This leads to bad stack corruption on non coherent platforms,
because the buffers are neither cache line aligned nor is the size a multiple
of the cache line size.  Using kmalloced buffers avoids this.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocdrom: make unregister_cdrom() return void
Akinobu Mita [Wed, 26 Mar 2008 11:09:02 +0000 (12:09 +0100)]
cdrom: make unregister_cdrom() return void

Now unregister_cdrom() always returns 0.
Make it return void and update all callers that check the return value.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocdrom: use list_head for cdrom_device_info list
Akinobu Mita [Wed, 26 Mar 2008 11:09:02 +0000 (12:09 +0100)]
cdrom: use list_head for cdrom_device_info list

Use list_head for cdrom_device_info list instead of opencoded
singly list handling.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocdrom: protect cdrom_device_info list by mutex
Akinobu Mita [Wed, 26 Mar 2008 11:09:01 +0000 (12:09 +0100)]
cdrom: protect cdrom_device_info list by mutex

This patch protects the list of cdrom_device_info by cdrom_mutex
when the file in /proc/sys/dev/cdrom/ is written.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocdrom: cleanup hardcoded error-code
Akinobu Mita [Wed, 26 Mar 2008 11:09:00 +0000 (12:09 +0100)]
cdrom: cleanup hardcoded error-code

This patch eliminates hardcoded return value of register_cdrom().

It also changes the return value to -EINVAL.
It is more appropriate than -2 (-ENOENT) because it is only
happen invalid usage of register_cdrom() by broken cdrom driver.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agocdrom: remove ifdef CONFIG_SYSCTL
Akinobu Mita [Wed, 26 Mar 2008 11:08:59 +0000 (12:08 +0100)]
cdrom: remove ifdef CONFIG_SYSCTL

This patch removes #ifdef for CONFIG_SYSCTL by defining empty
cdrom_sysctl_register and cdrom_sysctl_unregister when CONFIG_SYSCTL
is not defined.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Sat, 19 Apr 2008 01:18:30 +0000 (18:18 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  security: fix up documentation for security_module_enable
  Security: Introduce security= boot parameter
  Audit: Final renamings and cleanup
  SELinux: use new audit hooks, remove redundant exports
  Audit: internally use the new LSM audit hooks
  LSM/Audit: Introduce generic Audit LSM hooks
  SELinux: remove redundant exports
  Netlink: Use generic LSM hook
  Audit: use new LSM hooks instead of SELinux exports
  SELinux: setup new inode/ipc getsecid hooks
  LSM: Introduce inode_getsecid and ipc_getsecid hooks

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26
Linus Torvalds [Sat, 19 Apr 2008 01:02:35 +0000 (18:02 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6.26

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
  [NET]: Fix and allocate less memory for ->priv'less netdevices
  [IPV6]: Fix dangling references on error in fib6_add().
  [NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
  [PKT_SCHED]: Fix datalen check in tcf_simp_init().
  [INET]: Uninline the __inet_inherit_port call.
  [INET]: Drop the inet_inherit_port() call.
  SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
  [netdrvr] forcedeth: internal simplifications; changelog removal
  phylib: factor out get_phy_id from within get_phy_device
  PHY: add BCM5464 support to broadcom PHY driver
  cxgb3: Fix __must_check warning with dev_dbg.
  tc35815: Statistics cleanup
  natsemi: fix MMIO for PPC 44x platforms
  [TIPC]: Cleanup of TIPC reference table code
  [TIPC]: Optimized initialization of TIPC reference table
  [TIPC]: Remove inlining of reference table locking routines
  e1000: convert uint16_t style integers to u16
  ixgb: convert uint16_t style integers to u16
  sb1000.c: make const arrays static
  sb1000.c: stop inlining largish static functions
  ...

16 years agosecurity: fix up documentation for security_module_enable
James Morris [Fri, 7 Mar 2008 01:23:49 +0000 (12:23 +1100)]
security: fix up documentation for security_module_enable

security_module_enable() can only be called during kernel init.

Signed-off-by: James Morris <jmorris@namei.org>
16 years agoSecurity: Introduce security= boot parameter
Ahmed S. Darwish [Thu, 6 Mar 2008 16:09:10 +0000 (18:09 +0200)]
Security: Introduce security= boot parameter

Add the security= boot parameter. This is done to avoid LSM
registration clashes in case of more than one bult-in module.

User can choose a security module to enable at boot. If no
security= boot parameter is specified, only the first LSM
asking for registration will be loaded. An invalid security
module name will be treated as if no module has been chosen.

LSM modules must check now if they are allowed to register
by calling security_module_enable(ops) first. Modify SELinux
and SMACK to do so.

Do not let SMACK register smackfs if it was not chosen on
boot. Smackfs assumes that smack hooks are registered and
the initial task security setup (swapper->security) is done.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoAudit: Final renamings and cleanup
Ahmed S. Darwish [Fri, 18 Apr 2008 23:59:43 +0000 (09:59 +1000)]
Audit: Final renamings and cleanup

Rename the se_str and se_rule audit fields elements to
lsm_str and lsm_rule to avoid confusion.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoSELinux: use new audit hooks, remove redundant exports
Ahmed S. Darwish [Sat, 1 Mar 2008 20:03:14 +0000 (22:03 +0200)]
SELinux: use new audit hooks, remove redundant exports

Setup the new Audit LSM hooks for SELinux.
Remove the now redundant exported SELinux Audit interface.

Audit: Export 'audit_krule' and 'audit_field' to the public
since their internals are needed by the implementation of the
new LSM hook 'audit_rule_known'.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoAudit: internally use the new LSM audit hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 20:01:11 +0000 (22:01 +0200)]
Audit: internally use the new LSM audit hooks

Convert Audit to use the new LSM Audit hooks instead of
the exported SELinux interface.

Basically, use:
security_audit_rule_init
secuirty_audit_rule_free
security_audit_rule_known
security_audit_rule_match

instad of (respectively) :
selinux_audit_rule_init
selinux_audit_rule_free
audit_rule_has_selinux
selinux_audit_rule_match

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoLSM/Audit: Introduce generic Audit LSM hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 20:00:05 +0000 (22:00 +0200)]
LSM/Audit: Introduce generic Audit LSM hooks

Introduce a generic Audit interface for security modules
by adding the following new LSM hooks:

audit_rule_init(field, op, rulestr, lsmrule)
audit_rule_known(krule)
audit_rule_match(secid, field, op, rule, actx)
audit_rule_free(rule)

Those hooks are only available if CONFIG_AUDIT is enabled.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoSELinux: remove redundant exports
Ahmed S. Darwish [Sat, 1 Mar 2008 19:58:32 +0000 (21:58 +0200)]
SELinux: remove redundant exports

Remove the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)

They can be substitued with the following generic equivalents
respectively:
new LSM hook, inode_getsecid(inode, secid)
new LSM hook, ipc_getsecid*(ipcp, secid)
LSM hook, task_getsecid(tsk, secid)
LSM hook, sid_to_secctx(sid, ctx, len)

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoNetlink: Use generic LSM hook
Ahmed S. Darwish [Sat, 1 Mar 2008 19:56:22 +0000 (21:56 +0200)]
Netlink: Use generic LSM hook

Don't use SELinux exported selinux_get_task_sid symbol.
Use the generic LSM equivalent instead.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoAudit: use new LSM hooks instead of SELinux exports
Ahmed S. Darwish [Sat, 1 Mar 2008 19:54:38 +0000 (21:54 +0200)]
Audit: use new LSM hooks instead of SELinux exports

Stop using the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)
kfree(ctx)

and use following generic LSM equivalents respectively:
security_inode_getsecid(inode, secid)
security_ipc_getsecid*(ipcp, secid)
security_task_getsecid(tsk, secid)
security_sid_to_secctx(sid, ctx, len)
security_release_secctx(ctx, len)

Call security_release_secctx only if security_secid_to_secctx
succeeded.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoSELinux: setup new inode/ipc getsecid hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 19:52:30 +0000 (21:52 +0200)]
SELinux: setup new inode/ipc getsecid hooks

Setup the new inode_getsecid and ipc_getsecid() LSM hooks
for SELinux.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoLSM: Introduce inode_getsecid and ipc_getsecid hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 19:51:09 +0000 (21:51 +0200)]
LSM: Introduce inode_getsecid and ipc_getsecid hooks

Introduce inode_getsecid(inode, secid) and ipc_getsecid(ipcp, secid)
LSM hooks. These hooks will be used instead of similar exported
SELinux interfaces.

Let {inode,ipc,task}_getsecid hooks set the secid to 0 by default
if CONFIG_SECURITY is not defined or if the hook is set to
NULL (dummy). This is done to notify the caller that no valid
secid exists.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years ago[NET]: Fix and allocate less memory for ->priv'less netdevices
Alexey Dobriyan [Fri, 18 Apr 2008 22:43:32 +0000 (15:43 -0700)]
[NET]: Fix and allocate less memory for ->priv'less netdevices

This patch effectively reverts commit d0498d9ae1a5cebac363e38907266d5cd2eedf89
aka "[NET]: Do not allocate unneeded memory for dev->priv alignment."
It was found to be buggy because of final unconditional += NETDEV_ALIGN_CONST
removal.

For example, for sizeof(struct net_device) being 2048 bytes, "alloc_size"
was also 2048 bytes, but allocator with debugging options turned on started
giving out !32-byte aligned memory resulting in redzones overwrites.

Patch does small optimization in ->priv'less case: bumping size to next
32-byte boundary was always done to ensure ->priv will also be aligned.
But, no ->priv, no need to do that.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agox86 PAT: fix mmap() of holes
Ingo Molnar [Fri, 18 Apr 2008 19:32:22 +0000 (21:32 +0200)]
x86 PAT: fix mmap() of holes

do not return a -EINVAL when mmap()-ing PCI holes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
Linus Torvalds [Fri, 18 Apr 2008 18:25:31 +0000 (11:25 -0700)]
Merge git://git./linux/kernel/git/jejb/scsi-misc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
  [SCSI] iscsi: bidi support for iscsi_tcp
  [SCSI] iscsi: bidi support at the generic libiscsi level
  [SCSI] iscsi: extended cdb support
  [SCSI] zfcp: Fix error handling for blocked unit for send FCP command
  [SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
  [SCSI] zfcp: fix 31 bit compile warnings
  [SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
  [SCSI] bsg: remove minor in struct bsg_device
  [SCSI] bsg: use better helper list functions
  [SCSI] bsg: replace kobject_get with blk_get_queue
  [SCSI] bsg: takes a ref to struct device in fops->open
  [SCSI] qla1280: remove version check
  [SCSI] libsas: fix endianness bug in sas_ata
  [SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
  [SCSI] aacraid: Do not describe check_reset parameter with its value
  [SCSI] aacraid: Fix down_interruptible() to check the return value
  [SCSI] sun3_scsi_vme: add MODULE_LICENSE
  [SCSI] st: rename flush_write_buffer()
  [SCSI] tgt: use KMEM_CACHE macro
  [SCSI] initio: fix big endian problems for auto request sense
  ...

16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Fri, 18 Apr 2008 18:24:29 +0000 (11:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (43 commits)
  firewire: cleanups
  firewire: fix synchronization of gap counts
  firewire: wait until PHY configuration packet was transmitted (fix bus reset loop)
  firewire: remove unused struct member
  firewire: use bitwise and to get reg in handle_registers
  firewire: replace more hex values with defined csr constants
  firewire: reread config ROM when device reset the bus
  firewire: replace static ROM cache by allocated cache
  firewire: fw-ohci: work around generation bug in TI controllers (fix AV/C and more)
  firewire: fw-ohci: extend logging of bus generations and node ID
  firewire: fw-ohci: conditionally log busReset interrupts
  firewire: fw-ohci: don't append to AT context when it's not active
  firewire: fw-ohci: log regAccessFail events
  firewire: fw-ohci: make sure HCControl register LPS bit is set
  firewire: fw-ohci: missing PPC PMac feature calls in failure path
  firewire: fw-ohci: untangle a mixed unsigned/signed expression
  firewire: debug interrupt events
  firewire: fw-ohci: catch self_id_count == 0
  firewire: fw-ohci: add self ID error check
  firewire: fw-ohci: refactor probe, remove, suspend, resume
  ...

16 years agolibata: fix boot panic with SATAPI devices on non-SFF HBAs
James Bottomley [Fri, 18 Apr 2008 18:18:48 +0000 (13:18 -0500)]
libata: fix boot panic with SATAPI devices on non-SFF HBAs

The kernel now panics reliably on boot if you have a SATAPI device
connected.

The problem was introduced by the libata merge trying to pull out all
the SFF code into a separate module.  Unfortunately, if you're a satapi
device you usually need to call atapi_request_sense, which has a bare
invocation of a SFF callback which is NULL on non-SFF HBAs.  Fix this by
making the call conditional.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Fri, 18 Apr 2008 17:15:22 +0000 (10:15 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (64 commits)
  ocfs2/net: Add debug interface to o2net
  ocfs2: Only build ocfs2/dlm with the o2cb stack module
  ocfs2/cluster: Get rid of arguments to the timeout routines
  ocfs2: Put tree in MAINTAINERS
  ocfs2: Use BUG_ON
  ocfs2: Convert ocfs2 over to unlocked_ioctl
  ocfs2: Improve rename locking
  fs/ocfs2/aops.c: test for IS_ERR rather than 0
  ocfs2: Add inode stealing for ocfs2_reserve_new_inode
  ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
  ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
  ocfs2: Enable cross extent block merge.
  ocfs2: Add support for cross extent block
  ocfs2: Move /sys/o2cb to /sys/fs/o2cb
  sysfs: Allow removal of symlinks in the sysfs root
  ocfs2:  Reconnect after idle time out.
  ocfs2/dlm: Cleanup lockres print
  ocfs2/dlm: Fix lockname in lockres print function
  ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
  ocfs2/dlm: Dumps the purgelist into a debugfs file
  ...

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
Linus Torvalds [Fri, 18 Apr 2008 17:02:46 +0000 (10:02 -0700)]
Merge git://git./linux/kernel/git/steve/gfs2-2.6-nmw

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (49 commits)
  [GFS2] fix assertion in log_refund()
  [GFS2] fix GFP_KERNEL misuses
  [GFS2] test for IS_ERR rather than 0
  [GFS2] Invalidate cache at correct point
  [GFS2] fs/gfs2/recovery.c: suppress warnings
  [GFS2] Faster gfs2_bitfit algorithm
  [GFS2] Streamline quota lock/check for no-quota case
  [GFS2] Remove drop of module ref where not needed
  [GFS2] gfs2_adjust_quota has broken unstuffing code
  [GFS2] possible null pointer dereference fixup
  [GFS2] Need to ensure that sector_t is 64bits for GFS2
  [GFS2] re-support special inode
  [GFS2] remove gfs2_dev_iops
  [GFS2] fix file_system_type leak on gfs2meta mount
  [GFS2] Allow bmap to allocate extents
  [GFS2] Fix a page lock / glock deadlock
  [GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_ops
  [GFS2] gfs2/ops_file.c should #include "ops_inode.h"
  [GFS2] be*_add_cpu conversion
  [GFS2] Fix bug where we called drop_bh incorrectly
  ...

16 years agox86: kgdb build fix
Harvey Harrison [Fri, 18 Apr 2008 16:54:38 +0000 (09:54 -0700)]
x86: kgdb build fix

TF_MASK is no longer defined, use X86_EFLAGS_TF.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago[SCSI] iscsi: bidi support for iscsi_tcp
Boaz Harrosh [Fri, 18 Apr 2008 15:11:53 +0000 (10:11 -0500)]
[SCSI] iscsi: bidi support for iscsi_tcp

access the right scsi_in() and/or scsi_out() side of things.
also for resid

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Reviewed-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] iscsi: bidi support at the generic libiscsi level
Boaz Harrosh [Fri, 18 Apr 2008 15:11:52 +0000 (10:11 -0500)]
[SCSI] iscsi: bidi support at the generic libiscsi level

- prepare the additional bidi_read rlength header.
- access the right scsi_in() and/or scsi_out() side of things.
  also for resid.
- Handle BIDI underflow overflow from target

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Reviewed-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] iscsi: extended cdb support
Boaz Harrosh [Fri, 18 Apr 2008 15:11:51 +0000 (10:11 -0500)]
[SCSI] iscsi: extended cdb support

Support for extended CDBs in iscsi.
All we need is to check if command spills over 16 bytes then allocate
an iscsi-extended-header for the leftovers.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Reviewed-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] zfcp: Fix error handling for blocked unit for send FCP command
Christof Schmitt [Fri, 18 Apr 2008 10:51:57 +0000 (12:51 +0200)]
[SCSI] zfcp: Fix error handling for blocked unit for send FCP command

In the case the unit is blocked, zfcp_unit_get has not been called
yet, so the error handling path should not call zfcp_unit_put.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
Christof Schmitt [Fri, 18 Apr 2008 10:51:56 +0000 (12:51 +0200)]
[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock

The testcase
# chchp -v 0 0.da && sleep 59 && chchp -v 1 0.da
results in this deadlock situation:

STACK TRACE FOR TASK: 0x7e9a2048 (zfcperp0.0.c613)
0 schedule+816 [0x356b3c]
1 schedule_timeout+172 [0x357340]
2 wait_for_common+192 [0x3565fc]
3 flush_cpu_workqueue+116 [0x52af0]
4 flush_workqueue+116 [0x533b8]
5 fc_remote_port_add+64 [0x1c83ec]
6 zfcp_erp_thread+4534 [0x26585a]
7 kernel_thread_starter+6 [0x195d2]

STACK TRACE FOR TASK: 0x7f8ec048 (fc_wq_0)
0 schedule+816 [0x356b3c]
1 zfcp_erp_wait+104 [0x264568]
2 zfcp_scsi_slave_destroy+64 [0x261b24]
3 __scsi_remove_device+154 [0x1c24ba]
4 scsi_remove_device+62 [0x1c2512]
5 __scsi_remove_target+198 [0x1c25ea]
6 __remove_child+58 [0x1c26d6]
7 device_for_each_child+66 [0x1ab566]
8 scsi_remove_target+98 [0x1c268a]
9 run_workqueue+200 [0x5272c]
10 worker_thread+146 [0x52882]
11 kthread+140 [0x58360]
12 kernel_thread_starter+6 [0x195d2]

Remove the zfcp_erp_wait call that is not required here to prevent the
deadlock situation.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] zfcp: fix 31 bit compile warnings
Martin Peschke [Fri, 18 Apr 2008 10:51:55 +0000 (12:51 +0200)]
[SCSI] zfcp: fix 31 bit compile warnings

drivers/s390/scsi/zfcp_aux.c: In function â€˜zfcp_fsf_incoming_els_rscn’:
drivers/s390/scsi/zfcp_aux.c:1379: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function â€˜zfcp_fsf_incoming_els_plogi’:
drivers/s390/scsi/zfcp_aux.c:1432: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function â€˜zfcp_fsf_incoming_els_logo’:
drivers/s390/scsi/zfcp_aux.c:1457: warning: cast from pointer to integer of
different size
..

Just passing pointers rids us of these warnings and improves readability.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:42 +0000 (10:03 +0900)]
[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands

Before bsg_complete_all_commands is called, BSG_F_BLOCK bit is always
set.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: remove minor in struct bsg_device
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:41 +0000 (10:03 +0900)]
[SCSI] bsg: remove minor in struct bsg_device

minor in struct bsg_device is used as identifier to find the
corresponding struct bsg_device_class. However, request_queuse can be
used as identifier for that and the minor in struct bsg_device is
unnecessary.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: use better helper list functions
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:40 +0000 (10:03 +0900)]
[SCSI] bsg: use better helper list functions

This replace hlist_for_each and list_entry with hlist_for_each_entry
and list_first_entry respectively.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: replace kobject_get with blk_get_queue
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:39 +0000 (10:03 +0900)]
[SCSI] bsg: replace kobject_get with blk_get_queue

Both takes a ref to a queue. But blk_get_queue checks QUEUE_FLAG_DEAD
and is more appropriate interface here.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: takes a ref to struct device in fops->open
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:38 +0000 (10:03 +0900)]
[SCSI] bsg: takes a ref to struct device in fops->open

bsg_register_queue() takes a ref to struct device that a caller
passes. For example, bsg takes a ref to the sdev_gendev for scsi
devices. However, bsg doesn't inrease the refcount in fops->open. So
while an application opens a bsg device, the scsi device that the bsg
device holds can go away (bsg also takes a ref to a queue, but it
doesn't prevent the device from going away).

With this patch, bsg increases the refcount of struct device in
fops->open and decreases it in fops->release.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Fri, 18 Apr 2008 16:44:55 +0000 (09:44 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (27 commits)
  [IA64] kdump: Add crash_save_vmcoreinfo for INIT
  [IA64] Fix NUMA configuration issue
  [IA64] Itanium Spec updates
  [IA64] Untangle sync_icache_dcache() page size determination
  [IA64] arch/ia64/kernel/: use time_* macros
  [IA64] remove redundant display of free swap space in show_mem()
  [IA64] make IOMMU respect the segment boundary limits
  [IA64] kprobes: kprobe-booster for ia64
  [IA64] fix getpid and set_tid_address fast system calls for pid namespaces
  [IA64] Replace explicit jiffies tests with time_* macros.
  [IA64] use goto to jump out do/while_each_thread
  [IA64] Fix unlock ordering in smp_callin
  [IA64] pgd_offset() constfication.
  [IA64] kdump: crash.c coding style fix
  [IA64] kdump: add kdump_on_fatal_mca
  [IA64] Minimize per_cpu reservations.
  [IA64] Correct pernodesize calculation.
  [IA64] Kernel parameter for max number of concurrent global TLB purges
  [IA64] Multiple outstanding ptc.g instruction support
  [IA64] Implement smp_call_function_mask for ia64
  ...

16 years agoocfs2/net: Add debug interface to o2net
Sunil Mushran [Mon, 14 Apr 2008 17:46:19 +0000 (10:46 -0700)]
ocfs2/net: Add debug interface to o2net

This patch exposes o2net information via debugfs. The information includes
the list of sockets (sock_containers) as well as the list of outstanding
messages (send_tracking). Useful for o2dlm debugging.

(This patch is derived from an earlier one written by Zach Brown that
exposed the same information via /proc.)

[Mark: checkpatch fixes]

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Only build ocfs2/dlm with the o2cb stack module
Mark Fasheh [Fri, 4 Apr 2008 19:45:55 +0000 (12:45 -0700)]
ocfs2: Only build ocfs2/dlm with the o2cb stack module

fs/ocfs2/dlm/ocfs2_dlm.ko and fs/ocfs2/dlm/ocfs2_dlmfs.ko get built if
CONFIG_FS_OCFS2 is specified. This isn't quite how it should happen any more
- the "o2cb" dlm modules should only be built if CONFIG_FS_OCFS2_O2CB is
set, so update the dlm Makefile accordingly.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
16 years agoocfs2/cluster: Get rid of arguments to the timeout routines
Jeff Mahoney [Fri, 28 Mar 2008 23:44:13 +0000 (16:44 -0700)]
ocfs2/cluster: Get rid of arguments to the timeout routines

We keep seeing bug reports related to NULL pointer derefs in
o2net_set_nn_state(). When I originally wrote up the configurable timeout
patch, I had tried to plan for multiple clusters. This was silly.

The timeout routines all use o2nm_single_cluster so there's no point in
passing an argument at all. This patch removes the arguments and kills those
bugs dead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Put tree in MAINTAINERS
Joel Becker [Sun, 23 Mar 2008 05:08:07 +0000 (22:08 -0700)]
ocfs2: Put tree in MAINTAINERS

The ocfs2 MAINTAINERS entry should have the git tree URL.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Use BUG_ON
Julia Lawall [Tue, 4 Mar 2008 23:21:05 +0000 (15:21 -0800)]
ocfs2: Use BUG_ON

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Convert ocfs2 over to unlocked_ioctl
Andi Kleen [Sun, 27 Jan 2008 02:17:17 +0000 (03:17 +0100)]
ocfs2: Convert ocfs2 over to unlocked_ioctl

As far as I can see there is nothing in ocfs2_ioctl that requires the BKL,
so use unlocked_ioctl

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Improve rename locking
Jan Kara [Thu, 21 Feb 2008 17:00:00 +0000 (18:00 +0100)]
ocfs2: Improve rename locking

ocfs2_rename() was being too aggressive with the rename lock - we only need
it for certain forms of directory rename.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agofs/ocfs2/aops.c: test for IS_ERR rather than 0
Julia Lawall [Fri, 28 Mar 2008 21:43:10 +0000 (14:43 -0700)]
fs/ocfs2/aops.c: test for IS_ERR rather than 0

The function ocfs2_start_trans always returns either a valid pointer or a
value made with ERR_PTR, so its result should be tested with IS_ERR, not
with a test for 0.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add inode stealing for ocfs2_reserve_new_inode
Tao Ma [Wed, 5 Mar 2008 08:11:46 +0000 (16:11 +0800)]
ocfs2: Add inode stealing for ocfs2_reserve_new_inode

Inode allocation is modified to look in other nodes allocators during
extreme out of space situations. We retry our own slot when space is freed
back to the global bitmap, or whenever we've allocated more than 1024 inodes
from another slot.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add ac_alloc_slot in ocfs2_alloc_context
Tao Ma [Mon, 3 Mar 2008 09:12:30 +0000 (17:12 +0800)]
ocfs2: Add ac_alloc_slot in ocfs2_alloc_context

In inode stealing, we no longer restrict the allocation to
happen in the local node. So it is neccessary for us to add
a new member in ocfs2_alloc_context to indicate which slot
we are using for allocation. We also modify the process of
local alloc so that this member can be used there also.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
Tao Ma [Mon, 3 Mar 2008 09:12:09 +0000 (17:12 +0800)]
ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits

In some cases(Inode stealing from other nodes), we may not want
ocfs2_reserve_suballoc_bits to allocate new groups from the
global_bitmap since it may already be full. So add a new parameter
for this.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Enable cross extent block merge.
Tao Ma [Wed, 30 Jan 2008 06:21:32 +0000 (14:21 +0800)]
ocfs2: Enable cross extent block merge.

In ocfs2_figure_merge_contig_type, we judge whether there exists
a cross extent block merge and enable it by setting CONTIG_LEFT
and CONTIG_RIGHT accordingly.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add support for cross extent block
Tao Ma [Wed, 30 Jan 2008 06:21:05 +0000 (14:21 +0800)]
ocfs2: Add support for cross extent block

In ocfs2_merge_rec_left, when we find the merge extent is "CONTIG_RIGHT"
with the first extent record of the next extent block, we will merge it to
the next extent block and change all the related extent blocks accordingly.

In ocfs2_merge_rec_right, when we find the merge extent is "CONTIG_LEFT"
with the last extent record of the previous extent block, we will merge
it to the prevoius extent block and change all the related extent blocks
accordingly.

As for CONTIG_LEFTRIGHT, we will handle CONTIG_RIGHT first so that when
the index is zero, the merge process will be more efficient and easier.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Move /sys/o2cb to /sys/fs/o2cb
Mark Fasheh [Wed, 30 Jan 2008 01:08:26 +0000 (17:08 -0800)]
ocfs2: Move /sys/o2cb to /sys/fs/o2cb

/sys/fs is where we really want file system specific sysfs objects.

Ocfs2-tools has been updated to look in /sys/fs/o2cb. We can maintain
backwards compatibility with old ocfs2-tools by using a sysfs symlink. After
some time (2 years), the symlink can be safely removed. This patch also adds
documentation to make it easier for people to figure out what /sys/fs/o2cb
is used for.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agosysfs: Allow removal of symlinks in the sysfs root
Mark Fasheh [Tue, 29 Jan 2008 22:35:18 +0000 (14:35 -0800)]
sysfs: Allow removal of symlinks in the sysfs root

Allow callers of sysfs_remove_link() to pass a NULL kobj, in which case
sysfs_root will be used as the parent directory. This allows us to tear down
top level symlinks created via sysfs_create_link(), which already has
similar handling of a NULL parent object.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
16 years agoocfs2: Reconnect after idle time out.
Tao Ma [Wed, 5 Mar 2008 07:50:12 +0000 (15:50 +0800)]
ocfs2:  Reconnect after idle time out.

Currently, o2net connects to a node on hb_up and disconnects on
hb_down and net timeout.

It disconnects on net timeout is ok, but it should attempt to
reconnect back. This is because sometimes nodes get overloaded
enough that the network connection breaks but the disk hb does not.
And if we get into that situation, we either fence (unnecessarily)
or wait for its disk hb to die (and sometimes hang in the process).

So in this updated scheme, when the network disconnects, we keep
attempting to reconnect till we succeed or we get a disk hb down
event.

If the other node is really dead, then we will eventually get a
node down event. If not, we should be able to connect again and
continue.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Cleanup lockres print
Sunil Mushran [Fri, 14 Mar 2008 18:18:24 +0000 (11:18 -0700)]
ocfs2/dlm: Cleanup lockres print

A previous patch added KERN_NOTICE to printks printing the lockres that
cluttered the output. This patch removes the log level. For people concerned
with syslog clutter, please note we now use this facility to print lockres
only during an error.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Fix lockname in lockres print function
Sunil Mushran [Mon, 10 Mar 2008 22:16:29 +0000 (15:16 -0700)]
ocfs2/dlm: Fix lockname in lockres print function

__dlm_print_one_lock_resource was printing lockname incorrectly.
Also, we now use printk directly instead of mlog as the latter prints
the line context which is not useful for this print.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
Sunil Mushran [Mon, 10 Mar 2008 22:16:28 +0000 (15:16 -0700)]
ocfs2/dlm: Move dlm_print_one_mle() from dlmc to dlmdebug.c

This patch helps in consolidating debugging related functions in dlmdebug.c.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Dumps the purgelist into a debugfs file
Sunil Mushran [Mon, 10 Mar 2008 22:16:27 +0000 (15:16 -0700)]
ocfs2/dlm: Dumps the purgelist into a debugfs file

This patch dumps all the lockres' on the purgelist it can fit in one page
into a debugfs file. Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Dumps the mles into a debugfs file
Sunil Mushran [Mon, 10 Mar 2008 22:16:26 +0000 (15:16 -0700)]
ocfs2/dlm: Dumps the mles into a debugfs file

This patch dumps all mles it can fit in one page into a debugfs file.
Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h
Sunil Mushran [Mon, 10 Mar 2008 22:16:25 +0000 (15:16 -0700)]
ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h

This patch moves some mle related definitions from dlmmaster.c
to dlmcommon.h. Future patches need these definitions to dump mle
debugging information.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.beckeroracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Dumps the lockres' into a debugfs file
Sunil Mushran [Mon, 10 Mar 2008 22:16:24 +0000 (15:16 -0700)]
ocfs2/dlm: Dumps the lockres' into a debugfs file

This patch dumps all the lockres' alongwith all the locks into
a debugfs file. Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Dump the dlm state in a debugfs file
Sunil Mushran [Mon, 10 Mar 2008 22:16:23 +0000 (15:16 -0700)]
ocfs2/dlm: Dump the dlm state in a debugfs file

This patch dumps the dlm state (dlm_ctxt) into a debugfs file.
Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Create debugfs dirs
Sunil Mushran [Mon, 10 Mar 2008 22:16:22 +0000 (15:16 -0700)]
ocfs2/dlm: Create debugfs dirs

This patch creates the debugfs directories that will hold the
files to be used to dump the dlm state.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Link all lockres' to a tracking list
Sunil Mushran [Mon, 10 Mar 2008 22:16:21 +0000 (15:16 -0700)]
ocfs2/dlm: Link all lockres' to a tracking list

This patch links all the lockres' to a tracking list in dlm_ctxt.
We will use this in an upcoming patch that will walk the entire
list and to dump the lockres states to a debugfs file.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Create slabcaches for lock and lockres
Sunil Mushran [Mon, 10 Mar 2008 22:16:20 +0000 (15:16 -0700)]
ocfs2/dlm: Create slabcaches for lock and lockres

This patch makes the o2dlm allocate memory for lockres, lockname and lock
structures from slabcaches rather than kmalloc. This allows us to not only
make these allocs more efficient but also allows us to track the memory being
consumed by these structures.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle
Sunil Mushran [Mon, 10 Mar 2008 22:16:19 +0000 (15:16 -0700)]
ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle

This patch renames dlm_mle_slabcache to prevent namespace clashes with fs/dlm.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Document /sys/fs/ocfs2
Joel Becker [Mon, 31 Mar 2008 23:22:55 +0000 (16:22 -0700)]
ocfs2: Document /sys/fs/ocfs2

Add ABI documentation for these files:

/sys/fs/ocfs2/max_locking_protocol
/sys/fs/ocfs2/loaded_cluster_plugins
/sys/fs/ocfs2/active_cluster_plugin
/sys/fs/ocfs2/cluster_stack

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Allow selection of cluster plug-ins.
Joel Becker [Wed, 5 Mar 2008 01:58:56 +0000 (17:58 -0800)]
ocfs2: Allow selection of cluster plug-ins.

ocfs2 now supports plug-ins for the classic O2CB stack as well as
userspace cluster stacks in conjunction with fs/dlm.  This allows zero,
one, or both of the plug-ins to be selected in Kconfig.  For local mounts
(non-clustered), neither plug-in is needed.  Both plugins can be loaded
at one time, the runtime will select the one needed for the cluster
systme in use.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add kbuild for ocfs2_stack_user.ko
Joel Becker [Wed, 28 Nov 2007 22:53:30 +0000 (14:53 -0800)]
ocfs2: Add kbuild for ocfs2_stack_user.ko

Add ocfs2_stack_user.ko to the Makefile so that it builds.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h
Joel Becker [Wed, 5 Mar 2008 00:09:39 +0000 (16:09 -0800)]
ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h

The masklog code is in the o2cb stack, but ocfs2_lockid.h now needs to
be included by the user stack.  The BUG() in ocfs2_lock_type_string()
does not need masklog support, so change it to a regular BUG_ON().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: add fsdlm to stackglue
David Teigland [Wed, 20 Feb 2008 22:29:27 +0000 (14:29 -0800)]
ocfs2: add fsdlm to stackglue

Add code to use fs/dlm.

[ Modified to be part of the stack_user module -- Joel ]

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add the 'set version' message to the ocfs2_control device.
Joel Becker [Wed, 20 Feb 2008 23:39:44 +0000 (15:39 -0800)]
ocfs2: Add the 'set version' message to the ocfs2_control device.

The "SETV" message sets the filesystem locking protocol version as
negotiated by the client.  The client negotiates based on the maximum
version advertised in /sys/fs/ocfs2/max_locking_protocol.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add the local node id to the handshake.
Joel Becker [Wed, 20 Feb 2008 22:44:34 +0000 (14:44 -0800)]
ocfs2: Add the local node id to the handshake.

This is the second part of the ocfs2_control handshake.  After
negotiating the ocfs2_control protocol, the daemon tells the filesystem
what the local node id is via the SETN message.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Introduce the DOWN message to ocfs2_control
Joel Becker [Tue, 19 Feb 2008 01:07:09 +0000 (17:07 -0800)]
ocfs2: Introduce the DOWN message to ocfs2_control

When the control daemon sees a node go down, it sends a DOWN message
through the ocfs2_control device.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Start the ocfs2_control handshake.
Joel Becker [Tue, 19 Feb 2008 03:40:12 +0000 (19:40 -0800)]
ocfs2: Start the ocfs2_control handshake.

When a control daemon opens the ocfs2_control device, it must perform a
handshake to tell the filesystem it is something capable of monitoring
cluster status.  Only after the handshake is complete will the filesystem
allow mounts.

This is the first part of the handshake.  The daemon reads all supported
ocfs2_control protocols, then writes in the protocol it will use.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add the ocfs2_control misc device.
Joel Becker [Tue, 19 Feb 2008 03:23:28 +0000 (19:23 -0800)]
ocfs2: Add the ocfs2_control misc device.

The ocfs2_control misc device is how a userspace control daemon (controld)
talks to the filesystem.  Introduce the bare-bones filesystem ops.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add the user stack module.
Joel Becker [Wed, 28 Nov 2007 22:38:40 +0000 (14:38 -0800)]
ocfs2: Add the user stack module.

Add a skeleton for the stack_user module.  It's just the barebones module
code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add the 'cluster_stack' sysfs file.
Joel Becker [Fri, 1 Feb 2008 23:17:30 +0000 (15:17 -0800)]
ocfs2: Add the 'cluster_stack' sysfs file.

Userspace can now query and specify the cluster stack in use via the
/sys/fs/ocfs2/cluster_stack file.  By default, it is 'o2cb', which is
the classic stack.  Thus, old tools that do not know how to modify this
file will work just fine.  The stack cannot be modified if there is a
live filesystem.

ocfs2_cluster_connect() now takes the expected cluster stack as an
argument.  This way, the filesystem and the stack glue ensure they are
speaking to the same backend.

If the stack is 'o2cb', the o2cb stack plugin is used.  For any other
value, the fsdlm stack plugin is selected.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Add the USERSPACE_STACK incompat bit.
Joel Becker [Fri, 1 Feb 2008 23:08:23 +0000 (15:08 -0800)]
ocfs2: Add the USERSPACE_STACK incompat bit.

The filesystem gains the USERSPACE_STACK incomat bit and the
s_cluster_info field on the superblock.  When a userspace stack is in
use, the name of the stack is stored on-disk for mount-time
verification.

The "cluster_stack" option is added to mount(2) processing.  The mount
process needs to pass the matching stack name.  If the passed name and
the on-disk name do not match, the mount is failed.

When using the classic o2cb stack, the incompat bit is *not* set and no
mount option is used other than the usual heartbeat=local.  Thus, the
filesystem is compatible with older tools.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Create stack glue sysfs files.
Joel Becker [Fri, 1 Feb 2008 07:56:17 +0000 (23:56 -0800)]
ocfs2: Create stack glue sysfs files.

Introduce a set of sysfs files that describe the current stack glue
state.  The files live under /sys/fs/ocfs2.  The locking_protocol file
displays the version of ocfs2's locking code.  The
loaded_cluster_plugins file displays all of the currently loaded stack
plugins.  When filesystems are mounted, the active_cluster_plugin file
will display the plugin in use.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Break out stackglue into modules.
Joel Becker [Fri, 1 Feb 2008 23:03:57 +0000 (15:03 -0800)]
ocfs2: Break out stackglue into modules.

We define the ocfs2_stack_plugin structure to represent a stack driver.
The o2cb stack code is split into stack_o2cb.c.  This becomes the
ocfs2_stack_o2cb.ko module.

The stackglue generic functions are similarly split into the
ocfs2_stackglue.ko module.  This module now provides an interface to
register drivers.  The ocfs2_stack_o2cb driver registers itself.  As
part of this interface, ocfs2_stackglue can load drivers on demand.
This is accomplished in ocfs2_cluster_connect().

ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
If a hangup is pending, it will not release the driver module and will
let _hangup() do that.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
16 years agoocfs2: Create ocfs2_stack_operations and split out the o2cb stack.
Joel Becker [Fri, 1 Feb 2008 23:02:36 +0000 (15:02 -0800)]
ocfs2: Create ocfs2_stack_operations and split out the o2cb stack.

Define the ocfs2_stack_operations structure.  Build o2cb_stack_ops from
all of the o2cb-specific stack functions.  Change the generic stack glue
functions to call the stack_ops instead of the o2cb functions directly.

The o2cb functions are moved to stack_o2cb.c.  The headers are cleaned up
to where only needed headers are included.

In this code, stackglue.c and stack_o2cb.c refer to some shared
extern variables.  When they become modules, that will change.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Split o2cb code from generic stack functions.
Joel Becker [Fri, 1 Feb 2008 22:51:03 +0000 (14:51 -0800)]
ocfs2: Split o2cb code from generic stack functions.

Split off the o2cb-specific funtionality from the generic stack glue
calls.  This is a precurser to wrapping the o2cb functionality in an
operations vector.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Clean up stackglue initialization
Joel Becker [Thu, 31 Jan 2008 00:58:36 +0000 (16:58 -0800)]
ocfs2: Clean up stackglue initialization

The stack glue initialization function needs a better name so that it can be
used cleanly when stackglue becomes a module.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Abstract out a debugging function for underlying dlms.
Joel Becker [Wed, 30 Jan 2008 00:59:55 +0000 (16:59 -0800)]
ocfs2: Abstract out a debugging function for underlying dlms.

dlmglue.c was still referencing a raw o2dlm lksb in one instance.  Let's
create a generic ocfs2_dlm_dump_lksb() function.  This allows underlying
DLMs to print whatever they want about their lock.

We then move the o2dlm dump into stackglue.c where it belongs.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: handle async EAGAIN from NOQUEUE request
David Teigland [Thu, 31 Jan 2008 00:52:53 +0000 (16:52 -0800)]
ocfs2: handle async EAGAIN from NOQUEUE request

When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE
requests. Fix up dlmglue to expect this.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Remove CANCELGRANT from the view of dlmglue.
Joel Becker [Fri, 1 Feb 2008 22:45:08 +0000 (14:45 -0800)]
ocfs2: Remove CANCELGRANT from the view of dlmglue.

o2dlm has the non-standard behavior of providing a cancel callback
(unlock_ast) even when the cancel has failed (the locking operation
succeeded without canceling).  This is called CANCELGRANT after the
status code sent to the callback.  fs/dlm does not provide this
callback, so dlmglue must be changed to live without it.
o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.

Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
needs to check for it.  ocfs2_locking_ast() must catch that a cancel was
tried and clear the cancel state.

Making these changes opens up a locking race.  dlmglue uses the the
OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
one time.  But dlmglue must unlock the lockres before calling into the
dlm.  In the small window of time between unlocking the lockres and
calling the dlm, the downconvert thread can try to cancel the lock.  The
downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
know that ocfs2_dlm_lock() has not yet been called.

Because ocfs2_dlm_lock() has not yet been called, the cancel operation
will just be a no-op.  There's nothing to cancel.  With CANCELGRANT,
dlmglue uses the CANCELGRANT callback to clear up the cancel state.
When it comes around again, it will retry the cancel.  Eventually, the
first thread will have called into ocfs2_dlm_lock(), and either the
lock or the cancel will succeed.  The downconvert thread can then do its
downconvert.

Without CANCELGRANT, there is nothing to clean up the cancellation
state.  The downconvert thread does not know to retry its operations.
More importantly, the original lock may be blocking on the other node
that is trying to cancel us.  With neither able to make progress, the
ast is never called and the cancellation state is never cleaned up that
way.  dlmglue is deadlocked.

The OCFS2_LOCK_PENDING flag is introduced to remedy this window.  It is
set at the same time OCFS2_LOCK_BUSY is.  Thus, the downconvert thread
can check whether the lock is cancelable.  If not, it just loops around
to try again.  Once ocfs2_dlm_lock() is called, the thread then clears
OCFS2_LOCK_PENDING and wakes the downconvert thread.  Now, if the
downconvert thread finds the lock BUSY, it can safely try to cancel it.
Whether the cancel works or not, the state will be properly set and the
lock processing can continue.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Fill node number during cluster stack init
Mark Fasheh [Wed, 30 Jan 2008 00:59:56 +0000 (16:59 -0800)]
ocfs2: Fill node number during cluster stack init

It doesn't make sense to query for a node number before connecting to the
cluster stack. This should be safe to do because node_num is only just
printed,
and we're actually only moving the setting of node num a small amount
further in the mount process.

[ Disconnect when node query fails -- Joel ]

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Move o2hb functionality into the stack glue.
Joel Becker [Wed, 30 Jan 2008 00:59:56 +0000 (16:59 -0800)]
ocfs2: Move o2hb functionality into the stack glue.

The last bit of classic stack used directly in ocfs2 code is o2hb.
Specifically, the check for heartbeat during mount and the call to
ocfs2_hb_ctl during unmount.

We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
to ocfs2_hb_ctl.  Other stacks will just leave hangup() empty.

The check for heartbeat is moved into ocfs2_cluster_connect().  It will
be matched by a similar check for other stacks.

With this change, only stackglue.c includes cluster/ headers.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Abstract out node number queries.
Joel Becker [Wed, 30 Jan 2008 23:38:24 +0000 (15:38 -0800)]
ocfs2: Abstract out node number queries.

ocfs2 asks the cluster stack for the local node's node number for two
reasons; to fill the slot map and to print it. While the slot map isn't
necessary for userspace cluster stacks, the printing is very nice for
debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
this value. It is anticipated that the slot map will not be used under a
userspace cluster stack, so validity checks of the node num only need to
exist in the slot map code. Otherwise, it just gets used and printed as an
opaque value.

[ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
  truly opaque. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.
Joel Becker [Fri, 1 Feb 2008 22:39:35 +0000 (14:39 -0800)]
ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.

This step introduces a cluster stack agnostic API for initializing and
exiting.  fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
connect to the stack.  It is all handled in stackglue.c.

heartbeat.c no longer needs to know how it gets called.
ocfs2_do_node_down() is now a clean recovery trigger.

The big gotcha is the ordering of initializations and de-initializations done
underneath ocfs2_cluster_connect().  ocfs2_dlm_init() used to do all
o2dlm initialization in one block.  Thus, the o2dlm functionality of
ocfs2_cluster_connect() is very straightforward.  ocfs2_dlm_shutdown(),
however, did a few things between de-registration of the eviction
callback and actually shutting down the domain.  Now de-registration and
shutdown of the domain are wrapped within the single
ocfs2_cluster_disconnect() call.  I've checked the code paths to make
sure we can safely tear down things in ocfs2_dlm_shutdown() before
calling ocfs2_cluster_disconnect().  The filesystem has already set
itself to ignore the callback.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>