GitHub/exynos8895/android_kernel_samsung_universal8895.git
15 years ago[SCSI] libiscsi_tcp: update recv tracking for each skb instead of iscsi pdu
Mike Christie [Wed, 13 May 2009 22:57:44 +0000 (17:57 -0500)]
[SCSI] libiscsi_tcp: update recv tracking for each skb instead of iscsi pdu

Everytime we read in a pdu libiscsi will update a tracking field.
It uses this to decide when to check if the transport might be bad.
If we have not got data in recv_timeout seconds then we will
send a iscsi ping/nop.

If we are on a slow link then it could take a while to read in all
the data for a data_in. In that case we might send a ping/nop when
we do not need to or we might drop a session thinking it is bad
when the lower layer is making forward progress on it.

This patch has libiscsi_tcp update the recv tracking for each skb
(basically network packet from our point of view) instead of the
entire iscsi pdu+data, so we account for these cases where data is
coming in slowly.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] libiscsi: fix nop response/reply and session cleanup race
Mike Christie [Wed, 13 May 2009 22:57:43 +0000 (17:57 -0500)]
[SCSI] libiscsi: fix nop response/reply and session cleanup race

If we are responding to a nop from the target by sending our nop,
and the session is getting torn down, then iscsi_start_session_recovery
could set the conn stop bits while the recv path is sending the nop
response and we will hit the bug ons in __iscsi_conn_send_pdu.

This has us check the state in __iscsi_conn_send_pdu and fail all
incoming mgmt IO if we are  not logged in and if the pdu is not login
related. It also changes the ordering of the setting of conn stop state
bits so they are set after the session state is set (both are set under
the session lock).

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] libiscsi: have iscsi_data_in_rsp call iscsi_update_cmdsn
Mike Christie [Wed, 13 May 2009 22:57:42 +0000 (17:57 -0500)]
[SCSI] libiscsi: have iscsi_data_in_rsp call iscsi_update_cmdsn

This has iscsi_data_in_rsp call iscsi_update_cmdsn when a pdu is
completed like is done for other pdu's that are don.

For libiscsi_tcp, this means that it calls iscsi_update_cmdsn when
it is handling the pdu internally to only transfer data, but if there is
status then it does not need to call it since the completion handling
will do it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] libiscsi: export iscsi_itt_to_task for bnx2i
Mike Christie [Wed, 13 May 2009 22:57:41 +0000 (17:57 -0500)]
[SCSI] libiscsi: export iscsi_itt_to_task for bnx2i

bnx2i needs to be able to look up mgmt task like login and nop, because
it does some processing of them on the completion path. This exports
iscsi_itt_to_task so it can look up the task.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] libiscsi: handle param allocation failures
Mike Christie [Wed, 13 May 2009 22:57:40 +0000 (17:57 -0500)]
[SCSI] libiscsi: handle param allocation failures

If we could not allocate the initiator name or some other id like
the hwaddress or netdev, then userspace could deal with the failure
by just running in a dregraded mode.

Now we want to be able to switch values for the params and we
want some feedback, so this patch will check if a string like
the initiatorname could not be allocated and return an error.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] libiscsi: check of LLD has a alloc pdu callout.
Mike Christie [Wed, 13 May 2009 22:57:39 +0000 (17:57 -0500)]
[SCSI] libiscsi: check of LLD has a alloc pdu callout.

bnx2i does not have one. It currently preallocates the bdt
when the session is setup.

We probably want to change that to a dma pool, then allocate from
the pool in the alloc pdu. Until then check if there is a alloc
pdu callout.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] iscsi: pass ep connect shost
Mike Christie [Wed, 13 May 2009 22:57:38 +0000 (17:57 -0500)]
[SCSI] iscsi: pass ep connect shost

When we create the tcp/ip connection by calling ep_connect, we currently
just go by the routing table info.

I think there are two problems with this.

1. Some drivers do not have access to a routing table. Some drivers like
qla4xxx do not even know about other ports.

2. If you have two initiator ports on the same subnet, the user may have
set things up so that session1 was supposed to be run through port1. and
session2 was supposed to be run through port2. It looks like we could
end with both sessions going through one of the ports.

Fixes for cxgb3i from Karen Xie.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] NCR_D700: fix IRQ handler return type
Zhenwen Xu [Tue, 12 May 2009 20:29:13 +0000 (13:29 -0700)]
[SCSI] NCR_D700: fix IRQ handler return type

drivers/scsi/NCR_D700.c: In function `NCR_D700_probe':
drivers/scsi/NCR_D700.c:322: warning: passing argument 2 of `request_irq' from incompatible pointer type
drivers/scsi/NCR_D700.c:322: warning: passing argument 2 of `request_irq' from incompatible pointer type
drivers/scsi/NCR_D700.c:322: warning: passing argument 2 of `request_irq' from incompatible pointer type

Signed-off-by: Zhenwen Xu <helight.xu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: performance improvement using domain_device->lldd_dev
Andy Yan [Mon, 11 May 2009 14:19:25 +0000 (22:19 +0800)]
[SCSI] mvsas: performance improvement using domain_device->lldd_dev

Using sticky field to improve retrieve performance by eliminating some
lookups in . Remove some spurious casts.

Signed-off-by: Ying Chu <jasonchu@marvell.com>
Signed-off-by: Andy Yan <ayan@marvell.com>
Signed-off-by: Ke Wei <kewei@marvell.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: correct bit map usage
Andy Yan [Mon, 11 May 2009 13:56:31 +0000 (21:56 +0800)]
[SCSI] mvsas: correct bit map usage

Utilize DECLARE_BITMAP to define the tags array.

Signed-off-by: Ying Chu <jasonchu@marvell.com>
Signed-off-by: Andy Yan <ayan@marvell.com>
Signed-off-by: Ke Wei <kewei@marvell.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: bug fix, null pointer may be used
Andy Yan [Mon, 11 May 2009 13:49:52 +0000 (21:49 +0800)]
[SCSI] mvsas: bug fix, null pointer may be used

Null pointer check to avoid corruption.

Signed-off-by: Ying Chu <jasonchu@marvell.com>
Signed-off-by: Andy Yan <ayan@marvell.com>
Signed-off-by: Ke Wei <kewei@marvell.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: bug fix of dead lock
Andy Yan [Mon, 11 May 2009 12:05:26 +0000 (20:05 +0800)]
[SCSI] mvsas: bug fix of dead lock

TMF task should be issued with Interrupt Disabled, or Deadlock may take place.
Clean-up unused parameters and conditonal lock.

Signed-off-by: Ying Chu <jasonchu@marvell.com>
Signed-off-by: Andy Yan <ayan@marvell.com>
Signed-off-by: Ke Wei <kewei@marvell.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: bug fix with setting task management frame type
Andy Yan [Mon, 11 May 2009 12:01:55 +0000 (20:01 +0800)]
[SCSI] mvsas: bug fix with setting task management frame type

Correct frame type setting according to parameter.

Signed-off-by: Ying Chu <jasonchu@marvell.com>
Signed-off-by: Andy Yan <ayan@marvell.com>
Signed-off-by: Ke Wei <kewei@marvell.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] ipr: fix PCI permanent error handler
Kleber S. Souza [Mon, 4 May 2009 13:41:02 +0000 (10:41 -0300)]
[SCSI] ipr: fix PCI permanent error handler

The ipr driver can hang if it encounters enough PCI errors
to trigger the permanent error handler. The driver will attempt
to initiate a "bringdown" of the adapter and fail all pending
ops back. However, this bringdown is unlike any other bringdown
of the adapter in the code as the driver. In this code path we
end up failing back ops with allow_cmds still set to 1. This results
in some commands, the HCAM commands in particular, getting immediately
re-issued to the adapter on the done call, which results in
an infinite loop in ipr_fail_all_ops. Fix this by setting allow_cmds
to zero in this path.

Signed-off-by: Kleber S. Souza <klebers@linux.vnet.ibm.com>
[brking@linux.vnet.ibm.com: alternate patch substituted]
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] Update wording of CONFIG_SCSI_MULTI_LUN help
Eric Piel [Mon, 4 May 2009 10:43:02 +0000 (12:43 +0200)]
[SCSI] Update wording of CONFIG_SCSI_MULTI_LUN help

I had to set CONFIG_SCSI_MULTI_LUN to y in order to get my SE W595
working when plugging it as a mass storage. Looking at SCSI option to
get a phone behaving correctly was convoluted to say the least. There
are quite a few other reports about USB card readers needing this option
as well. This patch improves the help text to make the use of the option
more obvious.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] ibmvscsi: Remove redundant test on unsigned.
Roel Kluin [Sat, 2 May 2009 20:14:54 +0000 (22:14 +0200)]
[SCSI] ibmvscsi: Remove redundant test on unsigned.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] st: fix gcc 4.4 warning
Kai Makisara [Sat, 2 May 2009 05:49:34 +0000 (08:49 +0300)]
[SCSI] st: fix gcc 4.4 warning

This patch fixes the GCC 4.4 warning reported by David Binderman and Sergey
Senozhatsky. The old version was working correctly but was not easy to read.

Signed-off-by: Kai Makisara <kai.makisara@kolumbus.fi>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] limit state transitions in scsi_internal_device_unblock
Takahiro Yasui [Wed, 29 Apr 2009 16:13:02 +0000 (12:13 -0400)]
[SCSI] limit state transitions in scsi_internal_device_unblock

scsi timeout on two or more devices may cause extremely long execution
time for user applications because SDEV_OFFLINE state is changed to
SDEV_RUNNING state during scsi error recovery procedures triggered by
a bus reset or a host reset of scsi LLD, and scsi timeout can happens
on the same devices many times.

This happens because scsi_internal_device_unblock() changes device's
state to SDEV_RUNNING even if a device in other states than SDEV_BLOCK,
while the following two transitions are required in this function.

  SDEV_BLOCK -> SDEV_RUNNING
  SDEV_CREATED_BLOCK -> SDEV_CREATED

Otherwise, it returns -EINVAL.

Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
[matthew@wil.cx: supplied rewritten base for patch]
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] fcoe, libfc: fix function declarations to be ANSI-compliant
Randy Dunlap [Tue, 28 Apr 2009 04:49:31 +0000 (21:49 -0700)]
[SCSI] fcoe, libfc: fix function declarations to be ANSI-compliant

Fix function declarations:

drivers/scsi/fcoe/fcoe.c:1356:28: warning: non-ANSI function declaration of function 'fcoe_dev_setup'
drivers/scsi/libfc/fc_rport.c:1293:20: warning: non-ANSI function declaration of function 'fc_setup_rport'
drivers/scsi/libfc/fc_rport.c:1302:23: warning: non-ANSI function declaration of function 'fc_destroy_rport'

[jejb: fixed wrong doc in comment noticed during inspection]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] scsi_debug: fix virtual disk larger than 1TB
FUJITA Tomonori [Thu, 23 Apr 2009 00:42:25 +0000 (17:42 -0700)]
[SCSI] scsi_debug: fix virtual disk larger than 1TB

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] ipr: ipr_remove() marked __devexit
Kleber S. Souza [Wed, 22 Apr 2009 13:50:28 +0000 (10:50 -0300)]
[SCSI] ipr: ipr_remove() marked __devexit

Marking the ipr clean up function ipr_remove() as __devexit and using
__devexit_p() macro in its address reference.

Signed-off-by: Kleber Sacilotto de Souza <kleber@linux.vnet.ibm.com>
Reported-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] scsi_dh_rdac: Retry for NOT_READY(02/04/01) in rdac device handler
Chauhan, Vijay [Mon, 20 Apr 2009 12:44:23 +0000 (18:14 +0530)]
[SCSI] scsi_dh_rdac: Retry for NOT_READY(02/04/01) in rdac device handler

During device discovery read capacity fails with 0x020401 and sets the
device size to 0. As a reason any I/O submitted to this path gets
killed at sd_prep_fn with BLKPREP_KILL. This patch is to retry for
0x020401. NEED_RETRY in scsi_decide_disposition does not give
sufficient time for the device to become ready.

Signed-off-by: Vijay Chauhan <vijay.chauhan@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] sym53c8xx_2: slave_alloc/destroy safety (2.6.27.5)
Aaro Koskinen [Tue, 14 Apr 2009 20:47:00 +0000 (15:47 -0500)]
[SCSI] sym53c8xx_2: slave_alloc/destroy safety (2.6.27.5)

Make the sym53c8xx_2 driver slave_alloc/destroy less unsafe. References
to the destroyed LCB are cleared from the target structure (instead of
leaving a dangling pointer), and when the last LCB for the target is
destroyed the reference to the upper layer target data is cleared. The
host lock is used to prevent a race with the interrupt handler. Also
user commands are prevented for targets with all LCBs destroyed.

Signed-off-by: Aaro Koskinen <Aaro.Koskinen@nokia.com>
Tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] sym53c8xx_2: lun to_clear flag not re-initialized (2.6.27.5)
Aaro Koskinen [Tue, 14 Apr 2009 20:46:59 +0000 (15:46 -0500)]
[SCSI] sym53c8xx_2: lun to_clear flag not re-initialized (2.6.27.5)

(Resent with proper formatting)

Fix for the sym53c8xx_2 driver to initialize lun's to_clear flag after
a bus reset (a failed clear can trigger a bus reset and it should not
be attemped again after that).

Signed-off-by: Aaro Koskinen <Aaro.Koskinen@nokia.com>
Tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla1280: error recovery rewrite
Michael Reed [Wed, 8 Apr 2009 19:34:33 +0000 (14:34 -0500)]
[SCSI] qla1280: error recovery rewrite

The driver now waits for the scsi commands associated with a
particular error recovery step to be returned to the mid-layer,
and returns the appropriate SUCCESS or FAILED status.  Removes
unneeded polling of chip for interrupts.

This patch also bumps the driver version number.

Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla1280: driver clean up
Michael Reed [Wed, 8 Apr 2009 19:33:48 +0000 (14:33 -0500)]
[SCSI] qla1280: driver clean up

Remove some unneeded, inactive and unused code, make some trivial
corrections to comments and a printk, and return a proper status
in qla1280_queuecommand.  No fundamental logic changes are made.

Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] Increase default timeout for INQUIRY
Alan Stern [Thu, 12 Mar 2009 15:08:51 +0000 (11:08 -0400)]
[SCSI] Increase default timeout for INQUIRY

This patch (as1224) changes the default timeout for INQUIRY commands
from 3 seconds to 20 seconds, which is the value used by Windows for
USB Mass-Storage devices.  Some of these devices, like the Corsair
Flash Voyager (see Bugzilla #12188) really do need a long timeout.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: add support for 94xx; layout change; bug fixes
Andy Yan [Fri, 8 May 2009 21:46:40 +0000 (17:46 -0400)]
[SCSI] mvsas: add support for 94xx; layout change; bug fixes

This version contains following main changes
  - Switch to new layout to support more types of ASIC.
  - SSP TMF supported and related Error Handing enhanced.
  - Support flash feature with delay 2*HZ when PHY changed.
  - Support Marvell 94xx series ASIC for 6G SAS/SATA, which has 2
88SE64xx chips but any different register description.
  - Support SPI flash for HBA-related configuration info.
  - Other patch enhanced from kernel side such as increasing PHY type

[jejb: fold back in DMA_BIT_MASK changes]
Signed-off-by: Ying Chu <jasonchu@marvell.com>
Signed-off-by: Andy Yan <ayan@marvell.com>
Signed-off-by: Ke Wei <kewei@marvell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: split driver into multiple files
Jeff Garzik [Fri, 8 May 2009 21:44:01 +0000 (17:44 -0400)]
[SCSI] mvsas: split driver into multiple files

Split mvsas driver into multiple source codes, based on the split
and function distribution found in Marvell's mvsas update.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] mvsas: move into new directory drivers/scsi/mvsas/
Jeff Garzik [Fri, 8 May 2009 20:35:37 +0000 (16:35 -0400)]
[SCSI] mvsas: move into new directory drivers/scsi/mvsas/

Zero functional changes, just file movement.

This commit prepares for the upcoming integration of the
Marvell-provided driver update that splits the driver into support
for both 64xx and 94xx chip families.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Update version number to 8.03.01-k2.
Andrew Vasquez [Tue, 7 Apr 2009 05:33:51 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Update version number to 8.03.01-k2.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Use port number to compute nvram/vpd parameter offsets.
Anirban Chakraborty [Tue, 7 Apr 2009 05:33:50 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Use port number to compute nvram/vpd parameter offsets.

Read adapter's physical port number from interrupt pin register
and use it instead of pci function number to offset into the
nvram to obtain the port's configuration parameters.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Add an override option to specify ISP firmware load semantics.
Andrew Vasquez [Tue, 7 Apr 2009 05:33:49 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Add an override option to specify ISP firmware load semantics.

As it may be useful during debugging to use a specific firmware
image.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Don't try to 'stop' firmware if already in ROM code.
Andrew Vasquez [Tue, 7 Apr 2009 05:33:48 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Don't try to 'stop' firmware if already in ROM code.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Conditionally disable automatic queue full tracking.
Michael Reed [Tue, 7 Apr 2009 05:33:47 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Conditionally disable automatic queue full tracking.

Changing a lun's queue depth (/sys/block/sdX/device/queue_depth)
isn't sticky when the device is connected via a QLogic fibre
channel adapter.

The QLogic qla2xxx fibre channel driver dynamically adjusts a
lun's queue depth.  If a user has a specific need to limit the
number of commands issued to a lun (say a tape drive, or a shared
raid where the total commands issued to all luns is limited at
the controller level, for example) and writes a limiting value to
/sys/block/sdXX/device/queue_depth, the qla2xxx driver will
silently and gradually increase the queue depth back to the
driver limit of ql2xmaxqdepth.  While reducing this value (module
parameter) or increasing the interval between ramp ups
(ql2xqfullrampup) offers the potential for a work around it would
be better to have the option of just disabling the dynamic
adjustment of queue depth.

This patch implements an "off switch" as a module parameter.

Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Perform an implicit login to the Management Server.
Joe Carnuccio [Tue, 7 Apr 2009 05:33:46 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Perform an implicit login to the Management Server.

Set the conditional plogi option bit whenever logging in the
fabric management server (if it is already logged in, it does not
need an explicit login; an implicit login suffices).

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Restrict model-name/description device-table usage.
Andrew Vasquez [Tue, 7 Apr 2009 05:33:45 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Restrict model-name/description device-table usage.

Information present in static table is only valid for pre-ISP25xx
adapters.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Correct hard-coded address of a second-port's NVRAM.
Harish Zunjarrao [Tue, 7 Apr 2009 05:33:44 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Correct hard-coded address of a second-port's NVRAM.

Signed-off-by: Harish Zunjarrao <harish.zunjarrao@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Correct typo in read_nvram() callback.
Andrew Vasquez [Tue, 7 Apr 2009 05:33:43 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Correct typo in read_nvram() callback.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Remove reference to request queue from scsi request block.
Anirban Chakraborty [Tue, 7 Apr 2009 05:33:42 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Remove reference to request queue from scsi request block.

srbs used to maintain a reference to the request queue on which
it was enqueued. This is no longer required as the request queue
pointer is now maintained in the scsi host that issues the srb.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Add CPU affinity support.
Anirban Chakraborty [Tue, 7 Apr 2009 05:33:41 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Add CPU affinity support.

Set the module parameter ql2xmultique_tag to 1 to enable this
feature. In this mode, the total number of response queues
created is equal to the number of online cpus. Turning the block
layer's rq_affinity mode on enables requests to be routed to the
proper cpu and at the same time it enables completion of the IO
in a response queue that is affined to the cpu in the request
path.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Add QoS support.
Anirban Chakraborty [Tue, 7 Apr 2009 05:33:40 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Add QoS support.

Set the number of request queues to the module paramater
ql2xmaxqueues.  Each vport gets a request queue. The QoS value
set to the request queues determines priority control for queued
IOs. If QoS value is not specified, the vports use the default
queue 0.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Correct compilation failures when DEBUG'n' options are enabled.
Andrew Vasquez [Tue, 7 Apr 2009 05:33:39 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Correct compilation failures when DEBUG'n' options are enabled.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Export additional FCoE attributes for application support.
Andrew Vasquez [Tue, 7 Apr 2009 05:33:38 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Export additional FCoE attributes for application support.

Cull and export VN_Port MAC address and VLAN_ID information on
supported FCoE ISPs.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years ago[SCSI] qla2xxx: Correct bus-reset behaviour with recent ISPs.
Seokmann Ju [Tue, 7 Apr 2009 05:33:37 +0000 (22:33 -0700)]
[SCSI] qla2xxx: Correct bus-reset behaviour with recent ISPs.

The short-circuit to skip the non-applicable 'full-login-lip'
process on 81xx ISPs was nested too deeply in the 'bus-reset'
routine, as the code in qla2x00_loop_reset() should skip the
whole enable_lip_full_login process.  The original code could
cause device tear-down due to the qla2x00_wait_for_loop_ready()
call taking a large amount of time.

Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years agoMerge branch 'scsi-fixes' into merge-base
James Bottomley [Wed, 20 May 2009 22:20:44 +0000 (17:20 -0500)]
Merge branch 'scsi-fixes' into merge-base

15 years agoMerge branch 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds [Tue, 19 May 2009 18:31:56 +0000 (11:31 -0700)]
Merge branch 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze

* 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Fix kind-of-intr checking against number of interrupts
  microblaze: Update Microblaze defconfig

15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
Linus Torvalds [Tue, 19 May 2009 18:31:24 +0000 (11:31 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/lrg/voltage-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  regulator: da903x: add missing __devexit_p()

15 years agoAvoid ICE in get_random_int() with gcc-3.4.5
Linus Torvalds [Tue, 19 May 2009 18:25:35 +0000 (11:25 -0700)]
Avoid ICE in get_random_int() with gcc-3.4.5

Martin Knoblauch reports that trying to build 2.6.30-rc6-git3 with
RHEL4.3 userspace (gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)) causes an
internal compiler error (ICE):

    drivers/char/random.c: In function `get_random_int':
    drivers/char/random.c:1672: error: unrecognizable insn:
    (insn 202 148 150 0 /scratch/build/linux-2.6.30-rc6-git3/arch/x86/include/asm/tsc.h:23 (set (reg:SI 0 ax [91])
            (subreg:SI (plus:DI (plus:DI (reg:DI 0 ax [88])
                        (subreg:DI (reg:SI 6 bp) 0))
                    (const_int -4 [0xfffffffffffffffc])) 0)) -1 (nil)
        (nil))
    drivers/char/random.c:1672: internal compiler error: in extract_insn, at recog.c:2083

and after some debugging it turns out that it's due to the code trying
to figure out the rough value of the current stack pointer by taking an
address of an uninitialized variable and casting that to an integer.

This is clearly a compiler bug, but it's not worth fighting - while the
current stack kernel pointer might be somewhat hard to predict in user
space, it's also not generally going to change for a lot of the call
chains for a particular process.

So just drop it, and mumble some incoherent curses at the compiler.

Tested-by: Martin Knoblauch <spamtrap@knobisoft.de>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agonfs: Fix NFS v4 client handling of MAY_EXEC in nfs_permission.
Frank Filz [Mon, 18 May 2009 21:41:40 +0000 (17:41 -0400)]
nfs: Fix NFS v4 client handling of MAY_EXEC in nfs_permission.

The problem is that permission checking is skipped if atomic open is
possible, but when exec opens a file, it just opens it O_READONLY which
means EXEC permission will not be checked at that time.

This problem is observed by the following sequence (executed as root):

  mount -t nfs4 server:/ /mnt4
  echo "ls" >/mnt4/foo
  chmod 744 /mnt4/foo
  su guest -c "mnt4/foo"

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Tested-by: Eugene Teo <eugeneteo@kernel.sg>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago[SCSI] mpt2sas: fix driver version inconsistency
Eric Moore [Mon, 18 May 2009 18:57:24 +0000 (12:57 -0600)]
[SCSI] mpt2sas: fix driver version inconsistency

In Commit

commit 3b8b5c9b1f08660583e5dfe095c24170df62f1d2
Author: Eric Moore <eric.moore@lsi.com>
Date:   Tue Apr 21 15:44:27 2009 -0600

    [SCSI] mpt2sas : bump driver version to 01.100.02.00

The MPT2SAS_MAJOR_VERSION didn't get bumped from 00 to 01 so
applications will see it incorrectly as 00.100.02.00 driver instead of
01.100.02.00.  Fix by making MPT2SAS_MAJOR_VERSION match the major
number in MPT2SAS_DRIVER_VERSION

Signed-off-by: Eric Moore <eric.moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
15 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Linus Torvalds [Mon, 18 May 2009 17:22:04 +0000 (10:22 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Explicit alignment for .data.cacheline_aligned
  powerpc/ps3: Update ps3_defconfig
  powerpc/ftrace: Fix constraint to be early clobber
  powerpc/ftrace: Use pr_devel() in ftrace.c
  powerpc: Do not assert pte_locked for hugepage PTE entries

15 years agoMerge branches 'sched-fixes-for-linus-2' and 'core-fixes-for-linus-2' of git://git...
Linus Torvalds [Mon, 18 May 2009 17:11:06 +0000 (10:11 -0700)]
Merge branches 'sched-fixes-for-linus-2' and 'core-fixes-for-linus-2' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix fallback sched_clock()'s offset when using jiffies

* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINS

15 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 18 May 2009 16:17:37 +0000 (09:17 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix performance regression caused by paravirt_ops on native kernels
  xen: use header for EXPORT_SYMBOL_GPL
  x86, 32-bit: fix kernel_trap_sp()
  x86: fix percpu_{to,from}_op()
  x86: mtrr: Fix high_width computation when phys-addr is >= 44bit
  x86: Fix false positive section mismatch warnings in the apic code

15 years agoMerge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 18 May 2009 16:15:41 +0000 (09:15 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Append prompt in /debug/tracing/README file
  x86/function-graph: fix constraint for recording old return value

15 years agoFix oops on close of hot-unplugged FTDI serial converter
David Woodhouse [Mon, 18 May 2009 12:07:35 +0000 (13:07 +0100)]
Fix oops on close of hot-unplugged FTDI serial converter

Commit c45d6320 ("fix reference counting of ftdi_private") stopped
ftdi_sio_port_remove() from directly freeing the port-private data, with
the intention if the port was still open, it would be freed when
ftdi_close() is eventually called and releases the last refcount on the
structure.

That's all very well, but ftdi_sio_port_remove() still contains a call
to usb_set_serial_port_data(port, NULL) -- so by the time we get to
ftdi_close() for the port which was unplugged, it _still_ oopses on
dereferencing that NULL pointer, as it did before (and does in 2.6.29).

The fix is just not to clear the private data in ftdi_sio_port_remove().
Then the refcount is properly reduced to zero when the final kref_put()
happens in ftdi_close().

Remove a bogus comment too, while we're at it. And stop doing things
inside "if (priv)" -- it must _always_ be there.

Based loosely on an earlier patch by Daniel Mack, and suggestions by
Alan Stern.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Daniel Mack <daniel@caiaq.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agomtd_dataflash: unbreak erase support
Peter Korsgaard [Mon, 18 May 2009 10:13:54 +0000 (11:13 +0100)]
mtd_dataflash: unbreak erase support

Commit 5b7f3a50 (fix dataflash 64-bit divisions) unfortunately
introduced a typo. Erase addr and len were swapped in the pageaddr
calculation, causing the wrong sectors to get erased.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoasm-generic: fix local_add_unless macro
Roel Kluin [Mon, 18 May 2009 01:18:58 +0000 (18:18 -0700)]
asm-generic: fix local_add_unless macro

`local_add_unless(x, y, z)' will be expanded to `(&(x)->y, (y), (x))', but
`&(x)->y' should be `&(x)->a'

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agomicroblaze: Fix kind-of-intr checking against number of interrupts
Michal Simek [Thu, 14 May 2009 11:35:52 +0000 (13:35 +0200)]
microblaze: Fix kind-of-intr checking against number of interrupts

+ Fix typographic fault.

Signed-off-by: Michal Simek <monstr@monstr.eu>
15 years agomicroblaze: Update Microblaze defconfig
Michal Simek [Mon, 11 May 2009 07:24:47 +0000 (09:24 +0200)]
microblaze: Update Microblaze defconfig

Signed-off-by: Michal Simek <monstr@monstr.eu>
15 years agoregulator: da903x: add missing __devexit_p()
Mike Frysinger [Fri, 15 May 2009 18:50:33 +0000 (14:50 -0400)]
regulator: da903x: add missing __devexit_p()

The remove function uses __devexit, so the .remove assignment needs
__devexit_p() to fix a build error with hotplug disabled.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
CC: Liam Girdwood <lrg@slimlogic.co.uk>
CC: Mike Rapoport <mike@compulab.co.il>
CC: Eric Miao <eric.miao@marvell.com>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
15 years agopowerpc: Explicit alignment for .data.cacheline_aligned
Benjamin Herrenschmidt [Sun, 17 May 2009 18:29:03 +0000 (18:29 +0000)]
powerpc: Explicit alignment for .data.cacheline_aligned

I don't think anything guarantees that the objects in data.page_aligned
are a multiple of PAGE_SIZE, thus the section may end on any boundary.

So the following section, .data.cacheline_aligned needs an explicit
alignment.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
15 years agopowerpc/ps3: Update ps3_defconfig
Geoff Levand [Fri, 15 May 2009 08:01:59 +0000 (08:01 +0000)]
powerpc/ps3: Update ps3_defconfig

Refresh and set these options:

 CONFIG_SYSFS_DEPRECATED_V2: y -> n
 CONFIG_INPUT_JOYSTICK:      y -> n
 CONFIG_HID_SONY:            n -> m
 CONFIG_RTC_DRV_PS3:         - -> m

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
15 years agopowerpc/ftrace: Fix constraint to be early clobber
Steven Rostedt [Fri, 15 May 2009 04:33:54 +0000 (04:33 +0000)]
powerpc/ftrace: Fix constraint to be early clobber

After upgrading my distcc boxes from gcc 4.2.2 to 4.4.0, the function
graph tracer broke. This was discovered on my x86 boxes.

The issue is that gcc used the same register for an output as it did for
an input in an asm statement. I first thought this was a bug in gcc and
reported it. I was notified that gcc was correct and that the output had
to be flagged as an "early clobber".

I noticed that powerpc had the same issue and this patch fixes it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
15 years agopowerpc/ftrace: Use pr_devel() in ftrace.c
Michael Ellerman [Wed, 13 May 2009 20:30:24 +0000 (20:30 +0000)]
powerpc/ftrace: Use pr_devel() in ftrace.c

pr_debug() can now result in code being generated even when #DEBUG
is not defined. That's not really desirable in the ftrace code
which we want to be snappy.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   3334     672       4    4010     faa arch/powerpc/kernel/ftrace.o

size after:
   text    data     bss     dec     hex filename
   2616     360       4    2980     ba4 arch/powerpc/kernel/ftrace.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
15 years agopowerpc: Do not assert pte_locked for hugepage PTE entries
Mel Gorman [Thu, 30 Apr 2009 10:59:19 +0000 (10:59 +0000)]
powerpc: Do not assert pte_locked for hugepage PTE entries

With CONFIG_DEBUG_VM, an assertion is made when changing the protection
flags of a PTE that the PTE is locked. Huge pages use a different pagetable
format and the assertion is bogus and will always trigger with a bug looking
something like

 Unable to handle kernel paging request for data at address 0xf1a00235800006f8
 Faulting instruction address: 0xc000000000034a80
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=32 NUMA Maple
 Modules linked in: dm_snapshot dm_mirror dm_region_hash
  dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
  pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
  windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
  windfarm_cpufreq_clamp windfarm_core i2c_powermac
 NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
 REGS: c000000003037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
 MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28002484  XER: 200fffff
 DAR: f1a00235800006f8, DSISR: 0000000040010000
 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 Call Trace:
 [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
 [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
 [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
 Instruction dump:
 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
 7d290214 7929a302 1d290068 7d6b4a14 <800b00107c000074 7800d182 0b000000

This patch fixes the problem by not asseting the PTE is locked for VMAs
backed by huge pages.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
15 years agopage-writeback: fix the calculation of the oldest_jif in wb_kupdate()
Toshiyuki Okajima [Sun, 17 May 2009 05:56:28 +0000 (22:56 -0700)]
page-writeback: fix the calculation of the oldest_jif in wb_kupdate()

wb_kupdate() function has a bug on linux-2.6.30-rc5.  This bug causes
generic_sync_sb_inodes() to start to write inodes back much earlier than
our expectations because it miscalculates oldest_jif in wb_kupdate().

This bug was introduced in 704503d836042d4a4c7685b7036e7de0418fbc0f
('mm: fix proc_dointvec_userhz_jiffies "breakage"').

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Sun, 17 May 2009 22:48:05 +0000 (15:48 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: padlock - Revert aes-all alias to aes
  crypto: api - Fix algorithm module auto-loading
  crypto: eseqiv - Fix IV generation for sync algorithms
  crypto: ixp4xx - check firmware for crypto support

15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sun, 17 May 2009 18:46:22 +0000 (11:46 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rafael/suspend-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: check sysdev_suspend(PMSG_FREEZE) return value

15 years agoreiserfs: fixup perms when xattrs are disabled
Jeff Mahoney [Sun, 17 May 2009 05:02:03 +0000 (01:02 -0400)]
reiserfs: fixup perms when xattrs are disabled

This adds CONFIG_REISERFS_FS_XATTR protection from reiserfs_permission.

This is needed to avoid warnings during file deletions and chowns with
xattrs disabled.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoreiserfs: deal with NULL xattr root w/ xattrs disabled
Jeff Mahoney [Sun, 17 May 2009 05:02:02 +0000 (01:02 -0400)]
reiserfs: deal with NULL xattr root w/ xattrs disabled

This avoids an Oops in open_xa_root that can occur when deleting a file
with xattrs disabled.  It assumes that the xattr root will be there, and
that is not guaranteed.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoreiserfs: clean up ifdefs
Jeff Mahoney [Sun, 17 May 2009 05:02:01 +0000 (01:02 -0400)]
reiserfs: clean up ifdefs

With xattr cleanup even with xattrs disabled, much of the initial setup
is still performed.  Some #ifdefs are just not needed since the options
they protect wouldn't be available anyway.

This cleans those up.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg...
Linus Torvalds [Sun, 17 May 2009 18:44:19 +0000 (11:44 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/penberg/slab-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  mm: SLOB fix reclaim_state
  mm: SLUB fix reclaim_state
  slub: add Documentation/ABI/testing/sysfs-kernel-slab
  slub: enforce MAX_ORDER

15 years agoFix caller information for warn_slowpath_null
Linus Torvalds [Sat, 16 May 2009 20:41:28 +0000 (13:41 -0700)]
Fix caller information for warn_slowpath_null

Ian Campbell noticed that since "Eliminate thousands of warnings with
gcc 3.2 build" (commit 57adc4d2dbf968fdbe516359688094eef4d46581) all
WARN_ON()'s currently appear to come from warn_slowpath_null(), eg:

  WARNING: at kernel/softirq.c:143 warn_slowpath_null+0x1c/0x20()

because now that warn_slowpath_null() is in the call path, the
__builtin_return_address(0) returns that, rather than the place that
caused the warning.

Fix this by splitting up the warn_slowpath_null/fmt cases differently,
using a common helper function, and getting the return address in the
right place.  This also happens to avoid the unnecessary stack usage for
the non-stdargs case, and just generally cleans things up.

Make the function name printout use %pS while at it.

Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
Linus Torvalds [Sat, 16 May 2009 19:47:11 +0000 (12:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/bart/ide-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  piix: The Sony TZ90 needs the cable type hardcoding
  icside: register second channel of version 6 PCB
  ide-tape: remove back-to-back REQUEST_SENSE detection

15 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux...
Linus Torvalds [Sat, 16 May 2009 18:22:06 +0000 (11:22 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: Idle C-states disabled by max_cstate should not disable the TSC
  ACPI: idle: fix init-time TSC check regression
  ACPI processor: reset the throttling state once it's invalid
  ACPI processor: introduce module parameter processor.ignore_tpc
  ACPI, i915: build fix
  ACPI: suspend: restore BM_RLD on resume
  ACPI: resume: re-enable SCI-enable workaround
  thermal: fix off-by-1 error in trip point trigger condition
  eeepc-laptop: unregister_rfkill_notifier on failure
  asus-laptop: fix input keycode
  eeepc-laptop: support for super hybrid engine (SHE)
  eeepc-laptop: Work around rfkill firmware bug
  eeepc-laptop: report brightness control events via the input layer
  eeepc-laptop: fix wlan rfkill state change during init
  ACPI: suspend: don't let device _PS3 failure prevent suspend
  ACPI: power: update error message
  ACPI: video: DMI workaround another broken Acer BIOS enabling display brightness
  ACPICA: use acpi.* modparam namespace
  ACPI video: dmi check for broken _BQC on Acer Aspire 5720

15 years agopiix: The Sony TZ90 needs the cable type hardcoding
Alan Cox [Sat, 16 May 2009 17:03:36 +0000 (19:03 +0200)]
piix: The Sony TZ90 needs the cable type hardcoding

The Sony TZ90 needs the cable type hardcoding. See bug #12734

Signed-off-by: Alan Cox <alan@linux.intel.com>
Reported-by: Jonathan E. Snow <jesnow@uh.edu>
[bart: port it from ata_piix to piix and give reporter the proper credit]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
15 years agoicside: register second channel of version 6 PCB
Sergei Shtylyov [Sat, 16 May 2009 17:03:36 +0000 (19:03 +0200)]
icside: register second channel of version 6 PCB

The second IDE channel of version 6 PCB is not being registered anymore since
the commit 48c3c1072651922ed153bcf0a33ea82cf20df390 (ide: add struct ide_host
(take 3)).

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
15 years agoide-tape: remove back-to-back REQUEST_SENSE detection
Tejun Heo [Sat, 18 Apr 2009 22:00:41 +0000 (07:00 +0900)]
ide-tape: remove back-to-back REQUEST_SENSE detection

Impact: fix an oops which always triggers

ide_tape_issue_pc() assumed drive->pc isn't NULL on invocation when
checking for back-to-back request sense issues but drive->pc can be
NULL and even when it's not NULL, it's not safe to dereference it once
the previous command is complete because pc could have been freed or
was on stack.  Kill back-to-back REQUEST_SENSE detection.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
15 years agoMerge branches 'release', 'bugzilla-13032', 'bugzilla-13041+', 'bugzilla-13121',...
Len Brown [Sat, 16 May 2009 05:55:59 +0000 (01:55 -0400)]
Merge branches 'release', 'bugzilla-13032', 'bugzilla-13041+', 'bugzilla-13121', 'bugzilla-13165', 'bugzilla-13243', 'bugzilla-13259', 'resume-sci-en-regression', 'thermal-regression', 'tsc-regression' and 'asus-2.6.30' into release

15 years agoACPI: Idle C-states disabled by max_cstate should not disable the TSC
Len Brown [Fri, 15 May 2009 05:29:31 +0000 (01:29 -0400)]
ACPI: Idle C-states disabled by max_cstate should not disable the TSC

Processor idle power states C2 and C3 stop the TSC on many machines.
Linux recognizes this situation and marks the TSC as unstable:

Marking TSC unstable due to TSC halts in idle

But if those same machines are booted with "processor.max_cstate=1",
then there is no need to validate C2 and C3, and no need to
disable the TSC, which can be reliably used as a clocksource.

Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
15 years agoACPI: idle: fix init-time TSC check regression
Len Brown [Thu, 14 May 2009 21:27:38 +0000 (17:27 -0400)]
ACPI: idle: fix init-time TSC check regression

A previous 2.6.30 patch, a71e4917dc0ebbcb5a0ecb7ca3486643c1c9a6e2,
(ACPI: idle: mark_tsc_unstable() at init-time, not run-time)
erroneously disabled the TSC on systems that did not actually
have valid deep C-states.

Move the check after the deep-C-states are validated,
via new helper, tsc_check_state(), hich replaces tsc_halts_in_c().

Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Frans Pop <elendil@planet.nl>
15 years agoLinux 2.6.30-rc6
Linus Torvalds [Sat, 16 May 2009 04:12:57 +0000 (21:12 -0700)]
Linux 2.6.30-rc6

15 years agoACPI processor: reset the throttling state once it's invalid
Zhang Rui [Mon, 11 May 2009 01:36:01 +0000 (09:36 +0800)]
ACPI processor: reset the throttling state once it's invalid

If the BIOS hands us an invalid throttling state,
write a valid state.

http://bugzilla.kernel.org/show_bug.cgi?id=13259

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: James Ettle <theholyettlz@googlemail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
15 years agoACPI processor: introduce module parameter processor.ignore_tpc
Zhang Rui [Mon, 11 May 2009 01:35:57 +0000 (09:35 +0800)]
ACPI processor: introduce module parameter processor.ignore_tpc

Introduce module parameter processor.ignore_tpc.

Some laptops are shipped with buggy _TPC,
this module parameter is used to to disable the buggy support.

http://bugzilla.kernel.org/show_bug.cgi?id=13259

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: James Ettle <theholyettlz@googlemail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
15 years agoACPI, i915: build fix
Len Brown [Fri, 24 Apr 2009 15:33:47 +0000 (11:33 -0400)]
ACPI, i915: build fix

drivers/built-in.o: In function `intel_opregion_init':
(.text+0x9d540): undefined reference to `acpi_video_register'

http://bugzilla.kernel.org/show_bug.cgi?id=13165

Signed-off-by: Len Brown <len.brown@intel.com>
15 years agoACPI: suspend: restore BM_RLD on resume
Len Brown [Fri, 8 May 2009 02:19:45 +0000 (22:19 -0400)]
ACPI: suspend: restore BM_RLD on resume

In 2.6.29,
31878dd86b7df9a147f5e6cc6e07092b4308782b
"ACPI: remove BM_RLD access from idle entry path"
moved BM_RLD initialization to init-time from run time.

But we discovered that some BIOS do not restore BM_RLD
after suspend, causing device errors on C3 and C4
after resume.  So now the kernel restores BM_RLD.

http://bugzilla.kernel.org/show_bug.cgi?id=13032

Signed-off-by: Len Brown <len.brown@intel.com>
15 years agoACPI: resume: re-enable SCI-enable workaround
Lin Ming [Sat, 16 May 2009 02:27:49 +0000 (22:27 -0400)]
ACPI: resume: re-enable SCI-enable workaround

The BIOS bug workaround mistakenly got disabled
when we followed the ACPI specification more closely
by ignoring OS updates to that bit.

(The BIOS is supposed to update SCI_EN, not the OS)

http://bugzilla.kernel.org/show_bug.cgi?id=13289

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes...
Linus Torvalds [Fri, 15 May 2009 23:47:55 +0000 (16:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI MSI: Fix MSI-X with NIU cards
  PCI: Fix pci-e port driver slot_reset bad default return value

15 years agoPM: check sysdev_suspend(PMSG_FREEZE) return value
Bjorn Helgaas [Fri, 15 May 2009 21:30:50 +0000 (23:30 +0200)]
PM: check sysdev_suspend(PMSG_FREEZE) return value

Check the return value of sysdev_suspend().  I think this was a typo.
Without this change, the following "if" check is always false.
I also changed the error message so it's distinguishable from the
similar message a few lines above.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
Linus Torvalds [Fri, 15 May 2009 21:29:53 +0000 (14:29 -0700)]
Merge git://git./linux/kernel/git/holtmann/bluetooth-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6:
  Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing
  Bluetooth: Don't use hci_acl_connect_cancel() for incoming connections
  Bluetooth: Fix wrong module refcount when connection setup fails

Another case of me handling the fallout from Davem's unfortunate
addiction to shuffleboard.

Won't anybody think of the children? Join the anti-shuffleboard league
today!

15 years agoMerge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt...
Linus Torvalds [Fri, 15 May 2009 20:22:11 +0000 (13:22 -0700)]
Merge branch 'drm-intel-next' of git://git./linux/kernel/git/anholt/drm-intel

* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
  drm/i915: Add new GET_PIPE_FROM_CRTC_ID ioctl.
  drm/i915: Set HDMI hot plug interrupt enable for only the output in question.
  drm/i915: Include 965GME pci ID in IS_I965GM(dev) to match UMS.
  drm/i915: Use the GM45 VGA hotplug workaround on G45 as well.
  drm/i915: ignore LVDS on intel graphics systems that lie about having it
  drm/i915: sanity check IER at wait_request time
  drm/i915: workaround IGD i2c bus issue in kernel side (v2)
  drm/i915: Don't allow binding objects into the last page of the aperture.
  drm/i915: save/restore fence registers across suspend/resume
  drm/i915: x86 always has writeq. Add I915_READ64 for symmetry.

15 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Fri, 15 May 2009 19:04:37 +0000 (12:04 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: Media rotation rate and form factor heuristics
  libata: Report disk alignment and physical block size
  sata_fsl: Fix the command description of FSL SATA controller
  sata_fsl: Fix compile warnings
  [libata] sata_sx4: fixup interrupt handling
  [libata] sata_sx4: convert to new exception handling methods

15 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Linus Torvalds [Fri, 15 May 2009 19:01:59 +0000 (12:01 -0700)]
Merge git://git./linux/kernel/git/linville/wireless-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
  iwlwifi: fix device id registration for 6000 series 2x2 devices
  ath5k: update channel in sw state after stopping RX and TX
  rtl8187: use DMA-aware buffers with usb_control_msg
  mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
  airo: airo_get_encode{,ext} potential buffer overflow

Pulled directly by Linus because Davem is off playing shuffle-board at
some Alaskan cruise, and the NULL ptr deref issue hits people and should
get merged sooner rather than later.

David - make us proud on the shuffle-board tournament!

15 years agolibata: Media rotation rate and form factor heuristics
Martin K. Petersen [Fri, 15 May 2009 04:40:35 +0000 (00:40 -0400)]
libata: Media rotation rate and form factor heuristics

This patch provides new heuristics for parsing both the form factor and
media rotation rate ATA IDENFITY words.

The reported ATA version must be 7 or greater and the device must return
values defined as valid in the standard.  Only then are the
characteristics reported to SCSI via the VPD B1 page.

This seems like a reasonable compromise to me considering that we have
been shipping several kernel releases that key off the rotation rate bit
without any version checking whatsoever.  With no complaints so far.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years agolibata: Report disk alignment and physical block size
Martin K. Petersen [Fri, 15 May 2009 04:40:34 +0000 (00:40 -0400)]
libata: Report disk alignment and physical block size

For disks with 4KB sectors, report the correct block size and alignment
when filling out the READ CAPACITY(16) response.

This patch is based upon code from Matthew Wilcox' 4KB ATA tree.  I
fixed the bug I reported a while back caused by ATA and SCSI using
different approaches to describing the alignment.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years agosata_fsl: Fix the command description of FSL SATA controller
Dave Liu [Thu, 14 May 2009 14:47:07 +0000 (09:47 -0500)]
sata_fsl: Fix the command description of FSL SATA controller

The bit 11 of command description is reserved bit in Freescale
SATA controller and needs to be set to '1'.  This is needed to
make sure the last write from the controller to the buffer
descriptor is seen before an interrupt is raised.

Signed-off-by: Dave Liu <daveliu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years agosata_fsl: Fix compile warnings
Kumar Gala [Thu, 14 May 2009 03:10:50 +0000 (22:10 -0500)]
sata_fsl: Fix compile warnings

We we build with dma_addr_t as a 64-bit quantity we get:

drivers/ata/sata_fsl.c: In function 'sata_fsl_fill_sg':
drivers/ata/sata_fsl.c:340: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'dma_addr_t'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years ago[libata] sata_sx4: fixup interrupt handling
David Milburn [Wed, 13 May 2009 23:02:21 +0000 (18:02 -0500)]
[libata] sata_sx4: fixup interrupt handling

Issuing ATA_CMD_SET_FEATURES (0xef) times out because
pdc20621_interrupt ignores command completion since
ATA_TFLAG_POLLING flag is set.

This has already been fixed for sata_promise:

commit 51b94d2a5a90d4800e74d7348bcde098a28f4fb3
Author: Tejun Heo <htejun@gmail.com>
Date:   Fri Jun 8 13:46:55 2007 -0700

    sata_promise: use TF interface for polling NODATA commands

Also, this patch includes Mikael's original patches:

http://marc.info/?l=linux-ide&m=121135828227724&w=2
http://marc.info/?l=linux-ide&m=121144512109826&w=2

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years agox86: Fix performance regression caused by paravirt_ops on native kernels
Jeremy Fitzhardinge [Thu, 14 May 2009 00:16:55 +0000 (17:16 -0700)]
x86: Fix performance regression caused by paravirt_ops on native kernels

Xiaohui Xin and some other folks at Intel have been looking into what's
behind the performance hit of paravirt_ops when running native.

It appears that the hit is entirely due to the paravirtualized
spinlocks introduced by:

 | commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8
 | Date:   Mon Jul 7 12:07:51 2008 -0700
 |
 |     paravirt: introduce a "lock-byte" spinlock implementation

The extra call/return in the spinlock path is somehow
causing an increase in the cycles/instruction of somewhere around 2-7%
(seems to vary quite a lot from test to test).  The working theory is
that the CPU's pipeline is getting upset about the
call->call->locked-op->return->return, and seems to be failing to
speculate (though I haven't seen anything definitive about the precise
reasons).  This doesn't entirely make sense, because the performance
hit is also visible on unlock and other operations which don't involve
locked instructions.  But spinlock operations clearly swamp all the
other pvops operations, even though I can't imagine that they're
nearly as common (there's only a .05% increase in instructions
executed).

If I disable just the pv-spinlock calls, my tests show that pvops is
identical to non-pvops performance on native (my measurements show that
it is actually about .1% faster, but Xiaohui shows a .05% slowdown).

Summary of results, averaging 10 runs of the "mmperf" test, using a
no-pvops build as baseline:

nopv Pv-nospin Pv-spin
CPU cycles 100.00% 99.89% 102.18%
instructions 100.00% 100.10% 100.15%
CPI 100.00% 99.79% 102.03%
cache ref 100.00% 100.84% 100.28%
cache miss 100.00% 90.47% 88.56%
cache miss rate 100.00% 89.72% 88.31%
branches 100.00% 99.93% 100.04%
branch miss 100.00% 103.66% 107.72%
branch miss rt 100.00% 103.73% 107.67%
wallclock 100.00% 99.90% 102.20%

The clear effect here is that the 2% increase in CPI is
directly reflected in the final wallclock time.

(The other interesting effect is that the more ops are
out of line calls via pvops, the lower the cache access
and miss rates.  Not too surprising, but it suggests that
the non-pvops kernel is over-inlined.  On the flipside,
the branch misses go up correspondingly...)

So, what's the fix?

Paravirt patching turns all the pvops calls into direct calls, so
_spin_lock etc do end up having direct calls.  For example, the compiler
generated code for paravirtualized _spin_lock is:

<_spin_lock+0>: mov    %gs:0xb4c8,%rax
<_spin_lock+9>: incl   0xffffffffffffe044(%rax)
<_spin_lock+15>: callq  *0xffffffff805a5b30
<_spin_lock+22>: retq

The indirect call will get patched to:
<_spin_lock+0>: mov    %gs:0xb4c8,%rax
<_spin_lock+9>: incl   0xffffffffffffe044(%rax)
<_spin_lock+15>: callq <__ticket_spin_lock>
<_spin_lock+20>: nop; nop /* or whatever 2-byte nop */
<_spin_lock+22>: retq

One possibility is to inline _spin_lock, etc, when building an
optimised kernel (ie, when there's no spinlock/preempt
instrumentation/debugging enabled).  That will remove the outer
call/return pair, returning the instruction stream to a single
call/return, which will presumably execute the same as the non-pvops
case.  The downsides arel 1) it will replicate the
preempt_disable/enable code at eack lock/unlock callsite; this code is
fairly small, but not nothing; and 2) the spinlock definitions are
already a very heavily tangled mass of #ifdefs and other preprocessor
magic, and making any changes will be non-trivial.

The other obvious answer is to disable pv-spinlocks.  Making them a
separate config option is fairly easy, and it would be trivial to
enable them only when Xen is enabled (as the only non-default user).
But it doesn't really address the common case of a distro build which
is going to have Xen support enabled, and leaves the open question of
whether the native performance cost of pv-spinlocks is worth the
performance improvement on a loaded Xen system (10% saving of overall
system CPU when guests block rather than spin).  Still it is a
reasonable short-term workaround.

[ Impact: fix pvops performance regression when running native ]

Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>