Eric Wheeler [Fri, 26 Feb 2016 22:33:56 +0000 (14:33 -0800)]
bcache: cleaned up error handling around register_cache()
Fix null pointer dereference by changing register_cache() to return an int
instead of being void. This allows it to return -ENOMEM or -ENODEV and
enables upper layers to handle the OOM case without NULL pointer issues.
See this thread:
http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3521
Fixes this error:
gargamel:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
bcache: register_cache() error opening sdh2: cannot allocate memory
BUG: unable to handle kernel NULL pointer dereference at
00000000000009b8
IP: [<
ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]
PGD
120dff067 PUD
1119a3067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: veth ip6table_filter ip6_tables
(...)
CPU: 4 PID: 3371 Comm: kworker/4:3 Not tainted 4.4.2-amd64-i915-volpreempt-
20160213bc1 #3
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
Workqueue: events cache_set_flush [bcache]
task:
ffff88020d5dc280 ti:
ffff88020b6f8000 task.ti:
ffff88020b6f8000
RIP: 0010:[<
ffffffffc05a7e8d>] [<
ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Marc MERLIN <marc@merlins.org>
Cc: <stable@vger.kernel.org>
Eric Wheeler [Fri, 26 Feb 2016 22:39:06 +0000 (14:39 -0800)]
bcache: fix race of writeback thread starting before complete initialization
The bch_writeback_thread might BUG_ON in read_dirty() if
dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed
its related initialization. This patch downs the dc->writeback_lock until
after initialization is complete, thus preventing bch_writeback_thread
from proceeding prematurely.
See this thread:
http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Marc MERLIN <marc@merlins.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Keith Busch [Fri, 4 Mar 2016 20:15:17 +0000 (13:15 -0700)]
NVMe: Create discard zero quirk white list
The NVMe specification does not require discarded blocks return zeroes on
read, but provides that behavior as a possibility. Some applications more
efficiently use an SSD if reads on discarded blocks were deterministically
zero, based on the "discard_zeroes_data" queue attribute.
There is no specification defined way to determine device behavior on
discarded blocks, so the driver always left the queue setting disabled. We
can only know behavior based on individual device models, so this patch
adds a flag to the NVMe "quirk" list that vendors may set if they know
their controller works that way. The patch also sets the new flag for one
such known device.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Suggested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Arnd Bergmann [Fri, 4 Mar 2016 23:49:31 +0000 (00:49 +0100)]
nbd: use correct div_s64 helper
The do_div() macro now checks its arguments for the correct type,
and refuses anything other than u64, so we get a warning about
nbd_ioctl passing in an loff_t:
drivers/block/nbd.c: In function '__nbd_ioctl':
drivers/block/nbd.c:757:77: error: comparison of distinct pointer types lacks a cast [-Werror]
This changes the nbd code to use div_s64() instead, which takes
a signed argument.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes:
37091fdd831f ("nbd: Create size change events for userspace")
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Fri, 4 Mar 2016 15:15:48 +0000 (08:15 -0700)]
mtip32xx: remove unneeded variable in mtip_cmd_timeout()
We always return BLK_EH_RESET_TIMER, so no point in storing that in
an integer.
Signed-off-by: Jens Axboe <axboe@fb.com>
Javier González [Thu, 3 Mar 2016 21:47:53 +0000 (14:47 -0700)]
lightnvm: generalize rrpc ppa calculations
In rrpc, some calculations assume a certain configuration (e.g., 1 LUN,
1 sector per page). The reason behind this was that LightNVM used a
simple configuration with QEMU to test core features in the beginning.
This patch relaxes these assumptions and generalizes calculation,
allowing multiple luns to be used.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Matias Bjørling [Sat, 20 Feb 2016 07:52:42 +0000 (08:52 +0100)]
lightnvm: remove struct nvm_dev->total_blocks
The struct nvm_dev->total_blocks was only used for calculating total
sectors. Remove and instead calculate total sectors from the number of
luns and its sectors.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Matias Bjørling [Sat, 20 Feb 2016 07:52:41 +0000 (08:52 +0100)]
lightnvm: rename ->nr_pages to ->nr_sects
The struct rrpc->nr_pages can easily be interpreted as the number of
flash pages allocated to rrpc, while it is the nr_sects. Make sure that
this is reflected from the variable name.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Javier González [Sat, 20 Feb 2016 07:52:40 +0000 (08:52 +0100)]
lightnvm: update closed list outside of intr context
When an I/O finishes, full blocks are moved from the open to the closed
list - a lock is taken to protect the list. This happens at the moment
in the interrupt context, which is not correct.
This patch moves this logic to the block workqueue instead, avoiding
holding a spinlock without interrupt save in an interrupt context.
Signed-off-by: Javier González <javier@cnexlabs.com>
Fixes:
ff0e498bfa18 ("lightnvm: manage open and closed blocks sepa...")
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Konrad Rzeszutek Wilk [Wed, 3 Feb 2016 21:40:05 +0000 (16:40 -0500)]
xen/blback: Fit the important information of the thread in 17 characters
The processes names are truncated to 17, while we had the length
of the process as name 20 - which meant that while we filled
it out with various details - the last 3 characters (which had
the queue number) never surfaced to the user-space.
To simplify this and be able to fit the device name, domain id,
and the queue number we remove the 'blkback' from the name.
Prior to this patch the device name is "blkback.<domid>.<name>"
for example: blkback.8.xvda, blkback.11.hda.
With the multiqueue block backend we add "-%d" for the queue.
But sadly this is already way past the limit so it gets stripped.
Possible solution had been identified by Ian:
http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg03516.html
"
If you are pressed for space then the "xvd" is probably a bit redundant
in a string which starts blkbk.
The guest may not even call the device xvdN (iirc BSD has another
prefix) any how, so having blkback say so seems of limited use anyway.
Since this seems to not include a partition number how does this work in
the split partition scheme? (i.e. one where the guest is given xvda1 and
xvda2 rather than xvda with a partition table)
[It will be 'blkback.8.xvda1', and 'blkback.11.xvda2']
Perhaps something derived from one of the schemes in
http://xenbits.xen.org/docs/unstable/misc/vbd-interface.txt might be a
better fit?
After a bit of discussion (see
http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01588.html)
we settled on dropping the "blback" part.
This will make it possible to have the <domid>.<name>-<queue>:
[1.xvda-0]
[1.xvda-1]
And we enough space to make it go up to:
[32100.xvdfg9-5]
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Matias Bjørling [Fri, 19 Feb 2016 12:56:58 +0000 (13:56 +0100)]
lightnvm: fold get bb tbl when using dual/quad plane mode
When the media manager runs in dual or quad plane mode, lightnvm
abstracts away plane specific commands. This poses a problem for
get bad block table, as it reports bad blocks per plane, making the
table either two or four times bigger than expected. Fold the bad block
list before returning.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Alan [Fri, 19 Feb 2016 12:56:57 +0000 (13:56 +0100)]
lightnvm: fix up nonsensical configure overrun checking
Instead of checking a constant 0 actually check the space available. Even
better remember to allow for the header and also check the right amount of
space is needed.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Beulich [Wed, 10 Feb 2016 11:18:10 +0000 (04:18 -0700)]
xen-blkback: advertise indirect segment support earlier
There's no reason to defer this until the connect phase, and in fact
there are frontend implementations expecting this to be available
earlier. Move it into the probe function.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Wed, 10 Feb 2016 11:21:15 +0000 (04:21 -0700)]
xen-blkfront: rename indirect descriptor parameter
"max" is rather ambiguous and carries pretty little meaning, the more
that there are also "max_queues" and "max_ring_page_order". Make this
"max_indirect_segments" instead, and at once change the type from int
to uint (to match the respective variable's type).
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:21:20 +0000 (21:21 -0800)]
mtip32xx: Cleanup queued requests after surprise removal
Fail all pending requests after surprise removal of a drive.
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:21:13 +0000 (21:21 -0800)]
mtip32xx: Implement timeout handler
Added timeout handler. Replaced blk_mq_end_request() with
blk_mq_complete_request() to avoid double completion of a request.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:18:20 +0000 (21:18 -0800)]
mtip32xx: Handle FTL rebuild failure state during device initialization
Allow device initialization to finish gracefully when it is in
FTL rebuild failure state. Also, recover device out of this state
after successfully secure erasing it.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:18:10 +0000 (21:18 -0800)]
mtip32xx: Handle safe removal during IO
Flush inflight IOs using fsync_bdev() when the device is safely
removed. Also, block further IOs in device open function.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:17:47 +0000 (21:17 -0800)]
mtip32xx: Fix for rmmod crash when drive is in FTL rebuild
When FTL rebuild is in progress, alloc_disk() initializes the disk
but device node will be created by add_disk() only after successful
completion of FTL rebuild. So, skip deletion of device node in
removal path when FTL rebuild is in progress.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:17:32 +0000 (21:17 -0800)]
mtip32xx: Avoid issuing standby immediate cmd during FTL rebuild
Prevent standby immediate command from being issued in remove,
suspend and shutdown paths, while drive is in FTL rebuild process.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:16:38 +0000 (21:16 -0800)]
mtip32xx: Print exact time when an internal command is interrupted
Print exact time when an internal command is interrupted.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:16:21 +0000 (21:16 -0800)]
mtip32xx: Remove unwanted code from taskfile error handler
Remove setting and clearing MTIP_PF_EH_ACTIVE_BIT flag in
mtip_handle_tfe() as they are redundant. Also avoid waking
up service thread from mtip_handle_tfe() because it is
already woken up in case of taskfile error.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Asai Thambi SP [Thu, 25 Feb 2016 05:16:00 +0000 (21:16 -0800)]
mtip32xx: Fix broken service thread handling
Service thread does not detect the need for taskfile error hanlding. Fixed the
flag condition to process taskfile error.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Thu, 3 Mar 2016 13:52:13 +0000 (06:52 -0700)]
Merge tag 'nbd-for-4.6' of git://git.pengutronix.de/git/mpa/linux-nbd into for-4.6/drivers
NBD for 4.6
Markus writes:
This pull request contains 7 patches for 4.6.
Patch 1 fixes some unnecessarily complicated code I introduced some versions
ago for debugfs.
Patch 2 removes the criticised signal usage within NBD to kill the NBD threads
after a timeout. This code was used for the last years and is now replaced by
simply killing the tcp connection.
Patches 3-6 are some smaller cleanups.
Patch 7 uevents for the userspace. This way udev/systemd can react on connected
NBD devices.
Ming Lin [Fri, 26 Feb 2016 21:24:19 +0000 (13:24 -0800)]
nvme: expose cntlid in sysfs
For NVMe over Fabrics, the cntlid will be used by systemd/udev to
create link to the device, for example,
/dev/disk/by-path/<fabrics-info>-<cntlid>-<namespace> -> /dev/nvme0n1
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Mon, 29 Feb 2016 14:59:47 +0000 (15:59 +0100)]
nvme: return the whole CQE through the request passthrough interface
Both LighNVM and NVMe over Fabrics need to look at more than just the
status and result field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matias Bj?rling <m@bjorling.me>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Mon, 29 Feb 2016 14:59:46 +0000 (15:59 +0100)]
nvme: replace the kthread with a per-device watchdog timer
The only work left in the kthread is the periodic health check for each
controller. There is no need to run this from process context or keep
a thread context around for it, so replace it with a simpler timer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Mon, 29 Feb 2016 14:59:45 +0000 (15:59 +0100)]
nvme: don't poll the CQ from the kthread
There is no reason to do unconditional polling of CQs per the NVMe
spec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Mon, 29 Feb 2016 14:59:44 +0000 (15:59 +0100)]
nvme: use a work item to submit async event requests
Use a dedicated work item to submit async event requests instead of the
global kthread. This simplifies the code and reduces the latencies to
resubmit a request once an even notification happened.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Markus Pargmann [Mon, 27 Jul 2015 05:36:49 +0000 (07:36 +0200)]
nbd: Create size change events for userspace
The userspace needs to know when nbd devices are ready for use.
Currently no events are created for the userspace which doesn't work for
systemd.
See the discussion here: https://github.com/systemd/systemd/pull/358
This patch uses a central point to setup the nbd-internal sizes. A ioctl
to set a size does not lead to a visible size change. The size of the
block device will be kept at 0 until nbd is connected. As soon as it
connects, the size will be changed to the real value and a uevent is
created. When disconnecting, the blockdevice is set to 0 size and
another uevent is generated.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Ming Lin [Wed, 10 Feb 2016 18:03:32 +0000 (10:03 -0800)]
nvme: split pci module out of core module
NVMe over Fabrics drivers are going to reuse the core,
so splits nvme.ko into 2 modules:
nvme-core.ko: the core part
nvme.ko: the PCI driver
Export symbols from nvme-core.ko.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lin [Wed, 10 Feb 2016 18:03:31 +0000 (10:03 -0800)]
nvme: split dev_list_lock
Split dev_list_lock into one in the core and one in the PCI driver.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lin [Wed, 10 Feb 2016 18:03:30 +0000 (10:03 -0800)]
nvme: move timeout variables to core.c
These variables are used by PCI driver and will also be used in the
forthcoming NVMe over Fabrics drivers.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Sagi Grimberg [Wed, 10 Feb 2016 18:03:29 +0000 (10:03 -0800)]
nvme/host: reference the fabric module for each bdev open callout
We don't want to be able to unload the fabric driver when we have
openened referenced to our namespaces. Thus, for each nvme_open we
take a reference on the fabric driver and put it in nvme_release.
This behavior is consistent with the scsi model.
This resolves the panic when unloading a fabric module with
mpath holders.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ian Bakshan <ianb@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Sagi Grimberg [Wed, 10 Feb 2016 15:51:15 +0000 (08:51 -0700)]
nvme: Log the ctrl device name instead of the underlying pci device name
Having the ctrl name "nvmeX" seems much more friendly than
the underlying device name. Also, with other nvme transports
such as the soon to come nvme-loop we don't have an underlying
device so it doesn't makes sense to make up one.
In order to help matching an instance name to a pci function,
we add a info print in nvme_probe.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Manually fixed up the hunk in nvme_cancel_queue_ios().
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Tue, 9 Feb 2016 19:44:03 +0000 (12:44 -0700)]
nvme: fix drvdata setup for the nvme device
Pass the right private data to device_create_with_groups from the
beginning, and remove the superflous call to dev_set_drvdata.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Keith Busch [Fri, 18 Dec 2015 00:08:15 +0000 (17:08 -0700)]
NVMe: Fix possible queue use after freed
This notifies blk-mq when the tag set contains a different number of
queues prior to freeing unused ones that the request queue points to.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Keith Busch [Fri, 18 Dec 2015 00:08:14 +0000 (17:08 -0700)]
blk-mq: dynamic h/w context count
The hardware's provided queue count may change at runtime with resource
provisioning. This patch allows a block driver to alter the number of
h/w queues available when its resource count changes.
The main part is a new blk-mq API to request a new number of h/w queues
for a given live tag set. The new API freezes all queues using that set,
then adjusts the allocated count prior to remapping these to CPUs.
The bulk of the rest just shifts where h/w contexts and all their
artifacts are allocated and freed.
The number of max h/w contexts is capped to the number of possible cpus
since there is no use for more than that. As such, all pre-allocated
memory for pointers need to account for the max possible rather than
the initial number of queues.
A side effect of this is that the blk-mq will proceed successfully as
long as it can allocate at least one h/w context. Previously it would
fail request queue initialization if less than the requested number
was allocated.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Dan Streetman [Thu, 14 Jan 2016 18:42:32 +0000 (13:42 -0500)]
nbd: ratelimit error msgs after socket close
Make the "Attempted send on closed socket" error messages generated in
nbd_request_handler() ratelimited.
When the nbd socket is shutdown, the nbd_request_handler() function emits
an error message for every request remaining in its queue. If the queue
is large, this will spam a large amount of messages to the log. There's
no need for a separate error message for each request, so this patch
ratelimits it.
In the specific case this was found, the system was virtual and the error
messages were logged to the serial port, which overwhelmed it.
Fixes:
4d48a542b427 ("nbd: fix I/O hang on disconnected nbds")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Markus Pargmann [Thu, 29 Oct 2015 11:06:15 +0000 (12:06 +0100)]
nbd: Move flag parsing to a function
nbd changes properties of the blockdevice depending on flags that were
received. This patch moves this flag parsing into a separate function
nbd_parse_flags().
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Markus Pargmann [Thu, 29 Oct 2015 11:04:51 +0000 (12:04 +0100)]
nbd: Cleanup reset of nbd and bdev after a disconnect
Group all variables that are reset after a disconnect into reset
functions. This patch adds two of these functions, nbd_reset() and
nbd_bdev_reset().
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Markus Pargmann [Thu, 29 Oct 2015 11:01:34 +0000 (12:01 +0100)]
nbd: Timeouts are not user requested disconnects
It may be useful to know in the client that a connection timed out. The
current code returns success for a timeout.
This patch reports the error code -ETIMEDOUT for a timeout.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Markus Pargmann [Thu, 29 Oct 2015 10:51:16 +0000 (11:51 +0100)]
nbd: Remove signal usage
As discussed on the mailing list, the usage of signals for timeout
handling has a lot of potential issues. The nbd driver used for some
time signals for timeouts. These signals where able to get the threads
out of the blocking socket operations.
This patch removes all signal usage and uses a socket shutdown instead.
The socket descriptor itself is cleared later when the whole nbd device
is closed.
The tasks_lock is removed as we do not depend on this anymore. Instead
a new lock for the socket is introduced so we can safely work with the
socket in the timeout handler outside of the two main threads.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Jan Kara [Tue, 12 Jan 2016 15:24:19 +0000 (16:24 +0100)]
cfq-iosched: Allow parent cgroup to preempt its child
Currently we don't allow sync workload of one cgroup to preempt sync
workload of any other cgroup. This is because we want to achieve service
separation between cgroups. However in cases where cgroup preempting is
ancestor of the current cgroup, there is no need of separation and
idling introduces unnecessary overhead. This hurts for example the case
when workload is isolated within a cgroup but journalling threads are in
root cgroup. Simple way to demostrate the issue is using:
dbench4 -c /usr/share/dbench4/client.txt -t 10 -D /mnt 1
on ext4 filesystem on plain SATA drive (mounted with barrier=0 to make
difference more visible). When all processes are in the root cgroup,
reported throughput is 153.132 MB/sec. When dbench process gets its own
blkio cgroup, reported throughput drops to 26.1006 MB/sec.
Fix the problem by making check in cfq_should_preempt() more benevolent
and allow preemption by ancestor cgroup. This improves the throughput
reported by dbench4 to 48.9106 MB/sec.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Tue, 12 Jan 2016 15:24:17 +0000 (16:24 +0100)]
cfq-iosched: Allow sync noidle workloads to preempt each other
The original idea with preemption of sync noidle queues (introduced in
commit
718eee0579b8 "cfq-iosched: fairness for sync no-idle queues") was
that we service all sync noidle queues together, we don't idle on any of
the queues individually and we idle only if there is no sync noidle
queue to be served. This intention also matches the original test:
if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD
&& new_cfqq->service_tree == cfqq->service_tree)
return true;
However since at that time cfqq->service_tree was not set for idling
queues, this test was unreliable and was replaced in commit
e4a229196a7c
"cfq-iosched: fix no-idle preemption logic" by:
if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD &&
cfqq_type(new_cfqq) == SYNC_NOIDLE_WORKLOAD &&
new_cfqq->service_tree->count == 1)
return true;
That was a reliable test but was actually doing something different -
now we preempt sync noidle queue only if the new queue is the only one
busy in the service tree.
These days cfq queue is kept in service tree even if it is idling and
thus the original check would be safe again. But since we actually check
that cfq queues are in the same cgroup, of the same priority class and
workload type (sync noidle), we know that new_cfqq is fine to preempt
cfqq. So just remove the service tree check.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Tue, 12 Jan 2016 15:24:16 +0000 (16:24 +0100)]
cfq-iosched: Reorder checks in cfq_should_preempt()
Move check for preemption by rt class up. There is no functional change
but it makes arguing about conditions simpler since we can be sure both
cfq queues are from the same ioprio class.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Tue, 12 Jan 2016 15:24:15 +0000 (16:24 +0100)]
cfq-iosched: Don't group_idle if cfqq has big thinktime
There is no point in idling on a cfq group if the only cfq queue that is
there has too big thinktime.
Signed-off-by: Jan Kara <jack@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Markus Pargmann [Sat, 24 Oct 2015 19:15:34 +0000 (21:15 +0200)]
nbd: Fix debugfs error handling
Static checker complains about the implemented error handling. It is
indeed wrong. We don't care about the return values of created debugfs
files.
We only have to check the return values of created dirs for NULL
pointer. If we use a null pointer as parent directory for files, this
may lead to debugfs files in wrong places.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Linus Torvalds [Mon, 1 Feb 2016 02:12:16 +0000 (18:12 -0800)]
Linux 4.5-rc2
Linus Torvalds [Mon, 1 Feb 2016 01:36:45 +0000 (17:36 -0800)]
Merge tag 'usb-4.5-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB fixes and new device ids for 4.5-rc2. Nothing
major here, full details are in the shortlog, and all of these have
been in linux-next successfully"
* tag 'usb-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: option: fix Cinterion AHxx enumeration
USB: mxu11x0: fix memory leak on usb_serial private data
USB: serial: ftdi_sio: add support for Yaesu SCU-18 cable
USB: serial: option: Adding support for Telit LE922
USB: serial: visor: fix crash on detecting device without write_urbs
USB: visor: fix null-deref at probe
USB: cp210x: add ID for IAI USB to RS485 adaptor
usb: hub: do not clear BOS field during reset device
cdc-acm:exclude Samsung phone 04e8:685d
usb: cdc-acm: send zero packet for intel 7260 modem
usb: cdc-acm: handle unlinked urb in acm read callback
Linus Torvalds [Mon, 1 Feb 2016 01:09:39 +0000 (17:09 -0800)]
Merge tag 'tty-4.5-rc2' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial driver fixes for 4.5-rc2.
They resolve a number of reported problems (the ioctl one specifically
has been pointed out by numerous people) and one patch adds some new
device ids for the 8250_pci driver. All have been in linux-next
successfully"
* tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_pci: Add Intel Broadwell ports
staging/speakup: Use tty_ldisc_ref() for paste kworker
n_tty: Fix unsafe reference to "other" ldisc
tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
tty: Retry failed reopen if tty teardown in-progress
tty: Wait interruptibly for tty lock on reopen
Linus Torvalds [Mon, 1 Feb 2016 01:00:27 +0000 (17:00 -0800)]
Merge tag 'staging-4.5-rc2' of git://git./linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are some small staging driver fixes for 4.5-rc2.
One of them predated 4.4-final, but I missed that merge window due to
the holliday. The others fix reported issues that have come up
recently. The tty change is needed for the speakup driver fix and has
the ack of the tty driver maintainer as well, i.e. myself :)
All have been in linux-next with no reported issues"
* tag 'staging-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Staging: speakup: fix read scrolled-back VT
Staging: speakup: Fix getting port information
Revert "Staging: panel: usleep_range is preferred over udelay"
iio: adis_buffer: Fix out-of-bounds memory access
Linus Torvalds [Mon, 1 Feb 2016 00:55:04 +0000 (16:55 -0800)]
Merge tag 'driver-core-4.5-rc2' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here's a single driver core fix that resolves an issue a lot of users
have been hitting for a while now. It's been tested a lot and has
been in linux-next successfully for a while"
* tag 'driver-core-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
base/platform: Fix platform drivers with no probe callback
Linus Torvalds [Mon, 1 Feb 2016 00:50:31 +0000 (16:50 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fix from Ralf Baechle:
"Just a single revert for a patch which I had upstreamed out of
sequence"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
Revert "MIPS: bcm63xx: nvram: Remove unused bcm63xx_nvram_get_psi_size() function"
Linus Torvalds [Mon, 1 Feb 2016 00:17:19 +0000 (16:17 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A bit on the largish side due to a series of fixes for a regression in
the x86 vector management which was introduced in 4.3. This work was
started in December already, but it took some time to fix all corner
cases and a couple of older bugs in that area which were detected
while at it
Aside of that a few platform updates for intel-mid, quark and UV and
two fixes for in the mm code:
- Use proper types for pgprot values to avoid truncation
- Prevent a size truncation in the pageattr code when setting page
attributes for large mappings"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/mm/pat: Avoid truncation when converting cpa->numpages to address
x86/mm: Fix types used in pgprot cacheability flags translations
x86/platform/quark: Print boundaries correctly
x86/platform/UV: Remove EFI memmap quirk for UV2+
x86/platform/intel-mid: Join string and fix SoC name
x86/platform/intel-mid: Enable 64-bit build
x86/irq: Plug vector cleanup race
x86/irq: Call irq_force_move_complete with irq descriptor
x86/irq: Remove outgoing CPU from vector cleanup mask
x86/irq: Remove the cpumask allocation from send_cleanup_vector()
x86/irq: Clear move_in_progress before sending cleanup IPI
x86/irq: Remove offline cpus from vector cleanup
x86/irq: Get rid of code duplication
x86/irq: Copy vectormask instead of an AND operation
x86/irq: Check vector allocation early
x86/irq: Reorganize the search in assign_irq_vector
x86/irq: Reorganize the return path in assign_irq_vector
x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
x86/irq: Validate that irq descriptor is still active
x86/irq: Fix a race in x86_vector_free_irqs()
...
Linus Torvalds [Sun, 31 Jan 2016 23:49:06 +0000 (15:49 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"The timer departement delivers:
- a regression fix for the NTP code along with a proper selftest
- prevent a spurious timer interrupt in the NOHZ lowres code
- a fix for user space interfaces returning the remaining time on
architectures with CONFIG_TIME_LOW_RES=y
- a few patches to fix COMPILE_TEST fallout"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz: Set the correct expiry when switching to nohz/lowres mode
clocksource: Fix dependencies for archs w/o HAS_IOMEM
clocksource: Select CLKSRC_MMIO where needed
tick/sched: Hide unused oneshot timer code
kselftests: timers: Add adjtimex SETOFFSET validity tests
ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
hrtimer: Handle remaining time proper for TIME_LOW_RES
clockevents/tcb_clksrc: Prevent disabling an already disabled clock
Linus Torvalds [Sun, 31 Jan 2016 23:44:04 +0000 (15:44 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Three small fixes in the scheduler/core:
- use after free in the numa code
- crash in the numa init code
- a simple spelling fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
pid: Fix spelling in comments
sched/numa: Fix use-after-free bug in the task_numa_compare
sched: Fix crash in sched_init_numa()
Linus Torvalds [Sun, 31 Jan 2016 23:38:27 +0000 (15:38 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"This is much bigger than typical fixes, but Peter found a category of
races that spurred more fixes and more debugging enhancements. Work
started before the merge window, but got finished only now.
Aside of that this contains the usual small fixes to perf and tools.
Nothing particular exciting"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
perf: Remove/simplify lockdep annotation
perf: Synchronously clean up child events
perf: Untangle 'owner' confusion
perf: Add flags argument to perf_remove_from_context()
perf: Clean up sync_child_event()
perf: Robustify event->owner usage and SMP ordering
perf: Fix STATE_EXIT usage
perf: Update locking order
perf: Remove __free_event()
perf/bpf: Convert perf_event_array to use struct file
perf: Fix NULL deref
perf/x86: De-obfuscate code
perf/x86: Fix uninitialized value usage
perf: Fix race in perf_event_exit_task_context()
perf: Fix orphan hole
perf stat: Do not clean event's private stats
perf hists: Fix HISTC_MEM_DCACHELINE width setting
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf tests: Remove wrong semicolon in while loop in CQM test
perf: Synchronously free aux pages in case of allocation failure
...
Linus Torvalds [Sun, 31 Jan 2016 23:29:37 +0000 (15:29 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A single commit, which makes the rtmutex.wait_lock an irq safe lock.
This prevents a potential deadlock which can be triggered by the rcu
boosting code from rcu_read_unlock()"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Make wait_lock irq safe
Linus Torvalds [Sun, 31 Jan 2016 22:48:58 +0000 (14:48 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull IRQ fixes from Ingo Molnar:
"Mostly irqchip driver fixes, but also an irq core crash fix and a
build fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mxs: Add missing set_handle_irq()
irqchip/atmel-aic: Fix wrong bit operation for IRQ priority
irqchip/gic-v3-its: Recompute the number of pages on page size change
base: Export platform_msi_domain_[alloc,free]_irqs
of: MSI: Simplify irqdomain lookup
irqdomain: Allow domain lookup with DOMAIN_BUS_WIRED token
irqchip: Fix dependencies for archs w/o HAS_IOMEM
irqchip/s3c24xx: Mark init_eint as __maybe_unused
genirq: Validate action before dereferencing it in handle_irq_event_percpu()
Linus Torvalds [Sun, 31 Jan 2016 22:43:09 +0000 (14:43 -0800)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull debugobjects fix from Ingo Molnar:
"Bump up debugobjects pool limit that bigger s390 systems kept running
into"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Allow bigger number of early boot objects
Linus Torvalds [Sun, 31 Jan 2016 22:38:37 +0000 (14:38 -0800)]
Merge tag 'vfio-v4.5-rc2' of git://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
"Use alternate group tracking for no-iommu"
* tag 'vfio-v4.5-rc2' of git://github.com/awilliam/linux-vfio:
vfio/noiommu: Don't use iommu_present() to track fake groups
Linus Torvalds [Sun, 31 Jan 2016 22:29:52 +0000 (14:29 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here are two I2C driver regression fixes. piix4 gets a larger
overhaul fixing the latest refactoring and also an older known issue
as well. designware-pci gets a fix for a bad merge conflict
resolution"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: piix4: don't regress on bus names
i2c: designware-pci: use IRQF_COND_SUSPEND flag
i2c: piix4: Fully initialize SB800 before it is registered
i2c: piix4: Fix SB800 locking
Zhen Lei [Sat, 30 Jan 2016 02:04:17 +0000 (10:04 +0800)]
pid: Fix spelling in comments
Accidentally discovered this typo when I studied this module.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Xinwei Hu <huxinwei@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1454119457-11272-1-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 30 Jan 2016 08:15:49 +0000 (09:15 +0100)]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix 'perf stat' stddev reporting due to mistakenly cleaning event
private stats (Jiri Olsa)
- Fix 'perf test CQM' endless loop detected by 'gcc6 -Wmisleading-indentation'
(Markus Trippelsdorf)
- Fix behaviour of Shift-Tab when nothing is focussed in the annotate TUI browser,
detected with gcc6 -Wmisleading-indentation (Markus Trippelsdorf)
- Fix mem data cacheline hists browser width setting for unresolved
addresses (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 30 Jan 2016 00:16:12 +0000 (16:16 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Just one fix for a -fstack-protector-strong problem from Kees Cook,
and adding the new copy_file_range syscall"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: wire up copy_file_range() syscall
ARM: 8500/1: fix atags_to_fdt with stack-protector-strong
Linus Torvalds [Sat, 30 Jan 2016 00:10:16 +0000 (16:10 -0800)]
Merge tag 'powerpc-4.5-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Wire up copy_file_range() syscall from Chandan Rajendra
- Simplify module TOC handling from Alan Modra
- Remove newly added extra definition of pmd_dirty from Stephen Rothwell
- Allow user space to map rtas_rmo_buf from Vasant Hegde
- Fix PE location code from Gavin Shan
- Remove PPMU_HAS_SSLOT flag for Power8 from Madhavan Srinivasan
- Fixup _HPAGE_CHG_MASK from Aneesh Kumar K.V
* tag 'powerpc-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fixup _HPAGE_CHG_MASK
powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8
powerpc/eeh: Fix PE location code
powerpc/mm: Allow user space to map rtas_rmo_buf
powerpc: Remove newly added extra definition of pmd_dirty
powerpc: Simplify module TOC handling
powerpc: Wire up copy_file_range() syscall
Linus Torvalds [Sat, 30 Jan 2016 00:05:18 +0000 (16:05 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
"An optimization for irq-restore, the SSM instruction is quite a bit
slower than an if-statement and a STOSM.
The copy_file_range system all is added.
Cleanup for PCI and CIO.
And a couple of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: update measurement characteristics
s390/cio: ensure consistent measurement state
s390/cio: fix measurement characteristics memleak
s390/zcrypt: Fix cryptographic device id in kernel messages
s390/pci: remove iomap sanity checks
s390/pci: set error state for unusable functions
s390/pci: fix bar check
s390/pci: resize iomap
s390/pci: improve ZPCI_* macros
s390/pci: provide ZPCI_ADDR macro
s390/pci: adjust IOMAP_MAX_ENTRIES
s390/numa: move numa_init_late() from device to arch_initcall
s390: remove all usages of PSW_ADDR_INSN
s390: remove all usages of PSW_ADDR_AMODE
s390: wire up copy_file_range syscall
s390: remove superfluous memblock_alloc() return value checks
s390/numa: allocate memory with correct alignment
s390/irqflags: optimize irq restore
s390/mm: use TASK_MAX_SIZE where applicable
Linus Torvalds [Fri, 29 Jan 2016 23:46:49 +0000 (15:46 -0800)]
Merge branch 'for-linus-4.5' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Dave had a small collection of fixes to the new free space tree code,
one of which was keeping our sysfs files more up to date with feature
bits as different things get enabled (lzo, raid5/6, etc).
I should have kept the sysfs stuff for rc3, since we always manage to
trip over something. This time it was GFP_KERNEL from somewhere that
is NOFS only. Instead of rebasing it out I've put a revert in, and
we'll fix it properly for rc3.
Otherwise, Filipe fixed a btrfs DIO race and Qu Wenruo fixed up a
use-after-free in our tracepoints that Dave Jones reported"
* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Revert "btrfs: synchronize incompat feature bits with sysfs files"
btrfs: don't use GFP_HIGHMEM for free-space-tree bitmap kzalloc
btrfs: sysfs: check initialization state before updating features
Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"
btrfs: async-thread: Fix a use-after-free error for trace
Btrfs: fix race between fsync and lockless direct IO writes
btrfs: add free space tree to the cow-only list
btrfs: add free space tree to lockdep classes
btrfs: tweak free space tree bitmap allocation
btrfs: tests: switch to GFP_KERNEL
btrfs: synchronize incompat feature bits with sysfs files
btrfs: sysfs: introduce helper for syncing bits with sysfs files
btrfs: sysfs: add free-space-tree bit attribute
btrfs: sysfs: fix typo in compat_ro attribute definition
Linus Torvalds [Fri, 29 Jan 2016 23:40:59 +0000 (15:40 -0800)]
Merge tag 'pm+acpi-4.5-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are: cpuidle fixes (including one fix for a recent regression),
cpufreq fixes (including fixes for two issues introduced during the
4.2 cycle), generic power domains framework fixes (two locking fixes
and one cleanup), one locking fix in the ACPI-based PCI hotplug
framework (ACPIPHP), removal of one ACPI backlight blacklist entry
that isn't necessary any more and a PM Kconfig cleanup.
Specifics:
- Fix a recent cpuidle core regression that broke suspend-to-idle on
all systems where cpuidle drivers don't provide ->enter_freeze
callbacks for any states (Sudeep Holla).
- Drop an unnecessary symbol definition from the cpuidle core code
handling coupled CPU cores (Anders Roxell).
- Fix a race condition related to governor initialization and removal
in the cpufreq core (Viresh Kumar).
- Clean up the cpufreq core to use list_is_last() for checking if the
given policy object is the last element of a list instead of open
coding that in a clumsy way (Gautham R Shenoy).
- Fix compiler warnings in the pxa2xx and cpufreq-dt cpufreq drivers
(Arnd Bergmann).
- Fix two locking issues and clean up a comment in the generic power
domains framework (Ulf Hansson, Marek Szyprowski, Moritz Fischer).
- Fix the error code path of one function in the ACPI-based PCI
hotplug framework (ACPIPHP) that forgets to release a lock acquired
previously (Insu Yun).
- Drop the ACPI backlight blacklist entry for Dell Inspiron 5737 that
is not necessary any more (Hans de Goede).
- Clean up the top-level PM Kconfig to stop requiring APM emulation
to depend on PM which in fact isn't necessary (Arnd Bergmann)"
* tag 'pm+acpi-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: cpufreq-dt: avoid uninitialized variable warnings:
cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
PM: APM_EMULATION does not depend on PM
cpufreq: Use list_is_last() to check last entry of the policy list
cpufreq: Fix NULL reference crash while accessing policy->governor_data
cpuidle: coupled: remove unused define cpuidle_coupled_lock
PM / Domains: Fix typo in comment
PM / Domains: Fix potential deadlock while adding/removing subdomains
ACPI / PCI / hotplug: unlock in error path in acpiphp_enable_slot()
ACPI: Revert "ACPI / video: Add Dell Inspiron 5737 to the blacklist"
cpuidle: fix fallback mechanism for suspend to idle in absence of enter_freeze
PM / domains: fix lockdep issue for all subdomains
Linus Torvalds [Fri, 29 Jan 2016 23:19:42 +0000 (15:19 -0800)]
Merge branch 'stable/for-linus-4.5' of git://git./linux/kernel/git/konrad/swiotlb
Pull swiotlb patchlet from Konrad Rzeszutek Wilk:
"One trivial patch.
Another patch (from Fengguang) is already in your tree courtesy of
Andrew Morton - but I would prefer not to rebase my tree. Hence the
diff is very small"
* 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: Make linux/swiotlb.h standalone includible
MAINTAINERS: add git URL for swiotlb
Linus Torvalds [Fri, 29 Jan 2016 23:13:48 +0000 (15:13 -0800)]
Merge branch 'stable/for-linus-4.5' of git://git./linux/kernel/git/konrad/mm
Pull cleancache cleanups from Konrad Rzeszutek Wilk:
"Simple cleanups"
* 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
include/linux/cleancache.h: Clean up code
cleancache: constify cleancache_ops structure
Linus Torvalds [Fri, 29 Jan 2016 23:05:49 +0000 (15:05 -0800)]
Merge tag 'iommu-fixes-v4.5-rc1' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Five patches queued up:
- Two patches for the AMD and Intel IOMMU drivers to fix alias
handling and ATS handling.
- Fix build error with arm io-pgtable code
- Two documentation fixes"
* tag 'iommu-fixes-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Update struct iommu_ops comments
iommu/vt-d: Fix link to Intel IOMMU Specification
iommu/amd: Correct the wrong setting of alias DTE in do_attach
iommu/vt-d: Don't skip PCI devices when disabling IOTLB
iommu/io-pgtable-arm: Fix io-pgtable-arm build failure
Linus Torvalds [Fri, 29 Jan 2016 21:20:39 +0000 (13:20 -0800)]
Merge tag 'hwmon-for-linus-v4.5-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Use bit mask to calculate tdp limit in fam15h_power driver
- Black-list Dell Studio XPS 8000 in dell-smm driver
* tag 'hwmon-for-linus-v4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (fam15h_power) Add bit masking for tdp_limit
hwmon: (dell-smm) Blacklist Dell Studio XPS 8000
Linus Torvalds [Fri, 29 Jan 2016 21:14:45 +0000 (13:14 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes: one to try to fix our repeated intermittent crashes in
suspend/resume, one to correct a regression in the optimal I/O size
reporting and a couple for randconfig build failures in the hisi_sas
driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
SCSI: fix crashes in sd and sr runtime PM
sd: Optimal I/O size is in bytes, not sectors
hisi_sas: Restrict SCSI_HISI_SAS to arm64
hisi_sas: SCSI_HISI_SAS should depend on HAS_DMA
Linus Torvalds [Fri, 29 Jan 2016 20:56:08 +0000 (12:56 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fix from Jens Axboe:
"This just contains the fix for the split issue that we had in -rc1.
It's been well tested at this point, so let's get it in mainline so we
don't have the same split issue for -rc2"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix bio splitting on max sectors
Rafael J. Wysocki [Fri, 29 Jan 2016 20:45:17 +0000 (21:45 +0100)]
Merge branches 'pm-cpuidle', 'pm-cpufreq', 'pm-domains' and 'pm-sleep'
* pm-cpuidle:
cpuidle: coupled: remove unused define cpuidle_coupled_lock
cpuidle: fix fallback mechanism for suspend to idle in absence of enter_freeze
* pm-cpufreq:
cpufreq: cpufreq-dt: avoid uninitialized variable warnings:
cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
cpufreq: Use list_is_last() to check last entry of the policy list
cpufreq: Fix NULL reference crash while accessing policy->governor_data
* pm-domains:
PM / Domains: Fix typo in comment
PM / Domains: Fix potential deadlock while adding/removing subdomains
PM / domains: fix lockdep issue for all subdomains
* pm-sleep:
PM: APM_EMULATION does not depend on PM
Rafael J. Wysocki [Fri, 29 Jan 2016 20:44:53 +0000 (21:44 +0100)]
Merge branches 'acpi-video' and 'acpi-hotplug'
* acpi-video:
ACPI: Revert "ACPI / video: Add Dell Inspiron 5737 to the blacklist"
* acpi-hotplug:
ACPI / PCI / hotplug: unlock in error path in acpiphp_enable_slot()
Linus Torvalds [Fri, 29 Jan 2016 20:34:39 +0000 (12:34 -0800)]
Merge tag 'sound-4.5-rc2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"There are a few fixes in ALSA core for bugs that have been spotted by
fuzzer. Also a temporary workaround for PowerPC (and possibly other)
builds with incompatible ioctls was applied to compress API.
Other than that, a few trivial fixes and quirks for FireWire BeBoB,
USB-audio and HD-audio are found, too"
* tag 'sound-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - disable dynamic clock gating on Broxton before reset
ALSA: hda - Add new GPU codec ID 0x10de0083 to snd-hda
ALSA: dummy: Disable switching timer backend via sysfs
ALSA: timer: fix SND_PCM_TIMER Kconfig text
ALSA: Add missing dependency on CONFIG_SND_TIMER
ALSA: bebob: Use a signed return type for get_formation_index
ALSA: usb-audio: Fix TEAC UD-501/UD-503/NT-503 usb delay
ALSA: compress: Disable GET_CODEC_CAPS ioctl for some architectures
ALSA: seq: Degrade the error message for too many opens
ALSA: seq: Fix incorrect sanity check at snd_seq_oss_synth_cleanup()
Linus Torvalds [Fri, 29 Jan 2016 20:28:45 +0000 (12:28 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Summary:
- Misc amdgpu/radeon fixes
- VC4 build fix
- vmwgfx fix
- misc rockchip fixes
The etnaviv guys had an API feature they wanted in their first
release, so I've merged that with their fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (41 commits)
drm/vmwgfx: respect 'nomodeset'
drm/amdgpu: only move pt bos in LRU list on success
drm/radeon: fix DP audio support for APU with DCE4.1 display engine
drm/radeon: Add a common function for DFS handling
drm/radeon: cleaned up VCO output settings for DP audio
drm/amd/powerplay: Update SMU firmware loading for Stoney
drm/etnaviv: call correct function when trying to vmap a DMABUF
drm/etnaviv: rename etnaviv_gem_vaddr to etnaviv_gem_vmap
drm/etnaviv: fix get pages error path in etnaviv_gem_vaddr
drm/etnaviv: fix memory leak in IOMMU init path
drm/etnaviv: add further minor features and varyings count
drm/etnaviv: add helper for comparing model/revision IDs
drm/etnaviv: add helper to extract bitfields
drm/etnaviv: use defined constants for the chip model
drm/etnaviv: update common and state_hi xml.h files
drm/etnaviv: ignore VG GPUs with FE2.0
drm/amdgpu: don't init fbdev if we don't have any connectors
drm/radeon: only init fbdev if we have connectors
drm/radeon: Ensure radeon bo is unreserved in radeon_gem_va_ioctl
drm/etnaviv: fix failure path if model is zero
...
Linus Torvalds [Fri, 29 Jan 2016 20:24:05 +0000 (12:24 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull security layer fixes from James Morris:
"The keys patch fixes a bug which is breaking kerberos, and the seccomp
fix addresses a no_new_privs bypass"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Only apply KEY_FLAG_KEEP to a key if a parent keyring has it set
seccomp: always propagate NO_NEW_PRIVS on tsync
Chris Mason [Fri, 29 Jan 2016 16:19:37 +0000 (08:19 -0800)]
Revert "btrfs: synchronize incompat feature bits with sysfs files"
This reverts commit
14e46e04958df740c6c6a94849f176159a333f13.
This ends up doing sysfs operations from deep in balance (where we
should be GFP_NOFS) and under heavy balance load, we're making races
against sysfs internals.
Revert it for now while we figure things out.
Signed-off-by: Chris Mason <clm@fb.com>
Mika Westerberg [Fri, 29 Jan 2016 14:49:47 +0000 (16:49 +0200)]
serial: 8250_pci: Add Intel Broadwell ports
Some recent (early 2015) macbooks have Intel Broadwell where LPSS UARTs are
PCI enumerated instead of ACPI. The LPSS UART block is pretty much same as
used on Intel Baytrail so we can reuse the existing Baytrail setup code.
Add both Broadwell LPSS UART ports to the list of supported devices.
Signed-off-by: Leif Liddy <leif.liddy@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matt Fleming [Fri, 29 Jan 2016 11:36:10 +0000 (11:36 +0000)]
x86/mm/pat: Avoid truncation when converting cpa->numpages to address
There are a couple of nasty truncation bugs lurking in the pageattr
code that can be triggered when mapping EFI regions, e.g. when we pass
a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting
left by PAGE_SHIFT will truncate the resultant address to 32-bits.
Viorel-Cătălin managed to trigger this bug on his Dell machine that
provides a ~5GB EFI region which requires
1236992 pages to be mapped.
When calling populate_pud() the end of the region gets calculated
incorrectly in the following buggy expression,
end = start + (cpa->numpages << PAGE_SHIFT);
And only 188416 pages are mapped. Next, populate_pud() gets invoked
for a second time because of the loop in __change_page_attr_set_clr(),
only this time no pages get mapped because shifting the remaining
number of pages (
1048576) by PAGE_SHIFT is zero. At which point the
loop in __change_page_attr_set_clr() spins forever because we fail to
map progress.
Hitting this bug depends very much on the virtual address we pick to
map the large region at and how many pages we map on the initial run
through the loop. This explains why this issue was only recently hit
with the introduction of commit
a5caa209ba9c ("x86/efi: Fix boot crash by mapping EFI memmap
entries bottom-up at runtime, instead of top-down")
It's interesting to note that safe uses of cpa->numpages do exist in
the pageattr code. If instead of shifting ->numpages we multiply by
PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
so the result is unsigned long.
To avoid surprises when users try to convert very large cpa->numpages
values to addresses, change the data type from 'int' to 'unsigned
long', thereby making it suitable for shifting by PAGE_SHIFT without
any type casting.
The alternative would be to make liberal use of casting, but that is
far more likely to cause problems in the future when someone adds more
code and fails to cast properly; this bug was difficult enough to
track down in the first place.
Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Libin Yang [Fri, 29 Jan 2016 12:39:09 +0000 (20:39 +0800)]
ALSA: hda - disable dynamic clock gating on Broxton before reset
On Broxton, to make sure the reset controller works properly,
MISCBDCGE bit (bit 6) in CGCTL (0x48) of PCI configuration space
need be cleared before reset and set back to 1 after reset.
Otherwise, it may prevent the CORB/RIRB logic from being reset.
Signed-off-by: Libin Yang <libin.yang@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Magnus Damm [Tue, 19 Jan 2016 05:28:48 +0000 (14:28 +0900)]
iommu: Update struct iommu_ops comments
Update the comments around struct iommu_ops to match
current state and fix a few typos while at it.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Michael S. Tsirkin [Tue, 26 Jan 2016 16:33:04 +0000 (18:33 +0200)]
iommu/vt-d: Fix link to Intel IOMMU Specification
Looks like the VT-d spec at intel.com got moved. Update the link.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Baoquan He [Wed, 20 Jan 2016 14:01:19 +0000 (22:01 +0800)]
iommu/amd: Correct the wrong setting of alias DTE in do_attach
In below commit alias DTE is set when its peripheral is
setting DTE. However there's a code bug here to wrongly
set the alias DTE, correct it in this patch.
commit
e25bfb56ea7f046b71414e02f80f620deb5c6362
Author: Joerg Roedel <jroedel@suse.de>
Date: Tue Oct 20 17:33:38 2015 +0200
iommu/amd: Set alias DTE in do_attach/do_detach
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Mark Hounschell <markh@compro.net>
Cc: stable@vger.kernel.org # v4.4
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jeremy McNicoll [Fri, 15 Jan 2016 05:33:06 +0000 (21:33 -0800)]
iommu/vt-d: Don't skip PCI devices when disabling IOTLB
Fix a simple typo when disabling IOTLB on PCI(e) devices.
Fixes:
b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
Cc: stable@vger.kernel.org # v4.4
Signed-off-by: Jeremy McNicoll <jmcnicol@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Oleksij Rempel [Fri, 29 Jan 2016 09:57:53 +0000 (10:57 +0100)]
irqchip/mxs: Add missing set_handle_irq()
The rework of the driver missed to move the call to set_handle_irq() into
asm9260_of_init(). As a consequence no interrupt entry point is installed and
no interrupts are delivered
Solution is simple: Install the interrupt entry handler.
Fixes:
7e4ac676ee ("irqchip/mxs: Add Alphascale ASM9260 support")
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Cc: kernel@pengutronix.de
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1454061473-24957-1-git-send-email-linux@rempel-privat.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Lada Trimasova [Wed, 27 Jan 2016 11:10:32 +0000 (11:10 +0000)]
iommu/io-pgtable-arm: Fix io-pgtable-arm build failure
Trying to build a kernel for ARC with both options CONFIG_COMPILE_TEST
and CONFIG_IOMMU_IO_PGTABLE_LPAE enabled (e.g. as a result of "make
allyesconfig") results in the following build failure:
| CC drivers/iommu/io-pgtable-arm.o
| linux/drivers/iommu/io-pgtable-arm.c: In
| function ‘__arm_lpae_alloc_pages’:
| linux/drivers/iommu/io-pgtable-arm.c:221:3:
| error: implicit declaration of function ‘dma_map_single’
| [-Werror=implicit-function-declaration]
| dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
| ^
| linux/drivers/iommu/io-pgtable-arm.c:221:42:
| error: ‘DMA_TO_DEVICE’ undeclared (first use in this function)
| dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
| ^
Since IOMMU_IO_PGTABLE_LPAE depends on DMA API, io-pgtable-arm.c should
include linux/dma-mapping.h. This fixes the reported failure.
Cc: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Lada Trimasova <ltrimas@synopsys.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jean Delvare [Wed, 27 Jan 2016 13:40:33 +0000 (14:40 +0100)]
i2c: piix4: don't regress on bus names
The I2C bus names are supposed to be stable as they can be used by
userspace to uniquely identify a specific I2C bus. So restore the
original names for all legacy (pre-SB800) devices.
For SB800 devices and later, improve the names. "SDA" refers to the
serial data pin of each SMBus port, it's an implementation detail the
user doesn't need to know. Use "port" instead, which is easier to
understand.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Christian Fetzer <fetzer.ch@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Peter Zijlstra [Tue, 26 Jan 2016 14:25:15 +0000 (15:25 +0100)]
perf: Remove/simplify lockdep annotation
Now that the perf_event_ctx_lock_nested() call has moved from
put_event() into perf_event_release_kernel() the first reason is no
longer valid as that can no longer happen.
The second reason seems to have been invalidated when Al Viro made fput()
unconditionally async in the following commit:
4a9d4b024a31 ("switch fput to task_work_add")
such that munmap()->fput()->release()->perf_release() would no longer happen.
Therefore, remove the annotation. This should increase the efficiency
of lockdep coverage of perf locking.
Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 15 Jan 2016 14:07:41 +0000 (16:07 +0200)]
perf: Synchronously clean up child events
The orphan cleanup workqueue doesn't always catch orphans, for example,
if they never schedule after they are orphaned. IOW, the event leak is
still very real. It also wouldn't work for kernel counters.
Doing it synchonously is a little hairy due to lock inversion issues,
but is made to work.
Patch based on work by Alexander Shishkin.
Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 26 Jan 2016 13:55:02 +0000 (14:55 +0100)]
perf: Untangle 'owner' confusion
There are two concepts of owner wrt an event and they are conflated:
- event::owner / event::owner_list,
used by prctl(.option = PR_TASK_PERF_EVENTS_{EN,DIS}ABLE).
- the 'owner' of the event object, typically the file descriptor.
Currently these two concepts are conflated, which gives trouble with
scm_rights passing of file descriptors. Passing the event and then
closing the creating task would render the event 'orphan' and would
have it cleared out. Unlikely what is expectd.
This patch untangles these two concepts by using PERF_EVENT_STATE_EXIT
to denote the second type.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 26 Jan 2016 12:09:48 +0000 (13:09 +0100)]
perf: Add flags argument to perf_remove_from_context()
In preparation to adding more options, convert the boolean argument
into a flags word.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 26 Jan 2016 12:06:56 +0000 (13:06 +0100)]
perf: Clean up sync_child_event()
sync_child_event() has outgrown its purpose, it does far too much.
Bring it back to its named purpose.
Rename __perf_event_exit_task() to perf_event_exit_event() to better
reflect what it does and move the event->state assignment under the
ctx->lock, like state changes ought to be.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 26 Jan 2016 11:30:14 +0000 (12:30 +0100)]
perf: Robustify event->owner usage and SMP ordering
Use smp_store_release() to clear event->owner and
lockless_dereference() to observe it. Further use READ_ONCE() for all
lockless reads.
This changes perf_remove_from_owner() to leave event->owner cleared.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 26 Jan 2016 11:17:08 +0000 (12:17 +0100)]
perf: Fix STATE_EXIT usage
We should never attempt to enable a STATE_EXIT event.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 26 Jan 2016 11:15:37 +0000 (12:15 +0100)]
perf: Update locking order
Update the locking order to note that ctx::lock nests inside of
child_mutex, as per:
perf_ioctl(): ctx::mutex
-> perf_event_for_each(): event::child_mutex
-> _perf_event_enable(): ctx::lock
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>