Kiyoshi Ueda [Mon, 22 Jun 2009 09:12:36 +0000 (10:12 +0100)]
dm: do not set QUEUE_ORDERED_DRAIN if request based
Request-based dm doesn't have barrier support yet.
So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm.
Since the device type is decided at the first table loading time,
the flag set is deferred until then.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Kiyoshi Ueda [Mon, 22 Jun 2009 09:12:36 +0000 (10:12 +0100)]
dm: enable request based option
This patch enables request-based dm.
o Request-based dm and bio-based dm coexist, since there are
some target drivers which are more fitting to bio-based dm.
Also, there are other bio-based devices in the kernel
(e.g. md, loop).
Since bio-based device can't receive struct request,
there are some limitations on device stacking between
bio-based and request-based.
type of underlying device
bio-based request-based
----------------------------------------------
bio-based OK OK
request-based -- OK
The device type is recognized by the queue flag in the kernel,
so dm follows that.
o The type of a dm device is decided at the first table binding time.
Once the type of a dm device is decided, the type can't be changed.
o Mempool allocations are deferred to at the table loading time, since
mempools for request-based dm are different from those for bio-based
dm and needed mempool type is fixed by the type of table.
o Currently, request-based dm supports only tables that have a single
target. To support multiple targets, we need to support request
splitting or prevent bio/request from spanning multiple targets.
The former needs lots of changes in the block layer, and the latter
needs that all target drivers support merge() function.
Both will take a time.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Kiyoshi Ueda [Mon, 22 Jun 2009 09:12:35 +0000 (10:12 +0100)]
dm: prepare for request based option
This patch adds core functions for request-based dm.
When struct mapped device (md) is initialized, md->queue has
an I/O scheduler and the following functions are used for
request-based dm as the queue functions:
make_request_fn: dm_make_request()
pref_fn: dm_prep_fn()
request_fn: dm_request_fn()
softirq_done_fn: dm_softirq_done()
lld_busy_fn: dm_lld_busy()
Actual initializations are done in another patch (PATCH 2).
Below is a brief summary of how request-based dm behaves, including:
- making request from bio
- cloning, mapping and dispatching request
- completing request and bio
- suspending md
- resuming md
bio to request
==============
md->queue->make_request_fn() (dm_make_request()) calls __make_request()
for a bio submitted to the md.
Then, the bio is kept in the queue as a new request or merged into
another request in the queue if possible.
Cloning and Mapping
===================
Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
when requests are dispatched after they are sorted by the I/O scheduler.
dm_request_fn() checks busy state of underlying devices using
target's busy() function and stops dispatching requests to keep them
on the dm device's queue if busy.
It helps better I/O merging, since no merge is done for a request
once it is dispatched to underlying devices.
Actual cloning and mapping are done in dm_prep_fn() and map_request()
called from dm_request_fn().
dm_prep_fn() clones not only request but also bios of the request
so that dm can hold bio completion in error cases and prevent
the bio submitter from noticing the error.
(See the "Completion" section below for details.)
After the cloning, the clone is mapped by target's map_rq() function
and inserted to underlying device's queue using
blk_insert_cloned_request().
Completion
==========
Request completion can be hooked by rq->end_io(), but then, all bios
in the request will have been completed even error cases, and the bio
submitter will have noticed the error.
To prevent the bio completion in error cases, request-based dm clones
both bio and request and hooks both bio->bi_end_io() and rq->end_io():
bio->bi_end_io(): end_clone_bio()
rq->end_io(): end_clone_request()
Summary of the request completion flow is below:
blk_end_request() for a clone request
=> blk_update_request()
=> bio->bi_end_io() == end_clone_bio() for each clone bio
=> Free the clone bio
=> Success: Complete the original bio (blk_update_request())
Error: Don't complete the original bio
=> blk_finish_request()
=> rq->end_io() == end_clone_request()
=> blk_complete_request()
=> dm_softirq_done()
=> Free the clone request
=> Success: Complete the original request (blk_end_request())
Error: Requeue the original request
end_clone_bio() completes the original request on the size of
the original bio in successful cases.
Even if all bios in the original request are completed by that
completion, the original request must not be completed yet to keep
the ordering of request completion for the stacking.
So end_clone_bio() uses blk_update_request() instead of
blk_end_request().
In error cases, end_clone_bio() doesn't complete the original bio.
It just frees the cloned bio and gives over the error handling to
end_clone_request().
end_clone_request(), which is called with queue lock held, completes
the clone request and the original request in a softirq context
(dm_softirq_done()), which has no queue lock, to avoid a deadlock
issue on submission of another request during the completion:
- The submitted request may be mapped to the same device
- Request submission requires queue lock, but the queue lock
has been held by itself and it doesn't know that
The clone request has no clone bio when dm_softirq_done() is called.
So target drivers can't resubmit it again even error cases.
Instead, they can ask dm core for requeueing and remapping
the original request in that cases.
suspend
=======
Request-based dm uses stopping md->queue as suspend of the md.
For noflush suspend, just stops md->queue.
For flush suspend, inserts a marker request to the tail of md->queue.
And dispatches all requests in md->queue until the marker comes to
the front of md->queue. Then, stops dispatching request and waits
for the all dispatched requests to complete.
After that, completes the marker request, stops md->queue and
wake up the waiter on the suspend queue, md->wait.
resume
======
Starts md->queue.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Jonthan Brassow [Mon, 22 Jun 2009 09:12:35 +0000 (10:12 +0100)]
dm raid1: add userspace log
This patch contains a device-mapper mirror log module that forwards
requests to userspace for processing.
The structures used for communication between kernel and userspace are
located in include/linux/dm-log-userspace.h. Due to the frequency,
diversity, and 2-way communication nature of the exchanges between
kernel and userspace, 'connector' was chosen as the interface for
communication.
The first log implementations written in userspace - "clustered-disk"
and "clustered-core" - support clustered shared storage. A userspace
daemon (in the LVM2 source code repository) uses openAIS/corosync to
process requests in an ordered fashion with the rest of the nodes in the
cluster so as to prevent log state corruption. Other implementations
with no association to LVM or openAIS/corosync, are certainly possible.
(Imagine if two machines are writing to the same region of a mirror.
They would both mark the region dirty, but you need a cluster-aware
entity that can handle properly marking the region clean when they are
done. Otherwise, you might clear the region when the first machine is
done, not the second.)
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Mon, 22 Jun 2009 09:12:34 +0000 (10:12 +0100)]
dm: calculate queue limits during resume not load
Currently, device-mapper maintains a separate instance of 'struct
queue_limits' for each table of each device. When the configuration of
a device is to be changed, first its table is loaded and this structure
is populated, then the device is 'resumed' and the calculated
queue_limits are applied.
This places restrictions on how userspace may process related devices,
where it is often advantageous to 'load' tables for several devices
at once before 'resuming' them together. As the new queue_limits
only take effect after the 'resume', if they are changing and one
device uses another, the latter must be 'resumed' before the former
may be 'loaded'.
This patch moves the calculation of these queue_limits out of
the 'load' operation into 'resume'. Since we are no longer
pre-calculating this struct, we no longer need to maintain copies
within our dm structs.
dm_set_device_limits() now passes the 'start' of the device's
data area (aka pe_start) as the 'offset' to blk_stack_limits().
init_valid_queue_limits() is replaced by blk_set_default_limits().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: martin.petersen@oracle.com
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Mon, 22 Jun 2009 09:12:33 +0000 (10:12 +0100)]
dm log: fix create_log_context to use logical_block_size of log device
create_log_context() must use the logical_block_size from the log disk,
where the I/O happens, not the target's logical_block_size.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Mon, 22 Jun 2009 09:12:33 +0000 (10:12 +0100)]
dm target:s introduce iterate devices fn
Add .iterate_devices to 'struct target_type' to allow a function to be
called for all devices in a DM target. Implemented it for all targets
except those in dm-snap.c (origin and snapshot).
(The raid1 version number jumps to 1.12 because we originally reserved
1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
instead.)
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: martin.petersen@oracle.com
Mike Snitzer [Mon, 22 Jun 2009 09:12:32 +0000 (10:12 +0100)]
dm table: establish queue limits by copying table limits
Copy the table's queue_limits to the DM device's request_queue. This
properly initializes the queue's topology limits and also avoids having
to track the evolution of 'struct queue_limits' in
dm_table_set_restrictions()
Also fixes a bug that was introduced in dm_table_set_restrictions() via
commit
ae03bf639a5027d27270123f5f6e3ee6a412781d. In addition to
establishing 'bounce_pfn' in the queue's limits blk_queue_bounce_limit()
also performs an allocation to setup the ISA DMA pool. This allocation
resulted in "sleeping function called from invalid context" when called
from dm_table_set_restrictions().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Mon, 22 Jun 2009 09:12:32 +0000 (10:12 +0100)]
dm table: replace struct io_restrictions with struct queue_limits
Use blk_stack_limits() to stack block limits (including topology) rather
than duplicate the equivalent within Device Mapper.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Mon, 22 Jun 2009 09:12:31 +0000 (10:12 +0100)]
dm table: validate device logical_block_size
Impose necessary and sufficient conditions on a devices's table such
that any incoming bio which respects its logical_block_size can be
processed successfully.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Mon, 22 Jun 2009 09:12:30 +0000 (10:12 +0100)]
dm table: ensure targets are aligned to logical_block_size
Ensure I/O is aligned to the logical block size of target devices.
Rename check_device_area() to device_area_is_valid() for clarity and
establish the device limits including the logical block size prior to
calling it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Mon, 22 Jun 2009 09:12:30 +0000 (10:12 +0100)]
dm ioctl: support cookies for udev
Add support for passing a 32 bit "cookie" into the kernel with the
DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls. The (unsigned)
value of this cookie is returned to userspace alongside the uevents
issued by these ioctls in the variable DM_COOKIE.
This means the userspace process issuing these ioctls can be notified
by udev after udev has completed any actions triggered.
To minimise the interface extension, we pass the cookie into the
kernel in the event_nr field which is otherwise unused when calling
these ioctls. Incrementing the version number allows userspace to
determine in advance whether or not the kernel supports the cookie.
If the kernel does support this but userspace does not, there should
be no impact as the new variable will just get ignored.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Peter Rajnoha [Mon, 22 Jun 2009 09:12:29 +0000 (10:12 +0100)]
dm: sysfs add suspended attribute
Add a file named 'suspended' to each device-mapper device directory in
sysfs. It holds the value 1 while the device is suspended. Otherwise
it holds 0.
Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Jonthan Brassow [Mon, 22 Jun 2009 09:12:29 +0000 (10:12 +0100)]
dm table: improve warning message when devices not freed before destruction
Report any devices forgotten to be freed before a table is destroyed.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Kiyoshi Ueda [Mon, 22 Jun 2009 09:12:28 +0000 (10:12 +0100)]
dm mpath: add service time load balancer
This patch adds a service time oriented dynamic load balancer,
dm-service-time, which selects the path with the shortest estimated
service time for the incoming I/O.
The service time is estimated by dividing the in-flight I/O size
by a performance value of each path.
The performance value can be given as a table argument at the table
loading time. If no performance value is given, all paths are
considered equal.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Kiyoshi Ueda [Mon, 22 Jun 2009 09:12:27 +0000 (10:12 +0100)]
dm mpath: add queue length load balancer
This patch adds a dynamic load balancer, dm-queue-length, which
balances the number of in-flight I/Os across the paths.
The code is based on the patch posted by Stefan Bader:
https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Kiyoshi Ueda [Mon, 22 Jun 2009 09:12:27 +0000 (10:12 +0100)]
dm mpath: add start_io and nr_bytes to path selectors
This patch makes two additions to the dm path selector interface for
dynamic load balancers:
o a new hook, start_io()
o a new parameter 'nr_bytes' to select_path()/start_io()/end_io()
to pass the size of the I/O
start_io() is called when a target driver actually submits I/O
to the selected path.
Path selectors can use it to start accounting of the I/O.
(e.g. counting the number of in-flight I/Os.)
The start_io hook is based on the patch posted by Stefan Bader:
https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html
nr_bytes, the size of the I/O, is so path selectors can take the
size of the I/O into account when deciding which path to use.
dm-service-time uses it to estimate service time, for example.
(Added the nr_bytes member to dm_mpath_io instead of using existing
details.bi_size, since request-based dm patch deletes it.)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:26 +0000 (10:12 +0100)]
dm snapshot: use barrier when writing exception store
Send barrier requests when updating the exception area.
Exception area updates need to be ordered w.r.t. data writes, so that
the writes are not reordered in hardware disk cache.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:26 +0000 (10:12 +0100)]
dm io: retry after barrier error
If -EOPNOTSUPP was returned and the request was a barrier request, retry it
without barrier.
Retry all regions for now. Barriers are submitted only for one-region requests,
so it doesn't matter. (In the future, retries can be limited to the actual
regions that failed.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:25 +0000 (10:12 +0100)]
dm io: record eopnotsupp
Add another field, eopnotsupp_bits. It is subset of error_bits, representing
regions that returned -EOPNOTSUPP. (The bit is set in both error_bits and
eopnotsupp_bits).
This value will be used in further patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:25 +0000 (10:12 +0100)]
dm snapshot: support barriers
Flush support for dm-snapshot target.
This patch just forwards the flush request to either the origin or the snapshot
device. (It doesn't flush exception store metadata.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:24 +0000 (10:12 +0100)]
dm mpath: support barriers
Flush support for dm-multipath target.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:23 +0000 (10:12 +0100)]
dm delay: support barriers
Flush support for dm-delay target.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:23 +0000 (10:12 +0100)]
dm crypt: support flush
Flush support for dm-crypt target.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:22 +0000 (10:12 +0100)]
dm: stripe support flush
Flush support for the stripe target.
This sets ti->num_flush_requests to the number of stripes and
remaps individual flush requests to the appropriate stripe devices.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:22 +0000 (10:12 +0100)]
dm: linear support flush
Flush support for the linear target.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:21 +0000 (10:12 +0100)]
dm: send empty barriers to targets in dm_flush
Pass empty barrier flushes to the targets in dm_flush().
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Alasdair G Kergon [Mon, 22 Jun 2009 09:12:21 +0000 (10:12 +0100)]
dm: initialise tio in alloc_tio
Move repeated dm_target_io initialisation inside alloc_tio().
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:20 +0000 (10:12 +0100)]
dm: introduce num_flush_requests
Introduce num_flush_requests for a target to set to say how many flush
instructions (empty barriers) it wants to receive. These are sent by
__clone_and_map_empty_barrier with map_info->flush_request going from 0
to (num_flush_requests - 1).
Old targets without flush support won't receive any flush requests.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:20 +0000 (10:12 +0100)]
dm: remove check that prevents mapping empty bios
Remove the check that the size of the cloned bio is not zero because a
subsequent patch needs to send zero-sized barriers down this path.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:19 +0000 (10:12 +0100)]
dm: remove EOPNOTSUPP for barriers
If the underlying device doesn't support barriers and dm receives a
barrier, it waits until all requests on that device drain so it no
longer needs to report -EOPNOTSUPP to the caller.
This patch deals with the confusing situation when moving a volume from
one physical device to another triggers an EOPNOTSUPP on a volume that
didn't report it before.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:18 +0000 (10:12 +0100)]
dm: store only first barrier error
With the following patches, more than one error can occur during
processing. Change md->barrier_error so that only the first one is
recorded and returned to the caller.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:18 +0000 (10:12 +0100)]
dm: process requeue in dm_wq_work
If barrier request was returned with DM_ENDIO_REQUEUE,
requeue it in dm_wq_work instead of dec_pending.
This allows us to correctly handle a situation when some targets
are asking for a requeue and other targets signal an error.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:17 +0000 (10:12 +0100)]
dm: make dm_flush return void
Make dm_flush return void.
The first error during flush is stored in md->barrier_error instead.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:17 +0000 (10:12 +0100)]
dm: always hold bdev reference
Fix a potential deadlock when creating multiple snapshots by holding a
reference to struct block_device for the whole lifecycle of every dm
device instead of obtaining it independently at each point it is needed.
bdget_disk() was called while the device was being suspended, in
dm_suspend(). However there could be other devices already suspended,
for example when creating additional snapshots of a device. bdget_disk()
can wait for IO and allocate memory resulting in waiting for the
already-suspended device - deadlock.
This patch changes the code so that it gets the reference to struct
block_device when struct mapped_device is allocated and initialized in
alloc_dev() where it is always OK to allocate memory or wait for I/O.
It drops the reference when it is destroyed in free_dev(). Thus there
is no call to bdget_disk() while any device is suspended.
Previously unlock_fs() was called only if bdev was held. Now it is
called unconditionally, but the superfluous calls are harmless because
it returns immediately if the filesystem was not previously frozen.
This patch also now allows the device size to be changed in a
noflush suspend because the bdev is held. This has no adverse effect.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:15 +0000 (10:12 +0100)]
dm: rename suspended_bdev to bdev
Rename suspended_bdev to bdev.
This patch doesn't change any functionality, just renames the variable.
In the next patch, the variable will be used even for non-suspended device.
(Pre-requisite for the per-target barrier support patches.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Jonathan Brassow [Mon, 22 Jun 2009 09:12:15 +0000 (10:12 +0100)]
dm exception store: fix exstore lookup to be case insensitive
When snapshots are created using 'p' instead of 'P' as the
exception store type, the device-mapper table loading fails.
This patch makes the code case insensitive as intended and fixes some
regressions reported with device-mapper snapshots.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:14 +0000 (10:12 +0100)]
dm: use i_size_read
Use i_size_read() instead of reading i_size.
If someone changes the size of the device simultaneously, i_size_read
is guaranteed to return a valid value (either the old one or the new one).
i_size can return some intermediate invalid value (on 32-bit computers
with 64-bit i_size, the reads to both halves of i_size can be interleaved
with updates to i_size, resulting in garbage being returned).
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:14 +0000 (10:12 +0100)]
dm: avoid unsupported spanning of md stripe boundaries
A bio that has two or more vector entries, size less than or equal to
page size, that crosses a stripe boundary of an underlying md device is
accepted by device mapper (it conforms to all its limits) but not by the
underlying device.
The fix is: If device mapper selects the one-page maximum request size,
it also needs to set its own q->merge_bvec_fn to reject any bios with
multiple vector entries that span more pages.
The problem was discovered in the following scenario:
* MD - RAID-0
* LV on the top of it (raid1, snapshot or striped with chunk
size/stripe larger than RAID-0 stripe)
* one of the logical volumes is exported to xen domU
* inside xen domU it is partitioned, the key point is that the partition
must be unaligned on page boundary (fdisk normally aligns the partition to
63 sectors which will trigger it)
* install the system on the partitioned disk in domU
This causes I/O failures in dom0.
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=223947
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:13 +0000 (10:12 +0100)]
dm mpath: flush keventd queue in destructor
The commit
fe9cf30eb8186ef267d1868dc9f12f2d0f40835a moves dm table event
submission from kmultipath queue to kernel kevent queue to avoid a
deadlock.
There is a possibility of race condition because kevent queue is not flushed
in the multipath destructor. The scenario is:
- some event happens and is queued to keventd
- keventd thread is delayed due to scheuling latency or some other work
- multipath device is destroyed
- keventd now attempts to process work_struct that is residing in already
released memory.
The patch flushes the keventd queue in multipath constructor.
I've already fixed similar bug in dm-raid1.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
Mikulas Patocka [Mon, 22 Jun 2009 09:12:13 +0000 (10:12 +0100)]
dm raid1: keep retrying alloc if mempool_alloc failed
If the code can't handle allocation failures, use __GFP_NOFAIL so that
in case of memory pressure the allocator will retry indefinitely and
won't return NULL which would cause a crash in the function.
This is still not a correct fix, it may cause a classic deadlock when
memory manager waits for I/O being done and I/O waits for some free memory.
I/O code shouldn't allocate any memory. But in this case it probably
doesn't matter much in practice, people usually do not swap on RAID.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Chandra Seetharaman [Mon, 22 Jun 2009 09:12:12 +0000 (10:12 +0100)]
dm mpath: call activate fn for each path in pg_init
Fixed a problem affecting reinstatement of passive paths.
Before we moved the hardware handler from dm to SCSI, it performed a pg_init
for a path group and didn't maintain any state about each path in hardware
handler code.
But in SCSI dh, such state is now maintained, as we want to fail I/O early on a
path if it is not the active path.
All the hardware handlers have a state now and set to active or some form of
inactive. They have prep_fn() which uses this state to fail the I/O without
it ever being sent to the device.
So in effect when dm-multipath calls scsi_dh_activate(), activate is
sent to only one path and the "state" of that path is changed appropriately
to "active" while other paths in the same path group are never changed
as they never got an "activate".
In order make sure all the paths in a path group gets their state set
properly when a pg_init happens, we need to call scsi_dh_activate() on
all paths in a path group.
Doing this at the hardware handler layer is not a good option as we
want the multipath layer to define the relationship between path and path
groups and not the hardware handler.
Attached patch sends an "activate" on each path in a path group when a
path group is switched. It also sends an activate when a path is reinstated.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Hannes Reinecke [Mon, 22 Jun 2009 09:12:11 +0000 (10:12 +0100)]
dm mpath: change attached scsi_dh
When specifying a different hardware handler via multipath
features we should be able to override the built-in defaults.
The problem here is the hardware table from scsi_dh is compiled
in and cannot be changed from userland. The multipath.conf OTOH
is purely user-defined and, what's more, the user might have a valid
reason for modifying it.
(EG EMC Clariion can well be run in PNR mode even though ALUA is
active, or the user might want to try ALUA on any as-of-yet unknown
devices)
So _not_ allowing multipath to override the device handler setting
will just add to the confusion and makes error tracking even more
difficult.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Mon, 22 Jun 2009 09:12:11 +0000 (10:12 +0100)]
dm: sysfs skip output when device is being destroyed
Do not process sysfs attributes when device is being destroyed.
Otherwise code can cause
BUG_ON(test_bit(DMF_FREEING, &md->flags));
in dm_put() call.
Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:12:10 +0000 (10:12 +0100)]
dm mpath: validate hw_handler argument count
Fix arg count parsing error in hw handlers.
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Mon, 22 Jun 2009 09:08:02 +0000 (10:08 +0100)]
dm mpath: validate table argument count
The parser reads the argument count as a number but doesn't check that
sufficient arguments are supplied. This command triggers the bug:
dmsetup create mpath --table "0 `blockdev --getsize /dev/mapper/cr0`
multipath 0 0 2 1 round-robin 1000 0 1 1 /dev/mapper/cr0
round-robin 0 1 1 /dev/mapper/cr1 1000"
kernel BUG at drivers/md/dm-mpath.c:530!
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Linus Torvalds [Sun, 21 Jun 2009 20:14:22 +0000 (13:14 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
sdhci: remove needless double parenthesis
sdhci: Specific quirk vor VIA SDHCI controller in VX855ES
s3cmci: fix dma configuration call
mmc: Add new via-sdmmc host controller driver
sdhci: Add support for hosts that are only capable of 1-bit transfers
MAINTAINERS: add myself as atmel-mci maintainer (sd/mmc interface)
sdhci: Add SDHCI_QUIRK_NO_MULTIBLOCK quirk
sdhci: Add better ADMA error reporting
sdhci-s3c: Samsung S3C based SDHCI controller glue
Linus Torvalds [Sun, 21 Jun 2009 20:14:07 +0000 (13:14 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: aes-ni - Remove CRYPTO_TFM_REQ_MAY_SLEEP from fpu template
crypto: aes-ni - Do not sleep when using the FPU
crypto: aes-ni - Fix cbc mode IV saving
crypto: padlock-aes - work around Nano CPU errata in CBC mode
crypto: padlock-aes - work around Nano CPU errata in ECB mode
Linus Torvalds [Sun, 21 Jun 2009 20:13:53 +0000 (13:13 -0700)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Select frame pointers on x86
dma-debug: be more careful when building reference entries
dma-debug: check for sg_call_ents in best-fit algorithm too
Linus Torvalds [Sun, 21 Jun 2009 20:13:08 +0000 (13:13 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Add model=6530g option
ALSA: hda - Acer Inspire 6530G model for Realtek ALC888
ALSA: snd_usb_caiaq: fix legacy input streaming
ASoC: Kill BUS_ID_SIZE
ALSA: HDA - Correct trivial typos in comments.
ALSA: HDA - Name-fixes in code (tagra/targa)
ALSA: HDA - Add pci-quirk for MSI MS-7350 motherboard.
ALSA: hda - Fix memory leak at codec creation
Linus Torvalds [Fri, 10 Apr 2009 16:01:23 +0000 (09:01 -0700)]
Move FAULT_FLAG_xyz into handle_mm_fault() callers
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault(). All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 10 Apr 2009 15:43:11 +0000 (08:43 -0700)]
Remove internal use of 'write_access' in mm/memory.c
The fault handling routines really want more fine-grained flags than a
single "was it a write fault" boolean - the callers will want to set
flags like "you can return a retry error" etc.
And that's actually how the VM works internally, but right now the
top-level fault handling functions in mm/memory.c all pass just the
'write_access' boolean around.
This switches them over to pass around the FAULT_FLAG_xyzzy 'flags'
variable instead. The 'write_access' calling convention still exists
for the exported 'handle_mm_fault()' function, but that is next.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Sat, 20 Jun 2009 00:23:29 +0000 (02:23 +0200)]
ipc: unbreak 32-bit shmctl/semctl/msgctl
31a985f "ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.h" would
choose the implementation of ipc_parse_version() based on a symbol
defined in <asm/unistd.h>.
But it failed to also include this header and thus broke
IPC_64-passing 32-bit userspace because the flag wasn't masked out
properly anymore and the command not understood.
Include <linux/unistd.h> to give the architecture a chance to ask for
the no-no-op ipc_parse_version().
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Ossman [Sun, 21 Jun 2009 18:59:33 +0000 (20:59 +0200)]
sdhci: remove needless double parenthesis
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Harald Welte [Thu, 18 Jun 2009 14:53:38 +0000 (16:53 +0200)]
sdhci: Specific quirk vor VIA SDHCI controller in VX855ES
The SDHCI controller found in the VX855ES requires 10ms
delay between applying power and applying clock.
This issue has been discovered and documented by the OLPC XO1.5 team.
Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Ben Dooks [Mon, 8 Jun 2009 22:33:56 +0000 (23:33 +0100)]
s3cmci: fix dma configuration call
This was missed in the DMA changes during the s3c24xx
updates in commit
8970ef47d56fd3db28ee798b9d400caf08abd924.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Harald Welte [Wed, 17 Jun 2009 18:22:39 +0000 (20:22 +0200)]
mmc: Add new via-sdmmc host controller driver
This adds the via-sdmmc driver for the SD/MMC-controller of VIA,
which is found in a number of recent integrated VIA chipset
products.
Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Anton Vorontsov [Wed, 17 Jun 2009 20:14:08 +0000 (00:14 +0400)]
sdhci: Add support for hosts that are only capable of 1-bit transfers
Some hosts (hardware configurations, or particular SD/MMC slots) may
not support 4-bit bus. For example, on MPC8569E-MDS boards we can
switch between serial (1-bit only) and nibble (4-bit) modes, thought
we have to disable more peripherals to work in 4-bit mode.
Along with some small core changes, this patch modifies sdhci-of
driver, so that now it looks for "sdhci,1-bit-only" property in the
device-tree, and if specified we enable a proper quirk.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Nicolas Ferre [Tue, 16 Jun 2009 11:05:50 +0000 (13:05 +0200)]
MAINTAINERS: add myself as atmel-mci maintainer (sd/mmc interface)
Add MAINTAINERS entry for atmel-mci driver.
This driver was maintained by its author: Haavard Skinnemoen. I take the
maintainance of it.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Ben Dooks [Sun, 14 Jun 2009 11:40:53 +0000 (12:40 +0100)]
sdhci: Add SDHCI_QUIRK_NO_MULTIBLOCK quirk
Add quirk to show the controller cannot do multi-block IO.
This is mainly for the Samsung SDHCI controller that currently
cannot manage to do multi-block PIO without timing out.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Ben Dooks [Sun, 14 Jun 2009 12:52:38 +0000 (13:52 +0100)]
sdhci: Add better ADMA error reporting
Update the ADMA error reporting to not only show the
overall controller state but also to print the ADMA
descriptor list.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Ben Dooks [Sun, 14 Jun 2009 12:52:37 +0000 (13:52 +0100)]
sdhci-s3c: Samsung S3C based SDHCI controller glue
Add support for the 'HSMMC' block(s) in the Samsung SoC
line. These are compatible with the SDHCI driver so add
the necessary setup and driver binding for the platform
devices.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Takashi Iwai [Sun, 21 Jun 2009 08:59:12 +0000 (10:59 +0200)]
Merge branch 'topic/hda' into for-linus
* topic/hda:
ALSA: hda - Add model=6530g option
ALSA: hda - Acer Inspire 6530G model for Realtek ALC888
ALSA: HDA - Correct trivial typos in comments.
ALSA: HDA - Name-fixes in code (tagra/targa)
ALSA: HDA - Add pci-quirk for MSI MS-7350 motherboard.
ALSA: hda - Fix memory leak at codec creation
Takashi Iwai [Sun, 21 Jun 2009 08:59:10 +0000 (10:59 +0200)]
Merge branch 'topic/caiaq' into for-linus
* topic/caiaq:
ALSA: snd_usb_caiaq: fix legacy input streaming
Takashi Iwai [Sun, 21 Jun 2009 08:59:04 +0000 (10:59 +0200)]
Merge branch 'topic/asoc' into for-linus
* topic/asoc:
ASoC: Kill BUS_ID_SIZE
Takashi Iwai [Sun, 21 Jun 2009 08:56:44 +0000 (10:56 +0200)]
ALSA: hda - Add model=6530g option
Add the new model string corresponding to the previous Acer Aspire
6530G support.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tony Vroon [Sat, 20 Jun 2009 23:40:10 +0000 (00:40 +0100)]
ALSA: hda - Acer Inspire 6530G model for Realtek ALC888
The selected 4930G model seemed to keep the subwoofer 'tuba'
function from operating correctly. Removing the existing PCI
ID match made this work again, but it was mapped to 'Side'
instead of to LFE as one would expect.
This attempts to enable all functionality and keep the amount
of available mixer sliders low. Any slider that had no audible
effect on the output audio has been removed, and as such EAPD
is not currently enabled.
Signed-off-by: Tony Vroon <tony@linx.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Peter Zijlstra [Fri, 12 Jun 2009 08:04:01 +0000 (10:04 +0200)]
lockdep: Select frame pointers on x86
x86 stack traces are a piece of crap without frame pointers, and its not
like the 'performance gain' of not having stack pointers matters when you
selected lockdep.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <new-submission>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Johannes Weiner [Fri, 19 Jun 2009 17:30:56 +0000 (19:30 +0200)]
mm: page_alloc: clear PG_locked before checking flags on free
da456f1 "page allocator: do not disable interrupts in free_page_mlock()" moved
the PG_mlocked clearing after the flag sanity checking which makes mlocked
pages always trigger 'bad page'. Fix this by clearing the bit up front.
Reported--and-debugged-by: Peter Chubb <peter.chubb@nicta.com.au>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 20 Jun 2009 22:40:00 +0000 (15:40 -0700)]
x86, 64-bit: Clean up user address masking
The discussion about using "access_ok()" in get_user_pages_fast() (see
commit
7f8189068726492950bf1a2dcfd9b51314560abf: "x86: don't use
'access_ok()' as a range check in get_user_pages_fast()" for details and
end result), made us notice that x86-64 was really being very sloppy
about virtual address checking.
So be way more careful and straightforward about masking x86-64 virtual
addresses:
- All the VIRTUAL_MASK* variants now cover half of the address
space, it's not like we can use the full mask on a signed
integer, and the larger mask just invites mistakes when
applying it to either half of the 48-bit address space.
- /proc/kcore's kc_offset_to_vaddr() becomes a lot more
obvious when it transforms a file offset into a
(kernel-half) virtual address.
- Unify/simplify the 32-bit and 64-bit USER_DS definition to
be based on TASK_SIZE_MAX.
This cleanup and more careful/obvious user virtual address checking also
uncovered a buglet in the x86-64 implementation of strnlen_user(): it
would do an "access_ok()" check on the whole potential area, even if the
string itself was much shorter, and thus return an error even for valid
strings. Our sloppy checking had hidden this.
So this fixes 'strnlen_user()' to do this properly, the same way we
already handled user strings in 'strncpy_from_user()'. Namely by just
checking the first byte, and then relying on fault handling for the
rest. That always works, since we impose a guard page that cannot be
mapped at the end of the user space address space (and even if we
didn't, we'd have the address space hole).
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 20 Jun 2009 18:30:01 +0000 (11:30 -0700)]
Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq, irq.h: Fix kernel-doc warnings
genirq: fix comment to say IRQ_WAKE_THREAD
Linus Torvalds [Sat, 20 Jun 2009 18:29:32 +0000 (11:29 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
perfcounter: Handle some IO return values
perf_counter: Push perf_sample_data through the swcounter code
perf_counter tools: Define and use our own u64, s64 etc. definitions
perf_counter: Close race in perf_lock_task_context()
perf_counter, x86: Improve interactions with fast-gup
perf_counter: Simplify and fix task migration counting
perf_counter tools: Add a data file header
perf_counter: Update userspace callchain sampling uses
perf_counter: Make callchain samples extensible
perf report: Filter to parent set by default
perf_counter tools: Handle lost events
perf_counter: Add event overlow handling
fs: Provide empty .set_page_dirty() aop for anon inodes
perf_counter: tools: Makefile tweaks for 64-bit powerpc
perf_counter: powerpc: Add processor back-end for MPC7450 family
perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
perf_counter: powerpc: Change how processor-specific back-ends get selected
perf_counter: powerpc: Use unsigned long for register and constraint values
perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
perf_counter tools: Add and use isprint()
...
Linus Torvalds [Sat, 20 Jun 2009 17:57:40 +0000 (10:57 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix out of scope variable access in sched_slice()
sched: Hide runqueues from direct refer at source code level
sched: Remove unneeded __ref tag
sched, x86: Fix cpufreq + sched_clock() TSC scaling
Linus Torvalds [Sat, 20 Jun 2009 17:56:46 +0000 (10:56 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
tracing/urgent: warn in case of ftrace_start_up inbalance
tracing/urgent: fix unbalanced ftrace_start_up
function-graph: add stack frame test
function-graph: disable when both x86_32 and optimize for size are configured
ring-buffer: have benchmark test print to trace buffer
ring-buffer: do not grab locks in nmi
ring-buffer: add locks around rb_per_cpu_empty
ring-buffer: check for less than two in size allocation
ring-buffer: remove useless compile check for buffer_page size
ring-buffer: remove useless warn on check
ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
tracing: update sample event documentation
tracing/filters: fix race between filter setting and module unload
tracing/filters: free filter_string in destroy_preds()
ring-buffer: use commit counters for commit pointer accounting
ring-buffer: remove unused variable
ring-buffer: have benchmark test handle discarded events
ring-buffer: prevent adding write in discarded area
tracing/filters: strloc should be unsigned short
tracing/filters: operand can be negative
...
Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
Linus Torvalds [Sat, 20 Jun 2009 17:51:44 +0000 (10:51 -0700)]
Merge branch 'timers-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
NOHZ: Properly feed cpufreq ondemand governor
Linus Torvalds [Sat, 20 Jun 2009 17:49:48 +0000 (10:49 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
x86, mce: fix error path in mce_create_device()
x86: use zalloc_cpumask_var for mce_dev_initialized
x86: fix duplicated sysfs attribute
x86: de-assembler-ize asm/desc.h
i386: fix/simplify espfix stack switching, move it into assembly
i386: fix return to 16-bit stack from NMI handler
x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
x86: Remove duplicated #include's
x86: msr.h linux/types.h is only required for __KERNEL__
x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
x86, mce: mce_intel.c needs <asm/apic.h>
x86: apic/io_apic.c: dmar_msi_type should be static
x86, io_apic.c: Work around compiler warning
x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
x86: mce: Handle banks == 0 case in K7 quirk
x86, boot: use .code16gcc instead of .code16
x86: correct the conversion of EFI memory types
x86: cap iomem_resource to addressable physical memory
x86, mce: rename _64.c files which are no longer 64-bit-specific
x86, mce: mce.h cleanup
...
Manually fix up trivial conflict in arch/x86/mm/fault.c
Linus Torvalds [Sat, 20 Jun 2009 17:37:01 +0000 (10:37 -0700)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Add missing symbols for CONSTRUCTORS support
microblaze: remove init_mm
Linus Torvalds [Sat, 20 Jun 2009 17:19:49 +0000 (10:19 -0700)]
Merge git://git./linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
kernel-doc: fix param matching for array params
kernel-doc: ignore kmemcheck_bitfield_begin/end
kallsyms: fix inverted valid symbol checking
kbuild: fix build error during make htmldocs
Linus Torvalds [Sat, 20 Jun 2009 17:17:02 +0000 (10:17 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (35 commits)
Input: add driver for Synaptics I2C touchpad
Input: synaptics - add support for reporting x/y resolution
Input: ALPS - handle touchpoints buttons correctly
Input: gpio-keys - change timer to workqueue
Input: ads7846 - pin change interrupt support
Input: add support for touchscreen on W90P910 ARM platform
Input: appletouch - improve finger detection
Input: wacom - clear Intuos4 wheel data when finger leaves proximity
Input: ucb1400 - move static function from header into core
Input: add driver for EETI touchpanels
Input: ads7846 - more detailed model name in sysfs
Input: ads7846 - support swapping x and y axes
Input: ati_remote2 - use non-atomic bitops
Input: introduce lm8323 keypad driver
Input: psmouse - ESD workaround fix for OLPC XO touchpad
Input: tsc2007 - make sure platform provides get_pendown_state()
Input: uinput - flush all pending ff effects before destroying device
Input: simplify name handling for certain input handles
Input: serio - do not use deprecated dev.power.power_state
Input: wacom - add support for Intuos4 tablets
...
Linus Torvalds [Sat, 20 Jun 2009 17:15:30 +0000 (10:15 -0700)]
Merge branch 'drm-linus' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (24 commits)
agp/intel: Make intel_i965_mask_memory use dma_addr_t for physical addresses
agp: add user mapping support to ATI AGP bridge.
drm/i915: enable GEM on PAE.
drm/radeon: fix unused variables warning
agp: switch AGP to use page array instead of unsigned long array
agpgart: detected ALi M???? chipset with M1621
drm/radeon: command stream checker for r3xx-r5xx hardware
drm/radeon: Fully initialize LVDS info also when we can't get it from the ROM.
radeon: Fix CP byte order on big endian architectures with KMS.
agp/uninorth: Handle user memory types.
drm/ttm: Add some powerpc cache flush code.
radeon: Enable modesetting on non-x86.
drm/radeon: Respect AGP cant_use_aperture flag.
drm: EDID endianness fixes.
drm/radeon: this VRAM vs aperture test is wrong, just remove it.
drm/ttm: fix an error path to exit function correctly
drm: Apply "Memory fragmentation from lost alignment blocks"
ttm: Return -ERESTART when a signal interrupts bo eviction.
drm: Remove memory debugging infrastructure.
drm/i915: Clear fence register on tiling stride change.
...
Linus Torvalds [Sat, 20 Jun 2009 17:14:11 +0000 (10:14 -0700)]
Merge git://git./linux/kernel/git/hirofumi/fatfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
fat: Fix the removal of opts->fs_dmask
Linus Torvalds [Sat, 20 Jun 2009 17:11:11 +0000 (10:11 -0700)]
Merge branch 'for-2.6.31' of git://git./linux/kernel/git/bart/ide-2.6
* 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (34 commits)
ide-cd: prevent null pointer deref via cdrom_newpc_intr
ide: BUG() on unknown requests
ide: filter out invalid DMA xfer mode changes in HDIO_DRIVE_CMD ioctl handler
ide: do not access ide_drive_t 'drive_data' field directly
sl82c105: implement test_irq() method
siimage: implement test_irq() method
pdc202xx_old: implement test_irq() method (take 2)
cmd64x: implement test_irq() method
cmd640: implement test_irq() method
ide: move ack_intr() method into 'struct ide_port_ops' (take 2)
ide: move IRQ clearing from ack_intr() method to clear_irq() method (take 2)
siimage: use ide_dma_test_irq() (take 2)
cmd64x: implement clear_irq() method (take 2)
ide: call clear_irq() method in ide_timer_expiry()
sgiioc4: coding style cleanup
ide: don't enable IORDY at a probe time
ide: IORDY handling fixes
ata: add ata_id_pio_need_iordy() helper (v2)
ide-tape: fix build issue
ide: unify interrupt reason checking
...
Linus Torvalds [Sat, 20 Jun 2009 16:52:27 +0000 (09:52 -0700)]
x86: don't use 'access_ok()' as a range check in get_user_pages_fast()
It's really not right to use 'access_ok()', since that is meant for the
normal "get_user()" and "copy_from/to_user()" accesses, which are done
through the TLB, rather than through the page tables.
Why? access_ok() does both too few, and too many checks. Too many,
because it is meant for regular kernel accesses that will not honor the
'user' bit in the page tables, and because it honors the USER_DS vs
KERNEL_DS distinction that we shouldn't care about in GUP. And too few,
because it doesn't do the 'canonical' check on the address on x86-64,
since the TLB will do that for us.
So instead of using a function that isn't meant for this, and does
something else and much more complicated, just do the real rules: we
don't want the range to overflow, and on x86-64, we want it to be a
canonical low address (on 32-bit, all addresses are canonical).
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Sat, 20 Jun 2009 16:26:48 +0000 (18:26 +0200)]
Merge branch 'tip/tracing/urgent-1' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent
Ingo Molnar [Sat, 20 Jun 2009 15:25:49 +0000 (17:25 +0200)]
Merge branch 'tip/tracing/urgent' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent
OGAWA Hirofumi [Sat, 20 Jun 2009 12:50:07 +0000 (21:50 +0900)]
fat: Fix the removal of opts->fs_dmask
(
ce3b0f8d5c2203301fc87f3aaaed73e5819e2a48: New helper - current_umask())
is removing the opts->fs_dmask, probably it's a cut-and-paste
miss or something.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Michal Simek [Sat, 20 Jun 2009 12:24:01 +0000 (14:24 +0200)]
microblaze: Add missing symbols for CONSTRUCTORS support
Commit
b99b87f70c7785ab1e253c6220f4b0b57ce3a7f7 add CONSTRUCTOR
support to Linux but Microblaze not defined KERNEL_CTORS symbols
which are used with that patch.
This patch fixed it.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Arnd Bergmann [Thu, 18 Jun 2009 17:55:26 +0000 (19:55 +0200)]
microblaze: remove init_mm
Alexey removed the definition for init_mm from all architectures
but forgot microblaze, which was only recently added.
This fixes the microblaze build by dropping it there as well.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Randy Dunlap [Thu, 18 Jun 2009 00:37:47 +0000 (17:37 -0700)]
kernel-doc: fix param matching for array params
Fix function actual parameter vs. kernel-doc description matching
so that a warning is not printed when it should not be:
Warning(include/linux/etherdevice.h:199): Excess function parameter 'addr' description in 'is_etherdev_addr'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Randy Dunlap [Thu, 18 Jun 2009 00:36:15 +0000 (17:36 -0700)]
kernel-doc: ignore kmemcheck_bitfield_begin/end
Teach kernel-doc to ignore kmemcheck_bitfield_{begin,end} sugar
so that it won't generate warnings like this:
Warning(include/net/sock.h:297): No description found for parameter 'kmemcheck_bitfield_begin(flags)'
Warning(include/net/sock.h:297): No description found for parameter 'kmemcheck_bitfield_end(flags)'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Mike Frysinger [Mon, 15 Jun 2009 11:52:48 +0000 (07:52 -0400)]
kallsyms: fix inverted valid symbol checking
The previous commit (
17b1f0de) introduced a slightly broken consolidation
of the memory text range checking.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Amerigo Wang [Fri, 19 Jun 2009 07:06:54 +0000 (03:06 -0400)]
kbuild: fix build error during make htmldocs
Fix the following build error when do 'make htmldocs':
DOCPROC Documentation/DocBook/debugobjects.xml
exec /scripts/kernel-doc: No such file or directory
exec /scripts/kernel-doc: No such file or directory
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Frederic Weisbecker [Sat, 20 Jun 2009 00:01:40 +0000 (02:01 +0200)]
perfcounter: Handle some IO return values
Building perfcounter tools raises the following warnings:
builtin-record.c: In function ‘atexit_header’:
builtin-record.c:464: erreur: ignoring return value of ‘pwrite’, declared with attribute warn_unused_result
builtin-record.c: In function ‘__cmd_record’:
builtin-record.c:503: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-report.c: In function ‘__cmd_report’:
builtin-report.c:1403: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
This patch handles these IO return values.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1245456100-5477-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 19 Jun 2009 16:11:53 +0000 (18:11 +0200)]
perf_counter: Push perf_sample_data through the swcounter code
Push the perf_sample_data further outwards to the swcounter interface,
to abstract it away some more.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rainer Weikusat [Thu, 18 Jun 2009 15:04:00 +0000 (17:04 +0200)]
ide-cd: prevent null pointer deref via cdrom_newpc_intr
With 2.6.30, the error handling code in cdrom_newpc_intr was changed
to deal with partial request failures by normally completing the 'good'
parts of a request and only 'error' the last (and presumably,
incompletely transferred) bio associated with a particular
request. In order to do this, ide_complete_rq is called over
ide_cd_error_cmd() to partially complete the rq. The block layer
does partial completion only for requests with bio's and if the
rq doesn't have one (eg 'GPCMD_READ_DISC_INFO') the request is
completed as a whole and the drive->hwif->rq pointer set to NULL
afterwards. When calling ide_complete_rq again to report
the error, this null pointer is derefenced, resulting in a kernel
crash.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13399.
Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Ingo Molnar [Sat, 20 Jun 2009 08:54:22 +0000 (10:54 +0200)]
Merge branch 'x86/mce3' into x86/urgent
Mike Rapoport [Thu, 11 Jun 2009 15:08:39 +0000 (08:08 -0700)]
Input: add driver for Synaptics I2C touchpad
This driver supports Synaptics I2C touchpad controller on eXeda
mobile device. Unfortunaltely it only works in relative mode and
thus is not comaptible with Xorg Synaptics driver.
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Tero Saarni [Thu, 11 Jun 2009 06:27:24 +0000 (23:27 -0700)]
Input: synaptics - add support for reporting x/y resolution
Synaptics uses anisotropic coordinate system. On some wide touchpads
vertical resolution can be twice as high as horizontal which causes
unequal sensitivity on x/y directions. Add support for reading the
resolution with EVIOCGABS ioctl.
Signed-off-by: Tero Saarni <tero.saarni@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Frederic Weisbecker [Sat, 20 Jun 2009 04:52:21 +0000 (06:52 +0200)]
tracing/urgent: warn in case of ftrace_start_up inbalance
Prevent from further ftrace_start_up inbalances so that we avoid
future nop patching omissions with dynamic ftrace.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Frederic Weisbecker [Sat, 20 Jun 2009 03:45:14 +0000 (05:45 +0200)]
tracing/urgent: fix unbalanced ftrace_start_up
Perfcounter reports the following stats for a wide system
profiling:
#
# (2364 samples)
#
# Overhead Symbol
# ........ ......
#
15.40% [k] mwait_idle_with_hints
8.29% [k] read_hpet
5.75% [k] ftrace_caller
3.60% [k] ftrace_call
[...]
This snapshot has been taken while neither the function tracer nor
the function graph tracer was running.
With dynamic ftrace, such results show a wrong ftrace behaviour
because all calls to ftrace_caller or ftrace_graph_caller (the patched
calls to mcount) are supposed to be patched into nop if none of those
tracers are running.
The problem occurs after the first run of the function tracer. Once we
launch it a second time, the callsites will never be nopped back,
unless you set custom filters.
For example it happens during the self tests at boot time.
The function tracer selftest runs, and then the dynamic tracing is
tested too. After that, the callsites are left un-nopped.
This is because the reset callback of the function tracer tries to
unregister two ftrace callbacks in once: the common function tracer
and the function tracer with stack backtrace, regardless of which
one is currently in use.
It then creates an unbalance on ftrace_start_up value which is expected
to be zero when the last ftrace callback is unregistered. When it
reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace
command, triggering the patching into nop. But since it becomes
unbalanced, ie becomes lower than zero, if the kernel functions
are patched again (as in every further function tracer runs), they
won't ever be nopped back.
Note that ftrace_call and ftrace_graph_call are still patched back
to ftrace_stub in the off case, but not the callers of ftrace_call
and ftrace_graph_caller. It means that the tracing is well deactivated
but we waste a useless call into every kernel function.
This patch just unregisters the right ftrace_ops for the function
tracer on its reset callback and ignores the other one which is
not registered, fixing the unbalance. The problem also happens
is .30
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@kernel.org