Alex Elder [Thu, 7 Mar 2013 21:38:26 +0000 (15:38 -0600)]
libceph: record message data byte length
Record the number of bytes of data in a page array rather than the
number of pages in the array. It can be assumed that the page array
is of sufficient size to hold the number of bytes indicated (and
offset by the indicated alignment).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Tue, 5 Mar 2013 04:29:57 +0000 (22:29 -0600)]
ceph: only set message data pointers if non-empty
Change it so we only assign outgoing data information for messages
if there is outgoing data to send.
This then allows us to add a few more (currently commented-out)
assertions.
This is related to:
http://tracker.ceph.com/issues/4284
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Alex Elder [Thu, 14 Feb 2013 18:16:43 +0000 (12:16 -0600)]
libceph: isolate other message data fields
Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and
ceph_msg_data_set_trail() to clearly abstract the assignment of the
remaining data-related fields in a ceph message structure. Use the
new functions in the osd client and mds client.
This partially resolves:
http://tracker.ceph.com/issues/4263
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Thu, 7 Mar 2013 21:38:26 +0000 (15:38 -0600)]
libceph: set page info with byte length
When setting page array information for message data, provide the
byte length rather than the page count ceph_msg_data_set_pages().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Thu, 14 Feb 2013 18:16:43 +0000 (12:16 -0600)]
libceph: isolate message page field manipulation
Define a function ceph_msg_data_set_pages(), which more clearly
abstracts the assignment page-related fields for data in a ceph
message structure. Use this new function in the osd client and mds
client.
Ideally, these fields would never be set more than once (with
BUG_ON() calls to guarantee that). At the moment though the osd
client sets these every time it receives a message, and in the event
of a communication problem this can happen more than once. (This
will be resolved shortly, but setting up these helpers first makes
it all a bit easier to work with.)
Rearrange the field order in a ceph_msg structure to group those
that are used to define the possible data payloads.
This partially resolves:
http://tracker.ceph.com/issues/4263
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Thu, 7 Mar 2013 21:38:25 +0000 (15:38 -0600)]
libceph: record byte count not page count
Record the byte count for an osd request rather than the page count.
The number of pages can always be derived from the byte count (and
alignment/offset) but the reverse is not true.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:16 +0000 (18:00 -0600)]
libceph: simplify new message initialization
Rather than explicitly initializing many fields to 0, NULL, or false
in a newly-allocated message, just use kzalloc() for allocating new
messages. This will become a much more convenient way of doing
things anyway for upcoming patches that abstract the data field.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Thu, 7 Mar 2013 05:39:38 +0000 (23:39 -0600)]
libceph: advance pagelist with list_rotate_left()
While processing an outgoing pagelist (either the data pagelist or
trail) in a ceph message, the messenger cycles through each of the
pages on the list. This is accomplished in out_msg_pos_next(), if
the end of the first page on the list is reached, the first page is
moved to the end of the list.
There is a list operation, list_rotate_left(), which performs
exactly this operation, and by using it, what's really going on
becomes more obvious.
So replace these two list_move_tail() calls with list_rotate_left().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 9 Mar 2013 00:51:04 +0000 (18:51 -0600)]
libceph: define and use in_msg_pos_next()
Define a new function in_msg_pos_next() to match out_msg_pos_next(),
and use it in place of code at the end of read_partial_message_pages()
and read_partial_message_bio().
Note that the page number is incremented and offset reset under
slightly different conditions from before. The result is
equivalent, however, as explained below.
Each time an incoming message is going to arrive, we find out how
much room is left--not surpassing the current page--and provide that
as the number of bytes to receive. So the amount we'll use is the
lesser of: all that's left of the entire request; and all that's
left in the current page.
If we received exactly how many were requested, we either reached
the end of the request or the end of the page. In the first case,
we're done, in the second, we move onto the next page in the array.
In all cases but (possibly) on the last page, after adding the
number of bytes received, page_pos == PAGE_SIZE. On the last page,
it doesn't really matter whether we increment the page number and
reset the page position, because we're done and we won't come back
here again. The code previously skipped over that last case,
basically. The new code handles that case the same as the others,
incrementing and resetting.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 9 Mar 2013 00:51:03 +0000 (18:51 -0600)]
libceph: kill args in read_partial_message_bio()
There is only one caller for read_partial_message_bio(), and it
always passes &msg->bio_iter and &bio_seg as the second and third
arguments. Furthermore, the message in question is always the
connection's in_msg, and we can get that inside the called function.
So drop those two parameters and use their derived equivalents.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Thu, 7 Mar 2013 05:39:38 +0000 (23:39 -0600)]
libceph: change type of ceph_tcp_sendpage() "more"
Change the type of the "more" parameter from int to bool.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 9 Mar 2013 00:51:03 +0000 (18:51 -0600)]
libceph: minor byte order problems in read_partial_message()
Some values printed are not (necessarily) in CPU order. We already
have a copy of the converted versions, so use them.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 9 Mar 2013 00:51:03 +0000 (18:51 -0600)]
libceph: define CEPH_MSG_MAX_MIDDLE_LEN
This is probably unnecessary but the code read as if it were wrong
in read_partial_message().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Tue, 5 Mar 2013 15:25:10 +0000 (09:25 -0600)]
libceph: clean up skipped message logic
In ceph_con_in_msg_alloc() it is possible for a connection's
alloc_msg method to indicate an incoming message should be skipped.
By default, read_partial_message() initializes the skip variable
to 0 before it gets provided to ceph_con_in_msg_alloc().
The osd client, mon client, and mds client each supply an alloc_msg
method. The mds client always assigns skip to be 0.
The other two leave the skip value of as-is, or assigns it to zero,
except:
- if no (osd or mon) request having the given tid is found, in
which case skip is set to 1 and NULL is returned; or
- in the osd client, if the data of the reply message is not
adequate to hold the message to be read, it assigns skip
value 1 and returns NULL.
So the returned message pointer will always be NULL if skip is ever
non-zero.
Clean up the logic a bit in ceph_con_in_msg_alloc() to make this
state of affairs more obvious. Add a comment explaining how a null
message pointer can mean either a message that should be skipped or
a problem allocating a message.
This resolves:
http://tracker.ceph.com/issues/4324
Reported-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Alex Elder [Thu, 14 Feb 2013 18:16:43 +0000 (12:16 -0600)]
libceph: separate read and write data
An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.
Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.
This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data. It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.
This resolves:
http://tracker.ceph.com/issues/4127
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Thu, 14 Feb 2013 18:16:43 +0000 (12:16 -0600)]
libceph: distinguish page and bio requests
An osd request uses either pages or a bio list for its data. Use a
union to record information about the two, and add a data type
tag to select between them.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Thu, 14 Feb 2013 18:16:43 +0000 (12:16 -0600)]
libceph: separate osd request data info
Pull the fields in an osd request structure that define the data for
the request out into a separate structure.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:15 +0000 (18:00 -0600)]
libceph: don't assign page info in ceph_osdc_new_request()
Currently ceph_osdc_new_request() assigns an osd request's
r_num_pages and r_alignment fields. The only thing it does
after that is call ceph_osdc_build_request(), and that doesn't
need those fields to be assigned.
Move the assignment of those fields out of ceph_osdc_new_request()
and into its caller. As a result, the page_align parameter is no
longer used, so get rid of it.
Note that in ceph_sync_write(), the value for req->r_num_pages had
already been calculated earlier (as num_pages, and fortunately
it was computed the same way). So don't bother recomputing it,
but because it's not needed earlier, move that calculation after the
call to ceph_osdc_new_request(). Hold off making the assignment to
r_alignment, doing it instead r_pages and r_num_pages are
getting set.
Similarly, in start_read(), nr_pages already holds the number of
pages in the array (and is calculated the same way), so there's no
need to recompute it. Move the assignment of the page alignment
down with the others there as well.
This and the next few patches are preparation work for:
http://tracker.ceph.com/issues/4127
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 16 Feb 2013 17:07:32 +0000 (17:07 +0000)]
ceph: simplify ceph_sync_write() page_align calculation
(This is being reposted. The first one had a problem because it
erroneously added a similar change elsewhere; that change has been
dropped.)
The next patch in this series points out that the calculation for
the number of pages in an osd request is getting done twice. It
is not obvious, but the result of both calculations is identical.
This patch simplifies one of them--as a separate step--to make
it clear that the transformation in the next patch is valid.
In ceph_sync_write() there is some magic that computes page_align
for an osd request. But a little analysis shows it can be
simplified.
First, we have:
io_align = pos & ~PAGE_MASK;
which is used here:
page_align = (pos - io_align + buf_align) & ~PAGE_MASK;
Note (pos - io_align) simply rounds "pos" down to the nearest multiple
of the page size.
We also have:
buf_align = (unsigned long)data & ~PAGE_MASK;
Adding buf_align to that rounded-down "pos" value will stay within
the same page; the result will just be offset by the page offset for
the "data" pointer. The final mask therefore leaves just the value
of "buf_align".
One more simplification. Note that the result of calc_pages_for()
is invariant of which page the offset starts in--the only thing that
matters is the offset within the starting page. We will have
put the proper page offset to use into "page_align", so just use
that in calculating num_pages.
This resolves:
http://tracker.ceph.com/issues/4166
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:15 +0000 (18:00 -0600)]
ceph: use calc_pages_for() in start_read()
There's a spot that computes the number of pages to allocate for a
page-aligned length by just shifting it. Use calc_pages_for()
instead, to be consistent with usage everywhere else. The result
is the same.
The reason for this is to make it clearer in an upcoming patch that
this calculation is duplicated.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:14 +0000 (18:00 -0600)]
libceph: no need for alignment for mds message
Currently, incoming mds messages never use page data, which means
there is no need to set the page_alignment field in the message.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:14 +0000 (18:00 -0600)]
libceph: define mds_alloc_msg() method
The only user of the ceph messenger that doesn't define an alloc_msg
method is the mds client. Define one, such that it works just like
it did before, and simplify ceph_con_in_msg_alloc() by assuming the
alloc_msg method is always present.
This and the next patch resolve:
http://tracker.ceph.com/issues/4322
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:14 +0000 (18:00 -0600)]
libceph: drop mutex while allocating a message
In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a
connection a new message is allocated with ceph_msg_new().
Drop the mutex before making this call, and make sure we're still
connected when we get it back again.
This is preparing for the next patch, which ensures all connections
define an alloc_msg method, and then handles them all the same way.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:15 +0000 (18:00 -0600)]
libceph: rename ceph_calc_object_layout()
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.
Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.
Change the function so it takes a pool number rather than the full
file layout structure. Only update the ceph_pg if the pool is found
in the osd map. Get rid of few useless lines of code from the
function while there.
Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:15 +0000 (18:00 -0600)]
libceph: kill ceph_msg->pagelist_count
The pagelist_count field is never actually used, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 2 Mar 2013 00:00:15 +0000 (18:00 -0600)]
libceph: use (void *) for untyped data in osd ops
Two of the fields defining osd operations are defined using (char *)
while the data they represent are really untyped, not character
strings. Change them to have type (void *).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Mon, 4 Mar 2013 17:08:29 +0000 (11:08 -0600)]
libceph: fix wrong opcode use in osd_req_encode_op()
The new cases added to osd_req_encode_op() caused a new sparse
error, which highlighted an existing problem that had been
overlooked since it was originally checked in. When an unsupported
opcode is found the destination rather than the source opcode was
being used in the error message. The two differ in their byte
order, and we want to be using the one in the source.
Fix the problem in both spots.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Wed, 27 Feb 2013 16:26:25 +0000 (10:26 -0600)]
libceph: complete lingering requests only once
An osd request marked to linger will be re-submitted in the event
a connection to the target osd gets dropped. Currently, if there
is a callback function associated with a request it will be called
each time a request is submitted--which for lingering requests can
be more than once.
Change it so a request--including lingering ones--will get completed
(from the perspective of the user of the osd client) exactly once.
This resolves:
http://tracker.ceph.com/issues/3967
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Yan, Zheng [Fri, 1 Mar 2013 02:57:54 +0000 (10:57 +0800)]
ceph: acquire i_mutex in __ceph_do_pending_vmtruncate
make __ceph_do_pending_vmtruncate() acquire the i_mutex if the caller
does not hold the i_mutex, so ceph_aio_read() can call safely.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Yan, Zheng [Fri, 1 Mar 2013 02:55:39 +0000 (10:55 +0800)]
ceph: don't early drop Fw cap
ceph_aio_write() has an optimization that marks CEPH_CAP_FILE_WR
cap dirty before data is copied to page cache and inode size is
updated. The optimization avoids slow cap revocation caused by
balance_dirty_pages(), but introduces inode size update race. If
ceph_check_caps() flushes the dirty cap before the inode size is
updated, MDS can miss the new inode size. So just remove the
optimization.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Thu, 2 May 2013 04:15:58 +0000 (21:15 -0700)]
ceph: revert commit
22cddde104
commit
22cddde104 breaks the atomicity of write operation, it also
introduces a deadlock between write and truncate.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Conflicts:
fs/ceph/addr.c
Yan, Zheng [Mon, 18 Feb 2013 08:38:14 +0000 (16:38 +0800)]
ceph: use I_COMPLETE inode flag instead of D_COMPLETE flag
commit
c6ffe10015 moved the flag that tracks if the dcache contents
for a directory are complete to dentry. The problem is there are
lots of places that use ceph_dir_{set,clear,test}_complete() while
holding i_ceph_lock. but ceph_dir_{set,clear,test}_complete() may
sleep because they call dput().
This patch basically reverts that commit. For ceph_d_prune(), it's
called with both the dentry to prune and the parent dentry are
locked. So it's safe to access the parent dentry's d_inode and
clear I_COMPLETE flag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Yan, Zheng [Wed, 27 Feb 2013 01:26:09 +0000 (09:26 +0800)]
ceph: set mds_want according to cap import message
MDS ignores cap update message if migrate_seq mismatch, so when
receiving a cap import message with higher migrate_seq, set mds_want
according to the cap import message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Yan, Zheng [Mon, 18 Feb 2013 05:43:43 +0000 (13:43 +0800)]
ceph: queue cap release when trimming cap
So the client will later send cap release message to MDS
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Yan, Zheng [Thu, 21 Feb 2013 05:43:55 +0000 (13:43 +0800)]
ceph: fix LSSNAP regression
commit
6e8575faa8 makes parse_reply_info_extra() return -EIO for LSSNAP
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Alex Elder [Thu, 14 Feb 2013 18:16:43 +0000 (12:16 -0600)]
libceph: set page alignment in start_request()
The page alignment field for a request is currently set in
ceph_osdc_build_request(). It's not needed at that point
nor do either of its callers need that value assigned at
any point before they call ceph_osdc_start_request().
So move that assignment into ceph_osdc_start_request().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Mon, 25 Feb 2013 23:35:46 +0000 (17:35 -0600)]
libceph: distinguish page array and pagelist count
Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list. Currently only one or the
other is used at a time, but that will be changing soon.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 16 Feb 2013 04:10:17 +0000 (22:10 -0600)]
libceph: don't pass request to calc_layout()
The only remaining reason to pass the osd request to calc_layout()
is to fill in its r_num_pages and r_page_alignment fields. Once it
fills those in, it doesn't do anything more with them.
We can therefore move those assignments into the caller, and get rid
of the "req" parameter entirely.
Note, however, that the only caller is ceph_osdc_new_request(),
and that immediately overwrites those fields with values based on
its passed-in page offset. So the assignment inside calc_layout()
was redundant anyway.
This resolves:
http://tracker.ceph.com/issues/4262
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 16 Feb 2013 04:10:17 +0000 (22:10 -0600)]
libceph: format target object name in caller
Move the formatting of the object name (oid) to use for an object
request into the caller of calc_layout(). This makes the "vino"
parameter no longer necessary, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 16 Feb 2013 04:10:17 +0000 (22:10 -0600)]
libceph: pass object number back to calc_layout() caller
Have calc_layout() pass the computed object number back to its
caller. (This is a small step to simplify review.)
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 16 Feb 2013 04:10:17 +0000 (22:10 -0600)]
libceph: make ceph_msg->bio_seg be unsigned
The bio_seg field is used by the ceph messenger in iterating through
a bio. It should never have a negative value, so make it an
unsigned. (I contemplated making it unsigned short to match the
struct bio definition, but it offered no benefit.)
Change variables used to hold bio_seg values to all be unsigned as
well. Change two variable names in init_bio_iter() to match the
convention used everywhere else.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Alex Elder [Sat, 16 Feb 2013 04:10:17 +0000 (22:10 -0600)]
libceph: fix a osd request memory leak
If an invalid layout is provided to ceph_osdc_new_request(), its
call to calc_layout() might return an error. At that point in the
function we've already allocated an osd request structure, so we
need to free it (drop a reference) in the event such an error
occurs.
The only other value calc_layout() will return is 0, so make that
explicit in the successful case.
This resolves:
http://tracker.ceph.com/issues/4240
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Linus Torvalds [Mon, 29 Apr 2013 00:36:01 +0000 (17:36 -0700)]
Linux 3.9
Linus Torvalds [Sat, 27 Apr 2013 20:58:36 +0000 (13:58 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fix from Olof Johansson:
"A late-arriving fix for musb on OMAP4, resolving an issue where the
musb IP won't be clocked and thus not functional. Small in scope,
most of the lines changed is a longish comment."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock
Linus Torvalds [Sat, 27 Apr 2013 20:25:38 +0000 (13:25 -0700)]
vm: add no-mmu vm_iomap_memory() stub
I think we could just move the full vm_iomap_memory() function into
util.h or similar, but I didn't get any reply from anybody actually
using nommu even to this trivial patch, so I'm not going to touch it any
more than required.
Here's the fairly minimal stub to make the nommu case at least
potentially work. It doesn't seem like anybody cares, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 27 Apr 2013 17:08:09 +0000 (10:08 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"This fix adds missing RCU read protection"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
events: Protect access via task_subsys_state_check()
Olof Johansson [Sat, 27 Apr 2013 00:35:13 +0000 (17:35 -0700)]
Merge tag 'omap-for-v3.9-rc6/fixes-signed' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
From Tony Lindgren:
One MUSB regression fix that I forgot to send earlier. Without
this MUSB no longer works on omap4 based devices.
* tag 'omap-for-v3.9-rc6/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Fri, 26 Apr 2013 15:17:07 +0000 (08:17 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Two driver fixes.
One avoids reading any file at a system with a cx25821 board
(fortunately, this is not a common device). The other one prevents
reading after a buffer with ISDB-T devices based on mb86a20s."
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] cx25821: do not expose broken video output streams
[media] mb86a20s: Fix estimate_rate setting
Linus Torvalds [Fri, 26 Apr 2013 15:05:01 +0000 (08:05 -0700)]
Merge branch 'fixes-3.9-late' of git://git./linux/kernel/git/deller/parisc-linux
Pull late parisc fixes from Helge Deller:
"I know it's *very* late in the 3.9 release cycle, but since there
aren't that many people testing the parisc linux kernel, a few (for
our port) critical issues just showed up a few days back for the first
time.
What's in it?
- add missing __ucmpdi2 symbol, which is required for btrfs on 32bit
kernel.
- change kunmap() macro to static inline function. This fixes a
debian/gcc-4.4 build error.
- add locking when doing PTE updates. This fixes random userspace
crashes.
- disable (optional) -mlong-calls compiler option for modules, else
modules can't be loaded at runtime.
- a smart patch by Will Deacon which fixes 64bit put_user() warnings
on 32bit kernel."
* 'fixes-3.9-late' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: use spin_lock_irqsave/spin_unlock_irqrestore for PTE updates
parisc: disable -mlong-calls compiler option for kernel modules
parisc: uaccess: fix compiler warnings caused by __put_user casting
parisc: Change kunmap macro to static inline function
parisc: Provide __ucmpdi2 to resolve undefined references in 32 bit builds.
Matt Fleming [Fri, 26 Apr 2013 09:10:55 +0000 (10:10 +0100)]
efivars: only check for duplicates on the registered list
variable_is_present() accesses '__efivars' directly, but when called via
gsmi_init() Michel reports observing the following crash,
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: variable_is_present+0x55/0x170
Call Trace:
register_efivars+0x106/0x370
gsmi_init+0x2ad/0x3da
do_one_initcall+0x3f/0x170
The reason for the crash is that '__efivars' hasn't been initialised nor
has it been registered with register_efivars() by the time the google
EFI SMI driver runs. The gsmi code uses its own struct efivars, and
therefore, a different variable list. Fix the above crash by passing
the registered struct efivars to variable_is_present(), so that we
traverse the correct list.
Reported-by: Michel Lespinasse <walken@google.com>
Tested-by: Michel Lespinasse <walken@google.com>
Cc: Mike Waychison <mikew@google.com>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 26 Apr 2013 11:48:53 +0000 (13:48 +0200)]
TTY: fix atime/mtime regression
In commit
b0de59b5733d ("TTY: do not update atime/mtime on read/write")
we removed timestamps from tty inodes to fix a security issue and waited
if something breaks. Well, 'w', the utility to find out logged users
and their inactivity time broke. It shows that users are inactive since
the time they logged in.
To revert to the old behaviour while still preventing attackers to
guess the password length, we update the timestamps in one-minute
intervals by this patch.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhao Hongjiang [Fri, 26 Apr 2013 03:03:53 +0000 (11:03 +0800)]
aio: fix possible invalid memory access when DEBUG is enabled
dprintk() shouldn't access @ring after it's unmapped.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
H. Peter Anvin [Thu, 25 Apr 2013 21:00:22 +0000 (14:00 -0700)]
Merge tag 'efi-urgent' into x86/urgent
* The EFI variable anti-bricking algorithm merged in -rc8 broke booting
on some Apple machines because they implement EFI spec 1.10, which
doesn't provide a QueryVariableInfo() runtime function and the logic
used to check for the existence of that function was insufficient.
Fix from Josh Boyer.
* The anti-bricking algorithm also introduced a compiler warning on
32-bit. Fix from Borislav Petkov.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
John David Anglin [Tue, 23 Apr 2013 20:42:07 +0000 (22:42 +0200)]
parisc: use spin_lock_irqsave/spin_unlock_irqrestore for PTE updates
User applications running on SMP kernels have long suffered from instability
and random segmentation faults. This patch improves the situation although
there is more work to be done.
One of the problems is the various routines in pgtable.h that update page table
entries use different locking mechanisms, or no lock at all (set_pte_at). This
change modifies the routines to all use the same lock pa_dbit_lock. This lock
is used for dirty bit updates in the interruption code. The patch also purges
the TLB entries associated with the PTE to ensure that inconsistent values are
not used after the page table entry is updated. The UP and SMP code are now
identical.
The change also includes a minor update to the purge_tlb_entries function in
cache.c to improve its efficiency.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Tue, 23 Apr 2013 19:29:03 +0000 (21:29 +0200)]
parisc: disable -mlong-calls compiler option for kernel modules
CONFIG_MLONGCALLS was introduced in commit
ec758f98328da3eb933a25dc7a2eed01ef44d849 to overcome linker issues when linking
huge linux kernels, e.g. with many modules linked in.
But in the kernel module loader there is no support yet for the new relocation
types, which is why modules built with -mlong-calls can't be loaded.
Furthermore, for modules long calls are not really necessary, since we already
use stub sections which resolve long distance calls.
So, let's just disable this compiler option when compiling kernel modules.
Signed-off-by: Helge Deller <deller@gmx.de>
Will Deacon [Mon, 22 Apr 2013 12:53:43 +0000 (12:53 +0000)]
parisc: uaccess: fix compiler warnings caused by __put_user casting
When targetting 32-bit processors, __put_user emits a pair of stw
instructions for the 8-byte case. If the type of __val is a pointer, the
marshalling code casts it to the wider integer type of u64, resulting
in the following compiler warnings:
kernel/signal.c: In function 'copy_siginfo_to_user':
kernel/signal.c:2752:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
kernel/signal.c:2752:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
[...]
This patch fixes the warnings by removing the marshalling code and using
the correct output modifiers in the __put_{user,kernel}_asm64 macros
so that GCC will allocate the right registers without the need to
extract the two words explicitly.
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Helge Deller <deller@gmx.de>
John David Anglin [Tue, 23 Apr 2013 00:23:50 +0000 (00:23 +0000)]
parisc: Change kunmap macro to static inline function
Change kunmap macro to static inline function to fix build error
compiling drivers/base/dma-buf.c.
Without the change, the following error can occur:
CC drivers/base/dma-buf.o
drivers/base/dma-buf.c: In function 'dma_buf_kunmap':
drivers/base/dma-buf.c:427:46:
error: macro "kunmap" passed 3 arguments, but takes just 1
I believe parisc is the only arch to implement kunmap using a macro.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
John David Anglin [Sat, 20 Apr 2013 19:41:06 +0000 (19:41 +0000)]
parisc: Provide __ucmpdi2 to resolve undefined references in 32 bit builds.
The Debian experimental linux source package (3.8.5-1) build fails
with the following errors:
...
MODPOST 2016 modules
ERROR: "__ucmpdi2" [fs/btrfs/btrfs.ko] undefined!
ERROR: "__ucmpdi2" [drivers/md/dm-verity.ko] undefined!
The attached patch resolves this problem. It is based on the s390
implementation of ucmpdi2.c.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Thu, 25 Apr 2013 00:10:18 +0000 (17:10 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fix from David Miller:
"Brown paper bag fix for sparc64"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching.
Linus Torvalds [Thu, 25 Apr 2013 00:01:58 +0000 (17:01 -0700)]
Merge tag 'gpio-v3.9-lastminute' of git://git./linux/kernel/git/linusw/linux-gpio
Pull gpi fix from Linus Walleij:
"This is a last minute revert for the GPIO tree, as Mike Dunn noticed
breakage on some older PXA machines due to moving PXA GPIO initcalls
to the module_init initlevel"
* tag 'gpio-v3.9-lastminute' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
Revert "gpio: pxa: set initcall level to module init"
David S. Miller [Wed, 24 Apr 2013 23:52:18 +0000 (16:52 -0700)]
sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Wed, 24 Apr 2013 19:41:20 +0000 (21:41 +0200)]
Revert "gpio: pxa: set initcall level to module init"
This reverts commit
6c7e660a27da7494c670bfba21cfeba30457656c.
The commit causes breakage on several older PXA machines.
Reported-by: Mike Dunn <mikedunn@newsguy.com>
Acked-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Josh Boyer [Wed, 24 Apr 2013 15:16:52 +0000 (11:16 -0400)]
efi: Check EFI revision in setup_efi_vars
We need to check the runtime sys_table for the EFI version the firmware
specifies instead of just checking for a NULL QueryVariableInfo. Older
implementations of EFI don't have QueryVariableInfo but the runtime is
a smaller structure, so the pointer to it may be pointing off into garbage.
This is apparently the case with several Apple firmwares that support EFI
1.10, and the current check causes them to no longer boot. Fix based on
a suggestion from Matthew Garrett.
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Borislav Petkov [Wed, 24 Apr 2013 10:09:14 +0000 (12:09 +0200)]
x86, efi: Fix a build warning
Fix this:
arch/x86/boot/compressed/eboot.c: In function ‘setup_efi_vars’:
arch/x86/boot/compressed/eboot.c:269:2: warning: passing argument 1 of ‘efi_call_phys’ makes pointer from integer without a cast [enabled by default]
In file included from arch/x86/boot/compressed/eboot.c:12:0:
/w/kernel/linux/arch/x86/include/asm/efi.h:8:33: note: expected ‘void *’ but argument is of type ‘long unsigned int’
after
cc5a080c5d40 ("efi: Pass boot services variable info to runtime
code").
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Linus Torvalds [Mon, 22 Apr 2013 22:00:59 +0000 (15:00 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fix from Ralf Baechle:
"Revert the change of the definition of PAGE_MASK which was prettier
but broke a few relativly rare platforms"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
Revert "MIPS: page.h: Provide more readable definition for PAGE_MASK."
Ralf Baechle [Mon, 22 Apr 2013 15:57:54 +0000 (17:57 +0200)]
Revert "MIPS: page.h: Provide more readable definition for PAGE_MASK."
This reverts commit
c17a6554782ad531f4713b33fd6339ba67ef6391.
Manuel Lauss writes:
lmo commit
c17a6554 (MIPS: page.h: Provide more readable definition for
PAGE_MASK) apparently breaks ioremap of 36-bit addresses on my Alchemy
systems (PCI and PCMCIA) The reason is that in arch/mips/mm/ioremap.c
line 157 (phys_addr &= PAGE_MASK) bits 32-35 are cut off. Seems the
new PAGE_MASK is explicitly 32bit, or one could make it signed instead
of unsigned long.
Rusty Russell [Mon, 22 Apr 2013 09:21:50 +0000 (18:51 +0930)]
kernel/hz.bc: ignore.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 22 Apr 2013 14:07:46 +0000 (07:07 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a kernel memory leak in the algif interface"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algif - suppress sending source address information in recvmsg
Linus Torvalds [Sun, 21 Apr 2013 21:38:45 +0000 (14:38 -0700)]
Linux 3.9-rc8
Linus Torvalds [Sun, 21 Apr 2013 17:25:42 +0000 (10:25 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix offcore_rsp valid mask for SNB/IVB
perf: Treat attr.config as u64 in perf_swevent_init()
Linus Torvalds [Sun, 21 Apr 2013 17:16:56 +0000 (10:16 -0700)]
Merge branch 'vm_ioremap_memory-examples'
I'm going to do an -rc8, so I'm just going to do this rather than delay
it any further. They are arguably stable material anyway.
* vm_ioremap_memory-examples:
mtdchar: remove no-longer-used vma helpers
vm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helper
vm: convert fb_mmap to vm_iomap_memory() helper
vm: convert mtdchar mmap to vm_iomap_memory() helper
vm: convert HPET mmap to vm_iomap_memory() helper
Paul E. McKenney [Fri, 19 Apr 2013 19:01:24 +0000 (12:01 -0700)]
events: Protect access via task_subsys_state_check()
The following RCU splat indicates lack of RCU protection:
[ 953.267649] ===============================
[ 953.267652] [ INFO: suspicious RCU usage. ]
[ 953.267657] 3.9.0-0.rc6.git2.4.fc19.ppc64p7 #1 Not tainted
[ 953.267661] -------------------------------
[ 953.267664] include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage!
[ 953.267669]
[ 953.267669] other info that might help us debug this:
[ 953.267669]
[ 953.267675]
[ 953.267675] rcu_scheduler_active = 1, debug_locks = 0
[ 953.267680] 1 lock held by glxgears/1289:
[ 953.267683] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<
c00000000027f884>] .prepare_bprm_creds+0x34/0xa0
[ 953.267700]
[ 953.267700] stack backtrace:
[ 953.267704] Call Trace:
[ 953.267709] [
c0000001f0d1b6e0] [
c000000000016e30] .show_stack+0x130/0x200 (unreliable)
[ 953.267717] [
c0000001f0d1b7b0] [
c0000000001267f8] .lockdep_rcu_suspicious+0x138/0x180
[ 953.267724] [
c0000001f0d1b840] [
c0000000001d43a4] .perf_event_comm+0x4c4/0x690
[ 953.267731] [
c0000001f0d1b950] [
c00000000027f6e4] .set_task_comm+0x84/0x1f0
[ 953.267737] [
c0000001f0d1b9f0] [
c000000000280414] .setup_new_exec+0x94/0x220
[ 953.267744] [
c0000001f0d1ba70] [
c0000000002f665c] .load_elf_binary+0x58c/0x19b0
...
This commit therefore adds the required RCU read-side critical
section to perf_event_comm().
Reported-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20130419190124.GA8638@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Gustavo Luiz Duarte <gusld@br.ibm.com>
Linus Torvalds [Sun, 21 Apr 2013 01:40:36 +0000 (18:40 -0700)]
Merge branch 'x86-kdump-for-linus' of git://git./linux/kernel/git/tip/tip
Pull kdump fixes from Peter Anvin:
"The kexec/kdump people have found several problems with the support
for loading over 4 GiB that was introduced in this merge cycle. This
is partly due to a number of design problems inherent in the way the
various pieces of kdump fit together (it is pretty horrifically manual
in many places.)
After a *lot* of iterations this is the patchset that was agreed upon,
but of course it is now very late in the cycle. However, because it
changes both the syntax and semantics of the crashkernel option, it
would be desirable to avoid a stable release with the broken
interfaces."
I'm not happy with the timing, since originally the plan was to release
the final 3.9 tomorrow. But apparently I'm doing an -rc8 instead...
* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kexec: use Crash kernel for Crash kernel low
x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
x86, kdump: Retore crashkernel= to allocate under 896M
x86, kdump: Set crashkernel_low automatically
Linus Torvalds [Sun, 21 Apr 2013 01:38:48 +0000 (18:38 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Three groups of fixes:
1. Make sure we don't execute the early microcode patching if family
< 6, since it would touch MSRs which don't exist on those
families, causing crashes.
2. The Xen partial emulation of HyperV can be dealt with more
gracefully than just disabling the driver.
3. More EFI variable space magic. In particular, variables hidden
from runtime code need to be taken into account too."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode: Verify the family before dispatching microcode patching
x86, hyperv: Handle Xen emulation of Hyper-V more gracefully
x86,efi: Implement efi_no_storage_paranoia parameter
efi: Export efi_query_variable_store() for efivars.ko
x86/Kconfig: Make EFI select UCS2_STRING
efi: Distinguish between "remaining space" and actually used space
efi: Pass boot services variable info to runtime code
Move utf16 functions to kernel core and rename
x86,efi: Check max_size only if it is non-zero.
x86, efivars: firmware bug workarounds should be in platform code
Linus Torvalds [Sun, 21 Apr 2013 01:38:06 +0000 (18:38 -0700)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"A set of fixes from various people - Will Deacon gets a prize for
removing code this time around. The biggest fix in this lot is
sorting out the ARM740T mess. The rest are relatively small fixes."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7699/1: sched_clock: Add more notrace to prevent recursion
ARM: 7698/1: perf: fix group validation when using enable_on_exec
ARM: 7697/1: hw_breakpoint: do not use __cpuinitdata for dbg_cpu_pm_nb
ARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon
ARM: 7694/1: ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()
ARM: 7692/1: iop3xx: move IOP3XX_PERIPHERAL_VIRT_BASE
ARM: modules: don't export cpu_set_pte_ext when !MMU
ARM: mm: remove broken condition check for v4 flushing
ARM: mm: fix numerous hideous errors in proc-arm740.S
ARM: cache: remove ARMv3 support code
ARM: tlbflush: remove ARMv3 support
Linus Torvalds [Sun, 21 Apr 2013 01:23:08 +0000 (18:23 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
1) Fix race in sparc64 TLB shootdowns, we have to synchronize with the
sibling cpus completing if we are passing them a reference via
pointer to a data structure.
2) Fix cleaning of bitmaps in sparc32, from Akinobu Mita.
3) Fix various sparc header mistakes, some of which resulted in
userland build breakage. From Sam Ravnborg.
4) Kill ghost declarations and defines missed when several bits of code
got deleted recently.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix race in TLB batch processing.
sparc: use asm-generic version of types.h
bbc_i2c: fix section mismatch warning
sparc: use generic headers
sparc:cleanup unused code in smp_32.h
sparc/iommu: fix typo s/265KB/256KB/
sparc/srmmu: clear trailing edge of bitmap properly
sparc:remove unused declaration smp_boot_cpus()
Linus Torvalds [Sun, 21 Apr 2013 01:21:05 +0000 (18:21 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking updates from David Miller:
1) ax88796 does 64-bit divides which causes link errors on ARM, fix
from Arnd Bergmann.
2) Once an improper offload setting is detected on an SKB we don't rate
limit the log message so we can very easily live lock. From Ben
Greear.
3) Openvswitch cannot report vport configuration changes reliably
because it didn't preallocate the netlink notification message
before changing state. From Jesse Gross.
4) The effective UID/GID SCM credentials fix, from Linus.
5) When a user explicitly asks for wireless authentication, cfg80211
isn't told about the AP detachment leaving inconsistent state. Fix
from Johannes Berg.
6) Fix self-MAC checks in batman-adv on multi-mesh nodes, from Antonio
Quartulli.
7) Revert build_skb() change sin IGB driver, can result in memory
corruption. From Alexander Duyck.
8) Fix setting VLANs on virtual functions in IXGBE, from Greg Rose.
9) Fix TSO races in qlcnic driver, from Sritej Velaga.
10) In bnx2x the kernel driver and UNDI firmware can try to program the
chip at the same time, resulting in corruption. Add proper
synchronization. From Dmitry Kravkov.
11) Fix corruption of status block in firmware ram in bxn2x, from Ariel
Elior.
12) Fix load balancing hash regression of bonding driver in forwarding
configurations, from Eric Dumazet.
13) Fix TS ECR regression in TCP by calling tcp_replace_ts_recent() in
all the right spots, from Eric Dumazet.
14) Fix several bonding bugs having to do with address manintainence,
including not removing address when configuration operations
encounter errors, missed locking on the address lists, missing
refcounting on VLAN objects, etc. All from Nikolay Aleksandrov.
15) Add workarounds for firmware bugs in LTE qmi_wwan devices, wherein
the devices fail to add a proper ethernet header while on LTE
networks but otherwise properly do so on 2G and 3G ones. From Bjørn
Mork.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
net: fix incorrect credentials passing
net: rate-limit warn-bad-offload splats.
net: ax88796: avoid 64 bit arithmetic
qlge: Update version to 1.00.00.32.
qlge: Fix ethtool autoneg advertising.
qlge: Fix receive path to drop error frames
net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround)
net: qmi_wwan: fixup destination address (firmware bug workaround)
net: qmi_wwan: fixup missing ethernet header (firmware bug workaround)
bonding: in bond_mc_swap() bond's mc addr list is walked without lock
bonding: disable netpoll on enslave failure
bonding: primary_slave & curr_active_slave are not cleaned on enslave failure
bonding: vlans don't get deleted on enslave failure
bonding: mc addresses don't get deleted on enslave failure
pkt_sched: fix error return code in fw_change_attrs()
irda: small read past the end of array in debug code
tcp: call tcp_replace_ts_recent() from tcp_ack()
netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too
netfilter: ipset: bitmap:ip,mac: fix listing with timeout
bonding: fix l23 and l34 load balancing in forwarding path
...
Linus Torvalds [Fri, 19 Apr 2013 15:32:32 +0000 (15:32 +0000)]
net: fix incorrect credentials passing
Commit
257b5358b32f ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.
Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.
This just undoes that (presumably unintentional) part of the commit.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
H. Peter Anvin [Sat, 20 Apr 2013 00:09:03 +0000 (17:09 -0700)]
Merge remote-tracking branch 'efi/urgent' into x86/urgent
Matt Fleming (1):
x86, efivars: firmware bug workarounds should be in platform
code
Matthew Garrett (3):
Move utf16 functions to kernel core and rename
efi: Pass boot services variable info to runtime code
efi: Distinguish between "remaining space" and actually used
space
Richard Weinberger (2):
x86,efi: Check max_size only if it is non-zero.
x86,efi: Implement efi_no_storage_paranoia parameter
Sergey Vlasov (2):
x86/Kconfig: Make EFI select UCS2_STRING
efi: Export efi_query_variable_store() for efivars.ko
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
H. Peter Anvin [Fri, 19 Apr 2013 23:36:03 +0000 (16:36 -0700)]
x86, microcode: Verify the family before dispatching microcode patching
For each CPU vendor that implements CPU microcode patching, there will
be a minimum family for which this is implemented. Verify this
minimum level of support.
This can be done in the dispatch function or early in the application
functions. Doing the latter turned out to be somewhat awkward because
of the ineviable split between the BSP and the AP paths, and rather
than pushing deep into the application functions, do this in
the dispatch function.
Reported-by: "Bryan O'Donoghue" <bryan.odonoghue.lkml@nexus-software.ie>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1366392183-4149-1-git-send-email-bryan.odonoghue.lkml@nexus-software.ie
Ben Greear [Fri, 19 Apr 2013 10:45:52 +0000 (10:45 +0000)]
net: rate-limit warn-bad-offload splats.
If one does do something unfortunate and allow a
bad offload bug into the kernel, this the
skb_warn_bad_offload can effectively live-lock the
system, filling the logs with the same error over
and over.
Add rate limitation to this so that box remains otherwise
functional in this case.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 19 Apr 2013 08:47:26 +0000 (08:47 +0000)]
net: ax88796: avoid 64 bit arithmetic
When building ax88796 on an ARM platform with 64-bit resource_size_t,
we currently get
drivers/net/ethernet/8390/ax88796.c:875: undefined reference to `__aeabi_uldivmod'
because we do a division on the length of the MMIO resource.
Since we know that this resource is very short, using an
"unsigned long" instead of "resource_size_t" is entirely
sufficient, and avoids this link-time error.
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Thu, 18 Apr 2013 19:49:54 +0000 (19:49 +0000)]
qlge: Update version to 1.00.00.32.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Thu, 18 Apr 2013 19:49:53 +0000 (19:49 +0000)]
qlge: Fix ethtool autoneg advertising.
Autoneg is supported on specific port types only. Fix the driver to advertise
autoneg based on the port type.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sritej Velaga [Thu, 18 Apr 2013 19:49:52 +0000 (19:49 +0000)]
qlge: Fix receive path to drop error frames
o Fix the driver to drop error frames in the receive path
o Update error counter which was not getting incremented
Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 21:51:26 +0000 (17:51 -0400)]
Merge branch 'qmi_wwan'
Bjørn Mork says:
====================
This series adds workarounds for 3 different firmware bugs, each
preventing the affected devices from working at all. I therefore
humbly request that these fixes go to stable-3.8 (if still
maintained) and 3.9 (either via net if still possible, or via
stable if not).
All 3 workarounds are applied to all devices supported by the driver.
Adding quirks for specific devices was considered as an alternative,
but was rejected because we have too little information about the
exact distribution of the buggy firmwares. All we know is that the
same bug shows up in devices from at least 3 different, and presumably
independent, vendors.
The workarounds have instead been designed to automatically apply
when necessary, and to have as little impact as possible on unaffected
devices. The series has been tested on a number of devices both with
and without these bugs.
The series should apply cleanly to net/master, net-next/master and
stable/linux-3.8.y
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 18 Apr 2013 12:57:11 +0000 (12:57 +0000)]
net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround)
We normally trust and use the CDC functional descriptors provided by a
number of devices. But some of these will erroneously list the address
reserved for the device end of the link. Attempting to use this on
both the device and host side will naturally not work.
Work around this bug by ignoring the functional descriptor and assign a
random address instead in this case.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 18 Apr 2013 12:57:10 +0000 (12:57 +0000)]
net: qmi_wwan: fixup destination address (firmware bug workaround)
Received packets are sometimes addressed to 00:a0:c6:00:00:00
instead of the address the device firmware should have learned
from the host:
321.224126 77.16.85.204 -> 148.122.171.134 ICMP 98 Echo (ping) request id=0x4025, seq=64/16384, ttl=64
0000 82 c0 82 c9 f1 67 82 c0 82 c9 f1 67 08 00 45 00 .....g.....g..E.
0010 00 54 00 00 40 00 40 01 57 cc 4d 10 55 cc 94 7a .T..@.@.W.M.U..z
0020 ab 86 08 00 62 fc 40 25 00 40 b2 bc 6e 51 00 00 ....b.@%.@..nQ..
0030 00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15 ..k.............
0040 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 .......... !"#$%
0050 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 &'()*+,-./012345
0060 36 37 67
321.240607 148.122.171.134 -> 77.16.85.204 ICMP 98 Echo (ping) reply id=0x4025, seq=64/16384, ttl=55
0000 00 a0 c6 00 00 00 02 50 f3 00 00 00 08 00 45 00 .......P......E.
0010 00 54 00 56 00 00 37 01 a0 76 94 7a ab 86 4d 10 .T.V..7..v.z..M.
0020 55 cc 00 00 6a fc 40 25 00 40 b2 bc 6e 51 00 00 U...j.@%.@..nQ..
0030 00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15 ..k.............
0040 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 .......... !"#$%
0050 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 &'()*+,-./012345
0060 36 37 67
The bogus address is always the same, and matches the address
suggested by many devices as a default address. It is likely a
hardcoded firmware default.
The circumstances where this bug has been observed indicates that
the trigger is related to timing or some other factor the host
cannot control. Repeating the exact same configuration sequence
that caused it to trigger once, will not necessarily cause it to
trigger the next time. Reproducing the bug is therefore difficult.
This opens up a possibility that the bug is more common than we can
confirm, because affected devices often will work properly again
after a reset. A procedure most users are likely to try out before
reporting a bug.
Unconditionally rewriting the destination address if the first digit
of the received packet is 0, is considered an acceptable compromise
since we already have to inspect this digit. The simplification will
cause unnecessary rewrites if the real address starts with 0, but this
is still better than adding additional tests for this particular case.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 18 Apr 2013 12:57:09 +0000 (12:57 +0000)]
net: qmi_wwan: fixup missing ethernet header (firmware bug workaround)
A number of LTE devices from different vendors all suffer from the
same firmware bug: Most of the packets received from the device while
it is attached to a LTE network will not have an ethernet header. The
devices work as expected when attached to 2G or 3G networks, sending
an ethernet header with all packets.
This driver is not aware of which network the modem attached to, and
even if it were there are still some packet types which are always
received with the header intact.
All devices supported by this driver have severely limited
networking capabilities:
- can only transmit IPv4, IPv6 and possibly ARP
- can only support a single host hardware address at any time
- will only do point-to-point communcation with the host
Because of this, we are able to reliably identify any bogus raw IP
packets by simply looking at the 4 IP version bits. All we need to
do is to avoid 4 or 6 in the first digit of the mac address. This
workaround ensures this, and fix up the received packets as necessary.
Given the distribution of the bug, it is believed that the source is
the chipset vendor. The devices which are verified to be affected are:
Huawei E392u-12 (Qualcomm MDM9200)
Pantech UML290 (Qualcomm MDM9600)
Novatel USB551L (Qualcomm MDM9600)
Novatel E362 (Qualcomm MDM9600)
It is believed that the bug depend on firmware revision, which means
that possibly all devices based on the above mentioned chipset may be
affected if we consider all available firmware revisions.
The information about affected devices and versions is likely
incomplete. As the additional overhead for packets not needing this
fixup is very small, it is considered acceptable to apply the
workaround to all devices handled by this driver.
Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 21:49:11 +0000 (17:49 -0400)]
Merge branch 'bonding'
Nikolay Aleksandrov says:
====================
This patch-set fixes mainly bugs on enslave failure and one occasion
of a needed locking. The patches are:
1. On enslave failure mc addresses are not flushed from the slave
2. On enslave failure vlans are not cleaned up from the slave
3. On enslave failure the bond's primary and curr_active_slave
are not cleaned up (which might result in use of freed memory)
4. On enslave failure netpoll is not disabled which might result in
a memory leak
5. In bond_mc_swap() the bond's mc addr list is walked without
netif_addr_lock, since it can be called without rtnl, add it
v2: patch 01 - fix log message and remove unnecessary code move
====================
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nikolay@redhat.com [Thu, 18 Apr 2013 07:33:38 +0000 (07:33 +0000)]
bonding: in bond_mc_swap() bond's mc addr list is walked without lock
Use netif_addr_lock_bh() to acquire the appropriate lock before walking.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nikolay@redhat.com [Thu, 18 Apr 2013 07:33:37 +0000 (07:33 +0000)]
bonding: disable netpoll on enslave failure
slave_disable_netpoll() is not called upon enslave failure which would
lead to a memory leak. Call slave_disable_netpoll() after err_detach as
that's the first error path after enabling netpoll on that slave.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nikolay@redhat.com [Thu, 18 Apr 2013 07:33:36 +0000 (07:33 +0000)]
bonding: primary_slave & curr_active_slave are not cleaned on enslave failure
On enslave failure primary_slave can point to new_slave which is to be
freed, and the same applies to curr_active_slave. So check if this is
the case and clean up properly after err_detach because that's the first
error code path after they're set.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nikolay@redhat.com [Thu, 18 Apr 2013 07:33:35 +0000 (07:33 +0000)]
bonding: vlans don't get deleted on enslave failure
The main problem is with vid refcount which only gets bumped up.
Delete the vlans after err_detach as that's the first error path
after the vlans are added.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nikolay@redhat.com [Thu, 18 Apr 2013 07:33:34 +0000 (07:33 +0000)]
bonding: mc addresses don't get deleted on enslave failure
Add bond_mc_list_flush() after err_detach as that's the first error path
after the addresses are added. The main issue is the mc addresses' refcount
which only gets bumped up.
v2: update log message and don't move code unnecessarily
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 17 Apr 2013 16:49:10 +0000 (16:49 +0000)]
pkt_sched: fix error return code in fw_change_attrs()
Fix to return -EINVAL when tb[TCA_FW_MASK] is set and head->mask != 0xFFFFFFFF
instead of 0 (ifdef CONFIG_NET_CLS_IND and tb[TCA_FW_INDEV]), as done elsewhere
in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 16 Apr 2013 21:10:38 +0000 (21:10 +0000)]
irda: small read past the end of array in debug code
The "reason" can come from skb->data[] and it hasn't been capped so it
can be from 0-255 instead of just 0-6. For example in irlmp_state_dtr()
the code does:
reason = skb->data[3];
...
irlmp_disconnect_indication(self, reason, skb);
Also LMREASON has a couple other values which don't have entries in the
irlmp_reasons[] array. And 0xff is a valid reason as well which means
"unknown".
So far as I can see we don't actually care about "reason" except for in
the debug code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Apr 2013 21:26:26 +0000 (17:26 -0400)]
sparc64: Fix race in TLB batch processing.
As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.
So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.
Fix this by using generic infrastructure to synchonize on the
completion of the cross call.
This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().
We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a
region, we flush TLBs synchronously.
1) Get rid of xcall_flush_tlb_pending and per-cpu type
implementations.
2) Do TLB batch cross calls instead via:
smp_call_function_many()
tlb_pending_func()
__flush_tlb_pending()
3) Batch only in lazy mmu sequences:
a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
flush if it's clear.
4) Add infrastructure for synchronous TLB page flushes.
a) Implement __flush_tlb_page and per-cpu variants, patch
as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
upon CONFIG_SMP
5) It turns out that singleton batches are very common, 2 out of every
3 batch flushes have only a single entry in them.
The batch flush waiting is very expensive, both because of the poll
on sibling cpu completeion, as well as because passing the tlb batch
pointer to the sibling cpus invokes a shared memory dereference.
Therefore, in flush_tlb_pending(), if there is only one entry in
the batch perform a completely asynchronous global_flush_tlb_page()
instead.
Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Stephen Boyd [Thu, 18 Apr 2013 16:33:40 +0000 (17:33 +0100)]
ARM: 7699/1: sched_clock: Add more notrace to prevent recursion
cyc_to_sched_clock() is called by sched_clock() and cyc_to_ns()
is called by cyc_to_sched_clock(). I suspect that some compilers
inline both of these functions into sched_clock() and so we've
been getting away without having a notrace marking. It seems that
my compiler isn't inlining cyc_to_sched_clock() though, so I'm
hitting a recursion bug when I enable the function graph tracer,
causing my system to crash. Marking these functions notrace fixes
it. Technically cyc_to_ns() doesn't need the notrace because it's
already marked inline, but let's just add it so that if we ever
remove inline from that function it doesn't blow up.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Fri, 19 Apr 2013 18:38:36 +0000 (11:38 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Only one remaining fix for arm-soc platforms at this time, a small
bugfix for cpu hotplug on highbank platforms that has become much
easier to hit as of late.
Details in the patch description, but it's small and well-contained
and definitely impacts users of the platform, so 3.9 seems
appropriate."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: highbank: fix cache flush ordering for cpu hotplug