Christoph Hellwig [Tue, 23 Sep 2014 10:38:48 +0000 (12:38 +0200)]
nfsd: implement pNFS layout recalls
Add support to issue layout recalls to clients. For now we only support
full-file recalls to get a simple and stable implementation. This allows
to embedd a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall. For normal
use cases that do not intent to share a single file between multiple
clients this implementation is fully sufficient.
To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.
The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 5 May 2014 11:11:59 +0000 (13:11 +0200)]
nfsd: implement pNFS operations
Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.
Layout management is very straight forward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang of the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.
The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still do need perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.
Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Sun, 17 Aug 2014 12:40:00 +0000 (07:40 -0500)]
nfsd: make find_any_file available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Thu, 14 Aug 2014 06:50:16 +0000 (08:50 +0200)]
nfsd: make find/get/put file available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Thu, 14 Aug 2014 06:44:57 +0000 (08:44 +0200)]
nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Wed, 13 Aug 2014 18:56:13 +0000 (20:56 +0200)]
nfsd: add fh_fsid_match helper
Add a helper to check that the fsid parts of two file handles match.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Thu, 14 Aug 2014 06:41:48 +0000 (08:41 +0200)]
nfsd: move nfsd_fh_match to nfsfh.h
The pnfs code will need it too. Also remove the nfsd_ prefix to match the
other filehandle helpers in that file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Wed, 21 Jan 2015 18:17:03 +0000 (19:17 +0100)]
fs: add FL_LAYOUT lease type
This (ab-)uses the file locking code to allow filesystems to recall
outstanding pNFS layouts on a file. This new lease type is similar but
not quite the same as FL_DELEG. A FL_LAYOUT lease can always be granted,
an a per-filesystem lock (XFS iolock for the initial implementation)
ensures not FL_LAYOUT leases granted when we would need to recall them.
Also included are changes that allow multiple outstanding read
leases of different types on the same file as long as they have a
differnt owner. This wasn't a problem until now as nfsd never set
FL_LEASE leases, and no one else used FL_DELEG leases, but given that
nfsd will also issues FL_LAYOUT leases we will have to handle it now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Wed, 21 Jan 2015 18:14:02 +0000 (19:14 +0100)]
fs: track fl_owner for leases
Just like for other lock types we should allow different owners to have
a read lease on a file. Currently this can't happen, but with the addition
of pNFS layout leases we'll need this feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Sat, 16 Aug 2014 11:31:51 +0000 (13:31 +0200)]
nfs: add LAYOUT_TYPE_MAX enum value
This gives us a nice upper bound for later use in nfѕd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
J. Bruce Fields [Mon, 2 Feb 2015 16:29:29 +0000 (11:29 -0500)]
Merge branch 'locks-3.20' of git://git.samba.org/jlayton/linux into for-3.20
Christoph's block pnfs patches have some minor dependencies on these
lock patches.
Christoph Hellwig [Thu, 22 Jan 2015 11:09:50 +0000 (12:09 +0100)]
nfsd: factor out a helper to decode nfstime4 values
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Thu, 22 Jan 2015 13:19:32 +0000 (08:19 -0500)]
sunrpc/lockd: fix references to the BKL
The BKL is completely out of the picture in the lockd and sunrpc code
these days. Update the antiquated comments that refer to it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 20 Jan 2015 16:51:26 +0000 (11:51 -0500)]
nfsd: fix year-2038 nfs4 state problem
Someone with a weird time_t happened to notice this, it shouldn't really
manifest till 2038. It may not be our ownly year-2038 problem.
Reported-by: Aaron Pace <Aaron.Pace@alcatel-lucent.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Thu, 22 Jan 2015 01:44:01 +0000 (20:44 -0500)]
locks: update comments that refer to inode->i_flock
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Jeff Layton [Fri, 16 Jan 2015 20:05:58 +0000 (15:05 -0500)]
locks: consolidate NULL i_flctx checks in locks_remove_file
We have each of the locks_remove_* variants doing this individually.
Have the caller do it instead, and have locks_remove_flock and
locks_remove_lease just assume that it's a valid pointer.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Jeff Layton [Fri, 16 Jan 2015 20:05:57 +0000 (15:05 -0500)]
locks: keep a count of locks on the flctx lists
This makes things a bit more efficient in the cifs and ceph lock
pushing code.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:57 +0000 (15:05 -0500)]
locks: clean up the lm_change prototype
Now that we use standard list_heads for tracking leases, we can have
lm_change take a pointer to the lease to be modified instead of a
double pointer.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:57 +0000 (15:05 -0500)]
locks: add a dedicated spinlock to protect i_flctx lists
We can now add a dedicated spinlock without expanding struct inode.
Change to using that to protect the various i_flctx lists.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:56 +0000 (15:05 -0500)]
locks: remove i_flock field from struct inode
Nothing uses it anymore. Also add a forward declaration for struct
file_lock to silence some compiler warnings that the removal triggers.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:55 +0000 (15:05 -0500)]
locks: convert lease handling to file_lock_context
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:55 +0000 (15:05 -0500)]
locks: convert posix locks to file_lock_context
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:55 +0000 (15:05 -0500)]
locks: move flock locks to file_lock_context
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:55 +0000 (15:05 -0500)]
ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks
There is only a single call site for each of these functions, and the
caller takes the i_lock prior to calling them and drops it just
afterward. Move the spinlocking into the functions instead.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:54 +0000 (15:05 -0500)]
locks: add a new struct file_locking_context pointer to struct inode
The current scheme of using the i_flock list is really difficult to
manage. There is also a legitimate desire for a per-inode spinlock to
manage these lists that isn't the i_lock.
Start conversion to a new scheme to eventually replace the old i_flock
list with a new "file_lock_context" object.
We start by adding a new i_flctx to struct inode. For now, it lives in
parallel with i_flock list, but will eventually replace it. The idea is
to allocate a structure to sit in that pointer and act as a locus for
all things file locking.
We allocate a file_lock_context for an inode when the first lock is
added to it, and it's only freed when the inode is freed. We use the
i_lock to protect the assignment, but afterward it should mostly be
accessed locklessly.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Jeff Layton [Fri, 16 Jan 2015 20:05:54 +0000 (15:05 -0500)]
locks: have locks_release_file use flock_lock_file to release generic flock locks
...instead of open-coding it and removing flock locks directly. This
helps consolidate the flock lock removal logic into a single spot.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Jeff Layton [Fri, 16 Jan 2015 20:05:54 +0000 (15:05 -0500)]
locks: add new struct list_head to struct file_lock
...that we can use to queue file_locks to per-ctx list_heads. Go ahead
and convert locks_delete_lock and locks_dispose_list to use it instead
of the fl_block list.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Linus Torvalds [Fri, 16 Jan 2015 01:58:16 +0000 (14:58 +1300)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"This fixes a regression in the latest fuse update plus a fix for a
rather theoretical memory ordering issue"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add memory barrier to INIT
fuse: fix LOOKUP vs INIT compat handling
Linus Torvalds [Fri, 16 Jan 2015 01:55:47 +0000 (14:55 +1300)]
Merge tag 'fbdev-fixes-3.19' of git://git./linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
- broadsheetfb: fix memory leak
- simplefb: fix build failure on sparc
* tag 'fbdev-fixes-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
fbdev/broadsheetfb: fix memory leak
simplefb: Fix build failure on Sparc
Linus Torvalds [Fri, 16 Jan 2015 01:53:07 +0000 (14:53 +1300)]
Merge tag 'mmc-v3.19-4' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC bugfix from Ulf Hansson:
"Fix sdhci regulator regression for Qualcomm and Nvidia boards"
* tag 'mmc-v3.19-4' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sdhci: Set SDHCI_POWER_ON with external vmmc
Linus Torvalds [Fri, 16 Jan 2015 01:29:21 +0000 (14:29 +1300)]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k fixlet from Geert Uytterhoeven.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Wire up execveat
Linus Torvalds [Fri, 16 Jan 2015 01:28:01 +0000 (14:28 +1300)]
Merge tag 'powerpc-3.19-4' of git://git./linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
"A few powerpc fixes"
* tag 'powerpc-3.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc: Work around gcc bug in current_thread_info()
cxl: Fix issues when unmapping contexts
powernv: Fix OPAL tracepoint code
Chuck Lever [Tue, 13 Jan 2015 16:03:53 +0000 (11:03 -0500)]
svcrdma: Handle additional inline content
Most NFS RPCs place their large payload argument at the end of the
RPC header (eg, NFSv3 WRITE). For NFSv3 WRITE and SYMLINK, RPC/RDMA
sends the complete RPC header inline, and the payload argument in
the read list. Data in the read list is the last part of the XDR
stream.
One important case is not like this, however. NFSv4 COMPOUND is a
counted array of operations. A WRITE operation, with its large data
payload, can appear in the middle of the compound's operations
array. Thus NFSv4 WRITE compounds can have header content after the
WRITE payload.
The Linux client, for example, performs an NFSv4 WRITE like this:
{ PUTFH, WRITE, GETATTR }
Though RFC 5667 is not precise about this, the proper way to convey
this compound is to place the GETATTR inline, _after_ the front of
the RPC header. The receiver inserts the read list payload into the
XDR stream after the initial WRITE arguments, and before the GETATTR
operation, thanks to the value of the read list "position" field.
The Linux client currently sends the GETATTR at the end of the
RPC/RDMA read list, which is incorrect. It will be corrected in the
future.
The Linux server currently rejects NFSv4 compounds with inline
content after the read list. For the above NFSv4 WRITE compound, the
NFS compound header indicates there are three operations, but the
server finds nonsense when it looks in the XDR stream for the third
operation, and the compound fails with OP_ILLEGAL.
Move trailing inline content to the end of the XDR buffer's page
list. This presents incoming NFSv4 WRITE compounds to NFSD in the
same way the socket transport does.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:03:45 +0000 (11:03 -0500)]
svcrdma: Move read list XDR round-up logic
This is a pre-requisite for a subsequent patch.
Read list XDR round-up needs to be done _before_ additional inline
content is copied to the end of the XDR buffer's page list. Move
the logic added by commit
e560e3b510d2 ("svcrdma: Add zero padding
if the client doesn't send it").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:03:37 +0000 (11:03 -0500)]
svcrdma: Support RDMA_NOMSG requests
Currently the Linux server can not decode RDMA_NOMSG type requests.
Operations whose length exceeds the fixed size of RDMA SEND buffers,
like large NFSv4 CREATE(NF4LNK) operations, must be conveyed via
RDMA_NOMSG.
For an RDMA_MSG type request, the client sends the RPC/RDMA, RPC
headers, and some or all of the NFS arguments via RDMA SEND.
For an RDMA_NOMSG type request, the client sends just the RPC/RDMA
header via RDMA SEND. The request's read list contains elements for
the entire RPC message, including the RPC header.
NFSD expects the RPC/RMDA header and RPC header to be contiguous in
page zero of the XDR buffer. Add logic in the RDMA READ path to make
the read list contents land where the server prefers, when the
incoming message is a type RDMA_NOMSG message.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:03:28 +0000 (11:03 -0500)]
svcrdma: rc_position sanity checking
An RPC/RDMA client may send large RPC arguments via a read
list. This is a list of scatter/gather elements which convey
RPC call arguments too large to fit in a small RDMA SEND.
Each entry in the read list has a "position" field, whose value is
the byte offset in the XDR stream where the data in that entry is to
be inserted. Entries which share the same "position" value make up
the same RPC argument. The receiver inserts entries with the same
position field value in list order into the XDR stream.
Currently the Linux NFS/RDMA server cannot handle receiving read
chunks in more than one position, mostly because no current client
sends read lists with elements in more than one position. As a
sanity check, ensure that all received chunks have the same
"rc_position."
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:03:20 +0000 (11:03 -0500)]
svcrdma: Plant reader function in struct svcxprt_rdma
The RDMA reader function doesn't change once an svcxprt_rdma is
instantiated. Instead of checking sc_devcap during every incoming
RPC, set the reader function once when the connection is accepted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:03:11 +0000 (11:03 -0500)]
svcrdma: Find rmsgp more reliably
xdr_start() can return the wrong rmsgp address if an assumption
about how the xdr_buf was constructed changes. When it gets it
wrong, the client receives a reply that has gibberish in the
RPC/RDMA header, preventing it from matching a waiting RPC request.
Instead, make (and document) just one assumption: that the RDMA
header for the client's RPC call is at the start of the first page
in rq_pages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:03:03 +0000 (11:03 -0500)]
svcrdma: Scrub BUG_ON() and WARN_ON() call sites
Current convention is to avoid using BUG_ON() in places where an
oops could cause complete system failure.
Replace BUG_ON() call sites in svcrdma with an assertion error
message and allow execution to continue safely.
Some BUG_ON() calls are removed because they have never fired in
production (that we are aware of).
Some WARN_ON() calls are also replaced where a back trace is not
helpful; e.g., in a workqueue task.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:02:54 +0000 (11:02 -0500)]
svcrdma: Clean up read chunk counting
The byte_count argument is not used, and the function is called
only from one place.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:02:45 +0000 (11:02 -0500)]
svcrdma: Remove unused variable
Nit: remove an unused variable to squelch a compiler warning.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Chuck Lever [Tue, 13 Jan 2015 16:02:37 +0000 (11:02 -0500)]
svcrdma: Clean up dprintk
Nit: Fix inconsistent white space in dprintk messages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rickard Strandqvist [Tue, 13 Jan 2015 20:57:24 +0000 (21:57 +0100)]
nfsd: nfs4state: Remove unused function
Remove the function renew_client() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rickard Strandqvist [Sat, 10 Jan 2015 17:02:42 +0000 (18:02 +0100)]
lockd: xdr: Remove unused function
Remove the function nlm_encode_fh() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus Torvalds [Thu, 15 Jan 2015 06:20:26 +0000 (19:20 +1300)]
Merge branch 'thermal-soc' of git://git./linux/kernel/git/rzhang/linux
Pull thermal fixes from Zhang Rui:
"Specifics:
- bogus type qualifier fix in OF thermal code.
- Minor fixes on imx and rcar thermal drivers.
- Update TI SoC thermal maintainer entry.
- Updated documentation of OF cpufreq cooling register"
* 'thermal-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: rcar: Spelling/grammar: s/drier use .../driver uses ...s/
thermal: rcar: change type of ctemp in rcar_thermal_update_temp()
thermal: rcar: fix ENR register value
Documentation: thermal: document of_cpufreq_cooling_register()
Thermal: imx: add clk disable/enable for suspend/resume
MAINTAINERS: update ti-soc-thermal status
MAINTAINERS: Add linux-omap to list of reviewers for TI Thermal
thermal: of: Remove bogus type qualifier for of_thermal_get_trip_points()
Linus Torvalds [Wed, 14 Jan 2015 22:17:37 +0000 (11:17 +1300)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Don't use uninitialized data in IPVS, from Dan Carpenter.
2) conntrack race fixes from Pablo Neira Ayuso.
3) Fix TX hangs with i40e, from Jesse Brandeburg.
4) Fix budget return from poll calls in dnet and alx, from Eric
Dumazet.
5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
Jaeger.
6) Fix bug introduced by conversion to list_head in TIPC retransmit
code, from Jon Paul Maloy.
7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
Khoroshilov.
8) Fix bridge build with INET disabled, from Arnd Bergmann.
9) Fix netlink array overrun for PROBE attributes in openvswitch, from
Thomas Graf.
10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
Prashant Sreedharan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
tg3: Release tp->lock before invoking synchronize_irq()
tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
openvswitch: packet messages need their own probe attribtue
i40e: adds FCoE configure option
cxgb4vf: Fix queue allocation for 40G adapter
netdevice: Add missing parentheses in macro
bridge: only provide proxy ARP when CONFIG_INET is enabled
neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
net: fec: fix MDIO bus assignement for dual fec SoC's
xen-netfront: use different locks for Rx and Tx stats
drivers: net: cpsw: fix multicast flush in dual emac mode
cxgb4vf: Initialize mdio_addr before using it
net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
MAINTAINERS: add me as ibmveth maintainer
tipc: fix bug in broadcast retransmit code
update ip-sysctl.txt documentation (v2)
net/at91_ether: prepare and unprepare clock
...
David S. Miller [Wed, 14 Jan 2015 22:05:55 +0000 (17:05 -0500)]
Merge branch 'tg3-net'
Prashant Sreedharan says:
====================
tg3: synchronize_irq() should be called without taking locks
v2: Added Reported-by, Tested-by fields and reference to the thread that
reported the problem
This series addresses the problem reported by Peter Hurley in mail thread
https://lkml.org/lkml/2015/1/12/1082
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Wed, 14 Jan 2015 19:34:44 +0000 (11:34 -0800)]
tg3: Release tp->lock before invoking synchronize_irq()
synchronize_irq() can sleep waiting, for pending IRQ handlers so driver
should release the tp->lock spin lock before invoking synchronize_irq()
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Wed, 14 Jan 2015 19:34:17 +0000 (11:34 -0800)]
tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
Currently tg3_reset_task() uses only tp->lock for synchronizing with code
paths like tg3_open() etc. But since tp->lock is released before doing
synchronize_irq(), rtnl_lock should be taken in tg3_reset_task() to
synchronize it with other code paths.
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Wed, 14 Jan 2015 19:33:49 +0000 (11:33 -0800)]
tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
This is to avoid the race between tg3_timer() and the execution paths
which does not invoke tg3_timer_stop() and releases tp->lock before
calling synchronize_irq()
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 14 Jan 2015 21:54:30 +0000 (10:54 +1300)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Two bugfixes for arm64. I will have another pull request next week,
but otherwise things are calm"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
arm64: KVM: Fix HCR setting for 32bit guests
arm64: KVM: Fix TLB invalidation by IPA/VMID
Jiri Pirko [Wed, 14 Jan 2015 17:15:30 +0000 (18:15 +0100)]
team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").
Consider following scenario:
count_pending == 2
CPU0 CPU1
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 1)
schedule_delayed_work
team_notify_peers
atomic_add (adding 1 to count_pending)
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 1)
schedule_delayed_work
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 0)
schedule_delayed_work
team_notify_peers_work
atomic_dec_and_test (dec count_pending to -1)
Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.
Fixes:
fc423ff00df3a1955441 ("team: add peer notification")
Fixes:
492b200efdd20b8fcfd ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 14 Jan 2015 21:50:29 +0000 (10:50 +1300)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Two small performance tweaks, the plumbing for the execveat system
call and a couple of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uprobes: fix user space PER events
s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X)
s390/bpf: Fix ALU_NEG (A = -A)
s390/mm: avoid using pmd_to_page for !USE_SPLIT_PMD_PTLOCKS
s390/timex: fix get_tod_clock_ext() inline assembly
s390: wire up execveat syscall
s390/kernel: use stnsm 255 instead of stosm 0
s390/vtime: Get rid of redundant WARN_ON
s390/zcrypt: kernel oops at insmod of the z90crypt device driver
Thomas Graf [Wed, 14 Jan 2015 13:56:19 +0000 (13:56 +0000)]
openvswitch: packet messages need their own probe attribtue
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.
Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.
Fixes:
05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasu Dev [Wed, 14 Jan 2015 13:14:07 +0000 (05:14 -0800)]
i40e: adds FCoE configure option
Adds FCoE config option I40E_FCOE, so that FCoE can be enabled
as needed but otherwise have it disabled by default.
This also eliminate multiple FCoE config checks, instead now just
one config check for CONFIG_I40E_FCOE.
The I40E FCoE was added with 3.17 kernel and therefore this patch
shall be applied to stable 3.17 kernel also.
CC: <stable@vger.kernel.org>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 14 Jan 2015 12:28:10 +0000 (17:58 +0530)]
cxgb4vf: Fix queue allocation for 40G adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 14 Jan 2015 21:38:07 +0000 (10:38 +1300)]
Merge tag 'locks-v3.19-1' of git://git.samba.org/jlayton/linux
Pull file locking fix from Jeff Layton:
"Just a simple bugfix for a regression that I introduced into v3.18
with the internal lease API overhaul -- mea culpa. Kudos to Linda and
Neil for tracking this down and fixing it"
* tag 'locks-v3.19-1' of git://git.samba.org/jlayton/linux:
locks: fix NULL-deref in generic_delete_lease
Benjamin Poirier [Wed, 14 Jan 2015 07:52:35 +0000 (16:52 +0900)]
netdevice: Add missing parentheses in macro
For example, one could conceivably call
for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave)
and get an unexpected result.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 14 Jan 2015 21:27:56 +0000 (10:27 +1300)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"The major part is an update to the NVMe driver, fixing various issues
around surprise removal and hung controllers. Most of that is from
Keith, and parts are simple blk-mq fixes or exports/additions of minor
functions to aid this effort, and parts are changes directly to the
NVMe driver.
Apart from the above, this contains:
- Small blk-mq change from me, killing an unused member of the
hardware queue structure.
- Small fix from Ming Lei, fixing up a few drivers that didn't
properly check for ERR_PTR() returns from blk_mq_init_queue()"
* 'for-linus' of git://git.kernel.dk/linux-block:
NVMe: Fix locking on abort handling
NVMe: Start and stop h/w queues on reset
NVMe: Command abort handling fixes
NVMe: Admin queue removal handling
NVMe: Reference count admin queue usage
NVMe: Start all requests
blk-mq: End unstarted requests on a dying queue
blk-mq: Allow requests to never expire
blk-mq: Add helper to abort requeued requests
blk-mq: Let drivers cancel requeue_work
blk-mq: Export if requests were started
blk-mq: Wake tasks entering queue on dying
blk-mq: get rid of ->cmd_size in the hardware queue
block: fix checking return value of blk_mq_init_queue
block: wake up waiters when a queue is marked dying
NVMe: Fix double free irq
blk-mq: Export freeze/unfreeze functions
blk-mq: Exit queue on alloc failure
Arnd Bergmann [Tue, 13 Jan 2015 14:10:27 +0000 (15:10 +0100)]
bridge: only provide proxy ARP when CONFIG_INET is enabled
When IPV4 support is disabled, we cannot call arp_send from
the bridge code, which would result in a kernel link error:
net/built-in.o: In function `br_handle_frame_finish':
:(.text+0x59914): undefined reference to `arp_send'
:(.text+0x59a50): undefined reference to `arp_tbl'
This makes the newly added proxy ARP support in the bridge
code depend on the CONFIG_INET symbol and lets the compiler
optimize the code out to avoid the link error.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes:
958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP")
Cc: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Kryger [Wed, 14 Jan 2015 06:24:12 +0000 (07:24 +0100)]
mmc: sdhci: Set SDHCI_POWER_ON with external vmmc
Host controllers lacking the required internal vmmc regulator may still
follow the spec with regard to the LSB of SDHCI_POWER_CONTROL. Set the
SDHCI_POWER_ON bit when vmmc is enabled to encourage the controller to
to drive CMD, DAT, SDCLK.
This fixes a regression observed on some Qualcomm and Nvidia boards
caused by
5222161 mmc: sdhci: Improve external VDD regulator support.
Fixes:
52221610dd84 (mmc: sdhci: Improve external VDD regulator support)
Signed-off-by: Tim Kryger <tim.kryger@gmail.com>
Tested-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Zhang Rui [Wed, 14 Jan 2015 07:49:19 +0000 (15:49 +0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal into thermal-soc
Jean-Francois Remy [Wed, 14 Jan 2015 03:22:39 +0000 (04:22 +0100)]
neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.
This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.
This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.
Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.
Signed-off-by: Jean-Francois Remy <jeff@melix.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Agner [Tue, 13 Jan 2015 23:20:21 +0000 (00:20 +0100)]
net: fec: fix MDIO bus assignement for dual fec SoC's
On i.MX28, the MDIO bus is shared between the two FEC instances.
The driver makes sure that the second FEC uses the MDIO bus of the
first FEC. This is done conditionally if FEC_QUIRK_ENET_MAC is set.
However, in newer designs, such as Vybrid or i.MX6SX, each FEC MAC
has its own MDIO bus. Simply removing the quirk FEC_QUIRK_ENET_MAC
is not an option since other logic, triggered by this quirk, is
still needed.
Furthermore, there are board designs which use the same MDIO bus
for both PHY's even though the second bus would be available on the
SoC side. Such layout are popular since it saves pins on SoC side.
Due to the above quirk, those boards currently do work fine. The
boards in the mainline tree with such a layout are:
- Freescale Vybrid Tower with TWR-SER2 (vf610-twr.dts)
- Freescale i.MX6 SoloX SDB Board (imx6sx-sdb.dts)
This patch adds a new quirk FEC_QUIRK_SINGLE_MDIO for i.MX28, which
makes sure that the MDIO bus of the first FEC is used in any case.
However, the boards above do have a SoC with a MDIO bus for each FEC
instance. But the PHY's are not connected in a 1:1 configuration. A
proper device tree description is needed to allow the driver to
figure out where to find its PHY. This patch fixes that shortcoming
by adding a MDIO bus child node to the first FEC instance, along
with the two PHY's on that bus, and making use of the phy-handle
property to add a reference to the PHY's.
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 13 Jan 2015 23:22:56 +0000 (12:22 +1300)]
Merge branch 'leds-fixes-for-3.19' of git://git./linux/kernel/git/cooloney/linux-leds
Pull LED fix from Bryan Wu.
* 'leds-fixes-for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: netxbig: fix oops at probe time
Linus Torvalds [Tue, 13 Jan 2015 22:54:12 +0000 (11:54 +1300)]
Merge tag 'for-linus' of git://git./linux/kernel/git/borntraeger/linux
Pull WRITE_ONCE argument order change from Christian Borntraeger:
"As discussed on LKML[1] it was agreed that WRITE_ONCE(x, val) is
better than ASSIGN_ONCE(val, x)
Lets change that for 3.19 as 3.19 has no user yet, but the first users
will hit linux-next soon"
[1] http://marc.info/?l=linux-kernel&m=
142081181707596
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)
David Vrabel [Tue, 13 Jan 2015 16:42:42 +0000 (16:42 +0000)]
xen-netfront: use different locks for Rx and Tx stats
In netfront the Rx and Tx path are independent and use different
locks. The Tx lock is held with hard irqs disabled, but Rx lock is
held with only BH disabled. Since both sides use the same stats lock,
a deadlock may occur.
[ INFO: possible irq lock inversion dependency detected ]
3.16.2 #16 Not tainted
---------------------------------------------------------
swapper/0/0 just changed the state of lock:
(&(&queue->tx_lock)->rlock){-.....}, at: [<
c03adec8>]
xennet_tx_interrupt+0x14/0x34
but this lock took another, HARDIRQ-unsafe lock in the past:
(&stat->syncp.seq#2){+.-...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&stat->syncp.seq#2);
local_irq_disable();
lock(&(&queue->tx_lock)->rlock);
lock(&stat->syncp.seq#2);
<Interrupt>
lock(&(&queue->tx_lock)->rlock);
Using separate locks for the Rx and Tx stats fixes this deadlock.
Reported-by: Dmitry Piotrovsky <piotrovskydmitry@gmail.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Tue, 13 Jan 2015 12:05:49 +0000 (17:35 +0530)]
drivers: net: cpsw: fix multicast flush in dual emac mode
Since ALE table is a common resource for both the interfaces in Dual EMAC
mode and while bringing up the second interface in cpsw_ndo_set_rx_mode()
all the multicast entries added by the first interface is flushed out and
only second interface multicast addresses are added. Fixing this by
flushing multicast addresses based on dual EMAC port vlans which will not
affect the other emac port multicast addresses.
Fixes:
d9ba8f9 (driver: net: ethernet: cpsw: dual emac interface implementation)
Cc: <stable@vger.kernel.org> # v3.9+
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Guinot [Tue, 2 Dec 2014 15:32:10 +0000 (07:32 -0800)]
leds: netxbig: fix oops at probe time
This patch fixes a NULL pointer dereference on led_dat->mode_val. Due to
this bug, a kernel oops can be observed at probe time on the LaCie 2Big
and 5Big v2 boards:
Unable to handle kernel NULL pointer dereference at virtual address
00000008
[...]
[<
c03f244c>] (netxbig_led_probe) from [<
c02c8c6c>] (platform_drv_probe+0x4c/0x9c)
[<
c02c8c6c>] (platform_drv_probe) from [<
c02c72d0>] (driver_probe_device+0x98/0x25c)
[<
c02c72d0>] (driver_probe_device) from [<
c02c7520>] (__driver_attach+0x8c/0x90)
[<
c02c7520>] (__driver_attach) from [<
c02c5c24>] (bus_for_each_dev+0x68/0x94)
[<
c02c5c24>] (bus_for_each_dev) from [<
c02c6408>] (bus_add_driver+0x124/0x1dc)
[<
c02c6408>] (bus_add_driver) from [<
c02c7ac0>] (driver_register+0x78/0xf8)
[<
c02c7ac0>] (driver_register) from [<
c000888c>] (do_one_initcall+0x80/0x1cc)
[<
c000888c>] (do_one_initcall) from [<
c0733618>] (kernel_init_freeable+0xe4/0x1b4)
[<
c0733618>] (kernel_init_freeable) from [<
c058db9c>] (kernel_init+0xc/0xec)
[<
c058db9c>] (kernel_init) from [<
c0009850>] (ret_from_fork+0x14/0x24)
[...]
This bug was introduced by commit
588a6a99286ae30afb1339d8bc2163517b1b7dd1
("leds: netxbig: fix attribute-creation race").
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Cc: <stable@vger.kernel.org> # 3.17+
Acked-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
Hariprasad Shenai [Mon, 12 Jan 2015 16:36:16 +0000 (22:06 +0530)]
cxgb4vf: Initialize mdio_addr before using it
In commit
5ad24def21b205a8 ("cxgb4vf: Fix ethtool get_settings for VF driver")
mdio_addr of port_info structure was used unininitialzed. Fixing it.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Borntraeger [Tue, 13 Jan 2015 09:46:42 +0000 (10:46 +0100)]
kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)
Feedback has shown that WRITE_ONCE(x, val) is easier to use than
ASSIGN_ONCE(val,x).
There are no in-tree users yet, so lets change it for 3.19.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Linus Torvalds [Tue, 13 Jan 2015 19:09:14 +0000 (08:09 +1300)]
Merge tag 'linux-kselftest-3.19-rc-5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This update contains three patches to fix one compile error, and two
run-time bugs. One of them fixes infinite loop on ARM"
* tag 'linux-kselftest-3.19-rc-5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/vm: fix link error for transhuge-stress test
tools: testing: selftests: mq_perf_tests: Fix infinite loop on ARM
selftests/exec: allow shell return code of 126
Linus Torvalds [Tue, 13 Jan 2015 19:07:42 +0000 (08:07 +1300)]
Merge tag 'stable/for-linus-3.19-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
"Several critical linear p2m fixes that prevented some hosts from
booting"
* tag 'stable/for-linus-3.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: properly retrieve NMI reason
xen: check for zero sized area when invalidating memory
xen: use correct type for physical addresses
xen: correct race in alloc_p2m_pmd()
xen: correct error for building p2m list on 32 bits
x86/xen: avoid freeing static 'name' when kasprintf() fails
x86/xen: add extra memory for remapped frames during setup
x86/xen: don't count how many PFNs are identity mapped
x86/xen: Free bootmem in free_p2m_page() during early boot
x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer()
B Viswanath [Mon, 12 Jan 2015 09:16:25 +0000 (14:46 +0530)]
net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
Corrected the comment describing the ndo operations to
reflect the actual prototype for couple of operations
Signed-off-by: B Viswanath <marichika4@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 13 Jan 2015 18:53:51 +0000 (07:53 +1300)]
Merge branch 'for-rc' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
"Specifics:
- Fix a problem that Intel SoC DTS thermal driver does not work when
CONFIG_THERMAL_INT340X is not set.
- Fix a NULL pointer dereference when processor_thermal_device driver
is loaded on a platform without ACPI support"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
int340x_thermal/processor_thermal_device: return failure when
ACPI/int340x_thermal: enumerate INT3401 for Intel SoC DTS thermal driver
ACPI/int340x_thermal: enumerate INT340X devices even if they're not in _ART/_TRT
Colin Ian King [Mon, 12 Jan 2015 15:27:52 +0000 (15:27 +0000)]
fbdev/broadsheetfb: fix memory leak
static code analysis from cppcheck reports:
[drivers/video/fbdev/broadsheetfb.c:673]:
(error) Memory leak: sector_buffer
sector_buffer is not being kfree'd on each call to
broadsheet_spiflash_rewrite_sector(), so free it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
NeilBrown [Tue, 13 Jan 2015 02:17:43 +0000 (15:17 +1300)]
locks: fix NULL-deref in generic_delete_lease
commit
0efaa7e82f02fe69c05ad28e905f31fc86e6f08e
locks: generic_delete_lease doesn't need a file_lock at all
moves the call to fl->fl_lmops->lm_change() to a place in the
code where fl might be a non-lease lock.
When that happens, fl_lmops is NULL and an Oops ensures.
So add an extra test to restore correct functioning.
Reported-by: Linda Walsh <suse@tlinx.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=912569
Cc: stable@vger.kernel.org (v3.18)
Fixes:
0efaa7e82f02fe69c05ad28e905f31fc86e6f08e
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Hans de Goede [Mon, 5 Jan 2015 08:15:16 +0000 (09:15 +0100)]
simplefb: Fix build failure on Sparc
of_platform_device_create is only defined when CONFIG_OF_ADDRESS is set,
which is normally always the case when CONFIG_OF is defined, except on Sparc,
so explicitly check for CONFIG_OF_ADDRESS rather then for CONFIG_OF.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Jan Beulich [Tue, 13 Jan 2015 07:40:05 +0000 (07:40 +0000)]
x86/xen: properly retrieve NMI reason
Using the native code here can't work properly, as the hypervisor would
normally have cleared the two reason bits by the time Dom0 gets to see
the NMI (if passed to it at all). There's a shared info field for this,
and there's an existing hook to use - just fit the two together. This
is particularly relevant so that NMIs intended to be handled by APEI /
GHES actually make it to the respective handler.
Note that the hook can (and should) be used irrespective of whether
being in Dom0, as accessing port 0x61 in a DomU would be even worse,
while the shared info field would just hold zero all the time. Note
further that hardware NMI handling for PVH doesn't currently work
anyway due to missing code in the hypervisor (but it is expected to
work the native rather than the PV way).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Linus Torvalds [Tue, 13 Jan 2015 02:29:42 +0000 (15:29 +1300)]
Merge tag 'gpio-v3.19-3' of git://git./linux/kernel/git/linusw/linux-gpio
Pull gpio fixes from Linus Walleij:
"Here are some GPIO fixes, mainly affecting the DLN2 IRQ handling.
Nothing special about them, just fixes:
- Three patches fixing IRQ handling for the DLN2
- Null pointer handling for grgpio"
* tag 'gpio-v3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: dln2: use bus_sync_unlock instead of scheduling work
gpio: grgpio: Avoid potential NULL pointer dereference
gpio: dln2: Fix gpio output value in dln2_gpio_direction_output()
gpio: dln2: fix issue when an IRQ is unmasked then enabled
Linus Torvalds [Tue, 13 Jan 2015 02:25:23 +0000 (15:25 +1300)]
Merge tag 'mmc-v3.19-3' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci-pci|acpi: Support some new IDs
- sdhci: Fix sleep from atomic context
- sdhci-pxav3: Prevent hang during ->probe()
- sdhci: Disable re-tuning for HS400"
* tag 'mmc-v3.19-3' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sdhci-pci: Add support for Intel SPT
mmc: sdhci-acpi: Add ACPI HID INT344D
mmc: sdhci: Fix sleep in atomic after inserting SD card
mmc: sdhci-pxav3: do the mbus window configuration after enabling clocks
mmc: sdhci: Disable re-tuning for HS400
mmc: sdhci: Simplify use of tuning timer
mmc: sdhci: Add out_unlock to sdhci_execute_tuning
mmc: sdhci: Tuning should not change max_blk_count
Linus Torvalds [Tue, 13 Jan 2015 02:23:26 +0000 (15:23 +1300)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull scsi target fixes from Nicholas Bellinger:
"Mostly minor fixes this time, including:
- Add missing virtio-scsi -> TCM attribute conversion in vhost-scsi.
- Fix persistent reservations write exclusive handling to allow
readers for all registered I_T nexuses.
- Drop arbitrary maximum I/O size limit in order to process I/Os
larger than 4 MB, required for initiators that don't honor block
limits EVPD.
- Drop the now left-over fabric_max_sectors attribute"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi-target: Fix typos in enum cmd_flags_table
MAINTAINERS: Add entry for iSER target driver
target: Allow Write Exclusive non-reservation holders to READ
target: Drop left-over fabric_max_sectors attribute
target: Drop arbitrary maximum I/O size limit
Documentation/target: Update fabric_ops to latest code
vhost-scsi: Add missing virtio-scsi -> TCM attribute conversion
Will Deacon [Mon, 12 Jan 2015 19:10:55 +0000 (19:10 +0000)]
mm: mmu_gather: use tlb->end != 0 only for TLB invalidation
When batching up address ranges for TLB invalidation, we check tlb->end
!= 0 to indicate that some pages have actually been unmapped.
As of commit
f045bbb9fa1b ("mmu_gather: fix over-eager
tlb_flush_mmu_free() calling"), we use the same check for freeing these
pages in order to avoid a performance regression where we call
free_pages_and_swap_cache even when no pages are actually queued up.
Unfortunately, the range could have been reset (tlb->end = 0) by
tlb_end_vma, which has been shown to cause memory leaks on arm64.
Furthermore, investigation into these leaks revealed that the fullmm
case on task exit no longer invalidates the TLB, by virtue of tlb->end
== 0 (in 3.18, need_flush would have been set).
This patch resolves the problem by reverting commit
f045bbb9fa1b, using
instead tlb->local.nr as the predicate for page freeing in
tlb_flush_mmu_free and ensuring that tlb->end is initialised to a
non-zero value in the fullmm case.
Tested-by: Mark Langsdorf <mlangsdo@redhat.com>
Tested-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Khoroshilov [Fri, 9 Jan 2015 23:16:22 +0000 (02:16 +0300)]
usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
Commit
e4c7f259c5be ("USB: kaweth.c: use GFP_ATOMIC under spin_lock")
makes sure that kaweth_internal_control_msg() allocates memory with GFP_ATOMIC,
but kaweth_internal_control_msg() also calls usb_start_wait_urb()
that still allocates memory with GFP_NOIO.
The patch fixes usb_start_wait_urb() as well.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 9 Jan 2015 20:29:40 +0000 (14:29 -0600)]
MAINTAINERS: add me as ibmveth maintainer
Adding myself as the ibmveth maintainer and replacing
Santiago Leon.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Cc: Santiago Leon <santi_leon@yahoo.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 8 Jan 2015 17:27:27 +0000 (12:27 -0500)]
tipc: fix bug in broadcast retransmit code
In commit
58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.
In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.
In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Mon, 12 Jan 2015 16:33:59 +0000 (17:33 +0100)]
thermal: rcar: Spelling/grammar: s/drier use .../driver uses ...s/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Ani Sinha [Wed, 7 Jan 2015 23:45:56 +0000 (15:45 -0800)]
update ip-sysctl.txt documentation (v2)
Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for ipv4.
Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandre Belloni [Wed, 7 Jan 2015 22:59:26 +0000 (23:59 +0100)]
net/at91_ether: prepare and unprepare clock
The clock is enabled without being prepared, this leads to:
WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:889 __clk_enable+0x24/0xa8()
and a non working ethernet interface.
Use clk_prepare_enable() and clk_disable_unprepare() to handle the clock.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giel van Schijndel [Wed, 7 Jan 2015 19:10:12 +0000 (20:10 +0100)]
isdn: fix NUL (\0 or \x00) specification in string
In C one can either use '\0' or '\x00' (or '\000') to add a NUL byte to
a string. '\0x00' isn't part of these and will in fact result in a
single NUL followed by "x00". This fixes that.
Signed-off-by: Giel van Schijndel <me@mortis.eu>
Reported-at: http://www.viva64.com/en/b/0299/
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Zyngier [Sun, 11 Jan 2015 13:10:11 +0000 (14:10 +0100)]
arm64: KVM: Fix HCR setting for 32bit guests
Commit
b856a59141b1 (arm/arm64: KVM: Reset the HCR on each vcpu
when resetting the vcpu) moved the init of the HCR register to
happen later in the init of a vcpu, but left out the fixup
done in kvm_reset_vcpu when preparing for a 32bit guest.
As a result, the 32bit guest is run as a 64bit guest, but the
rest of the kernel still manages it as a 32bit. Fun follows.
Moving the fixup to vcpu_reset_hcr solves the problem for good.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Marc Zyngier [Sun, 11 Jan 2015 13:10:10 +0000 (14:10 +0100)]
arm64: KVM: Fix TLB invalidation by IPA/VMID
It took about two years for someone to notice that the IPA passed
to TLBI IPAS2E1IS must be shifted by 12 bits. Clearly our reviewing
is not as good as it should be...
Paper bag time for me.
Reported-by: Mario Smarduch <m.smarduch@samsung.com>
Tested-by: Mario Smarduch <m.smarduch@samsung.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Juergen Gross [Mon, 12 Jan 2015 05:05:10 +0000 (06:05 +0100)]
xen: check for zero sized area when invalidating memory
With the introduction of the linear mapped p2m list setting memory
areas to "invalid" had to be delayed. When doing the invalidation
make sure no zero sized areas are processed.
Signed-off-by: Juegren Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Mon, 12 Jan 2015 05:05:09 +0000 (06:05 +0100)]
xen: use correct type for physical addresses
When converting a pfn to a physical address be sure to use 64 bit
wide types or convert the physical address to a pfn if possible.
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Mon, 12 Jan 2015 05:05:08 +0000 (06:05 +0100)]
xen: correct race in alloc_p2m_pmd()
When allocating a new pmd for the linear mapped p2m list a check is
done for not introducing another pmd when this just happened on
another cpu. In this case the old pte pointer was returned which
points to the p2m_missing or p2m_identity page. The correct value
would be the pointer to the found new page.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Mon, 12 Jan 2015 05:05:07 +0000 (06:05 +0100)]
xen: correct error for building p2m list on 32 bits
In xen_rebuild_p2m_list() for large areas of invalid or identity
mapped memory the pmd entries on 32 bit systems are initialized
wrong. Correct this error.
Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Jan Willeke [Thu, 8 Jan 2015 15:56:01 +0000 (16:56 +0100)]
s390/uprobes: fix user space PER events
If uprobes are single stepped for example with gdb, the behavior should
now be correct. Before this patch, when gdb was single stepping a uprobe,
the result was a SIGILL.
When PER is active for any storage alteration and a uprobe is hit, a storage
alteration event is indicated. These over indications are filterd out by gdb,
if no change has happened within the observed area.
Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Adrian Hunter [Mon, 5 Jan 2015 12:47:58 +0000 (14:47 +0200)]
mmc: sdhci-pci: Add support for Intel SPT
Add PCI IDs for SPT eMMC, SDIO and SD card.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Adrian Hunter [Mon, 5 Jan 2015 12:47:57 +0000 (14:47 +0200)]
mmc: sdhci-acpi: Add ACPI HID INT344D
Add ACPI HID INT344D for an Intel SDIO host controller.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Krzysztof Kozlowski [Mon, 5 Jan 2015 09:50:15 +0000 (10:50 +0100)]
mmc: sdhci: Fix sleep in atomic after inserting SD card
Sleep in atomic context happened on Trats2 board after inserting or
removing SD card because mmc_gpio_get_cd() was called under spin lock.
Fix this by moving card detection earlier, before acquiring spin lock.
The mmc_gpio_get_cd() call does not have to be protected by spin lock
because it does not access any sdhci internal data.
The sdhci_do_get_cd() call access host flags (SDHCI_DEVICE_DEAD). After
moving it out side of spin lock it could theoretically race with driver
removal but still there is no actual protection against manual card
eject.
Dmesg after inserting SD card:
[ 41.663414] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1511
[ 41.670469] in_atomic(): 1, irqs_disabled(): 128, pid: 30, name: kworker/u8:1
[ 41.677580] INFO: lockdep is turned off.
[ 41.681486] irq event stamp: 61972
[ 41.684872] hardirqs last enabled at (61971): [<
c0490ee0>] _raw_spin_unlock_irq+0x24/0x5c
[ 41.693118] hardirqs last disabled at (61972): [<
c04907ac>] _raw_spin_lock_irq+0x18/0x54
[ 41.701190] softirqs last enabled at (61648): [<
c0026fd4>] __do_softirq+0x234/0x2c8
[ 41.708914] softirqs last disabled at (61631): [<
c00273a0>] irq_exit+0xd0/0x114
[ 41.716206] Preemption disabled at:[< (null)>] (null)
[ 41.721500]
[ 41.722985] CPU: 3 PID: 30 Comm: kworker/u8:1 Tainted: G W 3.18.0-rc5-next-
20141121 #883
[ 41.732111] Workqueue: kmmcd mmc_rescan
[ 41.735945] [<
c0014d2c>] (unwind_backtrace) from [<
c0011c80>] (show_stack+0x10/0x14)
[ 41.743661] [<
c0011c80>] (show_stack) from [<
c0489d14>] (dump_stack+0x70/0xbc)
[ 41.750867] [<
c0489d14>] (dump_stack) from [<
c0228b74>] (gpiod_get_raw_value_cansleep+0x18/0x30)
[ 41.759628] [<
c0228b74>] (gpiod_get_raw_value_cansleep) from [<
c03646e8>] (mmc_gpio_get_cd+0x38/0x58)
[ 41.768821] [<
c03646e8>] (mmc_gpio_get_cd) from [<
c036d378>] (sdhci_request+0x50/0x1a4)
[ 41.776808] [<
c036d378>] (sdhci_request) from [<
c0357934>] (mmc_start_request+0x138/0x268)
[ 41.785051] [<
c0357934>] (mmc_start_request) from [<
c0357cc8>] (mmc_wait_for_req+0x58/0x1a0)
[ 41.793469] [<
c0357cc8>] (mmc_wait_for_req) from [<
c0357e68>] (mmc_wait_for_cmd+0x58/0x78)
[ 41.801714] [<
c0357e68>] (mmc_wait_for_cmd) from [<
c0361c00>] (mmc_io_rw_direct_host+0x98/0x124)
[ 41.810480] [<
c0361c00>] (mmc_io_rw_direct_host) from [<
c03620f8>] (sdio_reset+0x2c/0x64)
[ 41.818641] [<
c03620f8>] (sdio_reset) from [<
c035a3d8>] (mmc_rescan+0x254/0x2e4)
[ 41.826028] [<
c035a3d8>] (mmc_rescan) from [<
c003a0e0>] (process_one_work+0x180/0x3f4)
[ 41.833920] [<
c003a0e0>] (process_one_work) from [<
c003a3bc>] (worker_thread+0x34/0x4b0)
[ 41.841991] [<
c003a3bc>] (worker_thread) from [<
c003fed8>] (kthread+0xe4/0x104)
[ 41.849285] [<
c003fed8>] (kthread) from [<
c000f268>] (ret_from_fork+0x14/0x2c)
[ 42.038276] mmc0: new high speed SDHC card at address 1234
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes:
94144a465dd0 ("mmc: sdhci: add get_cd() implementation")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>