Linus Torvalds [Thu, 28 Mar 2013 20:43:46 +0000 (13:43 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull userns fixes from Eric W Biederman:
"The bulk of the changes are fixing the worst consequences of the user
namespace design oversight in not considering what happens when one
namespace starts off as a clone of another namespace, as happens with
the mount namespace.
The rest of the changes are just plain bug fixes.
Many thanks to Andy Lutomirski for pointing out many of these issues."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Restrict when proc and sysfs can be mounted
ipc: Restrict mounting the mqueue filesystem
vfs: Carefully propogate mounts across user namespaces
vfs: Add a mount flag to lock read only bind mounts
userns: Don't allow creation if the user is chrooted
yama: Better permission check for ptraceme
pid: Handle the exit of a multi-threaded init.
scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
Linus Torvalds [Wed, 27 Mar 2013 22:50:24 +0000 (15:50 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/sfr/next-fixes
Pull powerpc build fixes from Stephen Rothwell:
"Just a couple of build fixes for powerpc all{mod,yes}config.
Submitted by me since BenH is on vacation."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes:
powerpc: define the conditions where the ePAPR idle hcall can be supported
powerpc: make additional room in exception vector area
Linus Torvalds [Wed, 27 Mar 2013 19:56:25 +0000 (12:56 -0700)]
Merge tag 'stable/for-linus-3.9-rc4-tag' of git://git./linux/kernel/git/konrad/xen
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
"This is mostly just the last stragglers of the regression bugs that
this merge window had. There are also two bug-fixes: one that adds an
extra layer of security, and a regression fix for a change that was
added in v3.7 (the v1 was faulty, the v2 works).
- Regression fixes for C-and-P states not being parsed properly.
- Fix possible security issue with guests triggering DoS via
non-assigned MSI-Xs.
- Fix regression (introduced in v3.7) with raising an event (v2).
- Fix hastily introduced band-aid during c0 for the CR3 blowup."
* tag 'stable/for-linus-3.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/events: avoid race with raising an event in unmask_evtchn()
xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup.
xen/acpi-stub: Disable it b/c the acpi_processor_add is no longer called.
xen-pciback: notify hypervisor about devices intended to be assigned to guests
xen/acpi-processor: Don't dereference struct acpi_processor on all CPUs.
Linus Torvalds [Wed, 27 Mar 2013 18:18:43 +0000 (11:18 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- fix for potential 3.9 regression in handling of buttons for touchpads
following HID mt specification; potential because reportedly there is
no retail product on the market that would be using this feature, but
nevertheless we'd better follow the spec. Fix by Benjamin Tissoires.
- support for two quirky devices added by Josh Boyer.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: multitouch: fix touchpad buttons
HID: usbhid: fix build problem
HID: usbhid: quirk for MSI GX680R led panel
HID: usbhid: quirk for Realtek Multi-card reader
Linus Torvalds [Wed, 27 Mar 2013 16:25:11 +0000 (09:25 -0700)]
Merge tag 'iommu-fixes-v3.9-rc4' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Here are some fixes which have collected since Linux v3.9-rc1.
The most important one fixes a long-standing regressen which make
re-hotplugged devices unusable when AMD IOMMU is used.
The other patches fix build issues (build regression on OMAP and a
section mismatch). One patch just removes a duplicate header include."
* tag 'iommu-fixes-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Make sure dma_ops are set for hotplug devices
x86, io_apic: remove duplicated include from irq_remapping.c
iommu: OMAP: build only on OMAP2+
amd_iommu_init: remove __init from amd_iommu_erratum_746_workaround
Al Viro [Wed, 27 Mar 2013 15:20:30 +0000 (15:20 +0000)]
vfs/splice: Fix missed checks in new __kernel_write() helper
Commit
06ae43f34bcc ("Don't bother with redoing rw_verify_area() from
default_file_splice_from()") lost the checks to test existence of the
write/aio_write methods. My apologies ;-/
Eventually, we want that in fs/splice.c side of things (no point
repeating it for every buffer, after all), but for now this is the
obvious minimal fix.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Vrabel [Mon, 25 Mar 2013 14:11:19 +0000 (14:11 +0000)]
xen/events: avoid race with raising an event in unmask_evtchn()
In unmask_evtchn(), when the mask bit is cleared after testing for
pending and the event becomes pending between the test and clear, then
the upcall will not become pending and the event may be lost or
delayed.
Avoid this by always clearing the mask bit before checking for
pending. If a hypercall is needed, remask the event as
EVTCHNOP_unmask will only retrigger pending events if they were
masked.
This fixes a regression introduced in 3.7 by
b5e579232d635b79a3da052964cb357ccda8d9ea (xen/events: fix
unmask_evtchn for PV on HVM guests) which reordered the clear mask and
check pending operations.
Changes in v2:
- set mask before hypercall.
Cc: stable@vger.kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 22 Mar 2013 14:34:28 +0000 (10:34 -0400)]
xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup.
We move the setting of write_cr3 from the early bootup variant
(see git commit
0cc9129d75ef8993702d97ab0e49542c15ac6ab9
"x86-64, xen, mmu: Provide an early version of write_cr3.")
to a more appropiate location.
This new location sets all of the other non-early variants
of pvops calls - and most importantly is before the
alternative_asm mechanism kicks in.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 22 Mar 2013 14:15:47 +0000 (10:15 -0400)]
xen/acpi-stub: Disable it b/c the acpi_processor_add is no longer called.
With the Xen ACPI stub code (CONFIG_XEN_STUB=y) enabled, the power
C and P states are no longer uploaded to the hypervisor.
The reason is that the Xen CPU hotplug code: xen-acpi-cpuhotplug.c
and the xen-acpi-stub.c register themselves as the "processor" type object.
That means the generic processor (processor_driver.c) stops
working and it does not call (acpi_processor_add) which populates the
per_cpu(processors, pr->id) = pr;
structure. The 'pr' is gathered from the acpi_processor_get_info function
which does the job of finding the C-states and figuring out PBLK address.
The 'processors->pr' is then later used by xen-acpi-processor.c (the one that
uploads C and P states to the hypervisor). Since it is NULL, we end
skip the gathering of _PSD, _PSS, _PCT, etc and never upload the power
management data.
The end result is that enabling the CONFIG_XEN_STUB in the build means that
xen-acpi-processor is not working anymore.
This temporary patch fixes it by marking the XEN_STUB driver as
BROKEN until this can be properly fixed.
CC: jinsong.liu@intel.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Eric W. Biederman [Sun, 24 Mar 2013 21:28:27 +0000 (14:28 -0700)]
userns: Restrict when proc and sysfs can be mounted
Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.
proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.
Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.
In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).
Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Fri, 22 Mar 2013 01:13:15 +0000 (18:13 -0700)]
ipc: Restrict mounting the mqueue filesystem
Only allow mounting the mqueue filesystem if the caller has CAP_SYS_ADMIN
rights over the ipc namespace. The principle here is if you create
or have capabilities over it you can mount it, otherwise you get to live
with what other people have mounted.
This information is not particularly sensitive and mqueue essentially
only reports which posix messages queues exist. Still when creating a
restricted environment for an application to live any extra
information may be of use to someone with sufficient creativity. The
historical if imperfect way this information has been restricted has
been not to allow mounts and restricting this to ipc namespace
creators maintains the spirit of the historical restriction.
Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Fri, 22 Mar 2013 11:08:05 +0000 (04:08 -0700)]
vfs: Carefully propogate mounts across user namespaces
As a matter of policy MNT_READONLY should not be changable if the
original mounter had more privileges than creator of the mount
namespace.
Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.
When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.
This protects both mount propagation and the initial creation of a less
privileged mount namespace.
Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Fri, 22 Mar 2013 10:10:15 +0000 (03:10 -0700)]
vfs: Add a mount flag to lock read only bind mounts
When a read-only bind mount is copied from mount namespace in a higher
privileged user namespace to a mount namespace in a lesser privileged
user namespace, it should not be possible to remove the the read-only
restriction.
Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.
CC: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric W. Biederman [Fri, 15 Mar 2013 08:45:51 +0000 (01:45 -0700)]
userns: Don't allow creation if the user is chrooted
Guarantee that the policy of which files may be access that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.
Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.
For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace. Therefore when creating a user
namespace we must ensure that the policy of which files may be access
can not be violated by changing the root directory.
Anyone who runs a processes in a chroot and would like to use user
namespace can setup the same view of filesystems with a mount
namespace instead. With this result that this is not a practical
limitation for using user namespaces.
Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Benjamin Tissoires [Fri, 22 Mar 2013 17:53:57 +0000 (18:53 +0100)]
HID: multitouch: fix touchpad buttons
Commit "HID: multitouch: use the callback "report" instead..." breaks the
buttons of touchpads following the HID multitouch specification.
The buttons were emmitted through hid-input, but as now the events
are generated only in hid-multitouch, the buttons are not emmitted anymore.
The input_event() call is far much simpler than the hid-input one as
many of the different tests do not apply to multitouch touchpads.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Joerg Roedel [Tue, 26 Mar 2013 21:48:23 +0000 (22:48 +0100)]
iommu/amd: Make sure dma_ops are set for hotplug devices
There is a bug introduced with commit
27c2127 that causes
devices which are hot unplugged and then hot-replugged to
not have per-device dma_ops set. This causes these devices
to not function correctly. Fixed with this patch.
Cc: stable@vger.kernel.org
Reported-by: Andreas Degert <andreas.degert@googlemail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Linus Torvalds [Wed, 27 Mar 2013 00:42:55 +0000 (17:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"stable fodder; assorted deadlock fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vt: synchronize_rcu() under spinlock is not nice...
Nest rename_lock inside vfsmount_lock
Don't bother with redoing rw_verify_area() from default_file_splice_from()
Al Viro [Wed, 27 Mar 2013 00:30:17 +0000 (20:30 -0400)]
vt: synchronize_rcu() under spinlock is not nice...
vcs_poll_data_free() calls unregister_vt_notifier(), which calls
atomic_notifier_chain_unregister(), which calls synchronize_rcu().
Do it *after* we'd dropped ->f_lock.
Cc: stable@vger.kernel.org (all kernels since 2.6.37)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 26 Mar 2013 22:25:57 +0000 (18:25 -0400)]
Nest rename_lock inside vfsmount_lock
... lest we get livelocks between path_is_under() and d_path() and friends.
The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.
As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.
Spotted-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Tue, 26 Mar 2013 21:24:29 +0000 (14:24 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Always increment IPV4 ID field in encapsulated GSO packets, even
when DF is set. Regression fix from Pravin B Shelar.
2) Fix per-net subsystem initialization in netfilter conntrack,
otherwise we may access dynamically allocated memory before it is
actually allocated. From Gao Feng.
3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka.
4) Fix race between submission of sync vs async commands in mwifiex
driver, from Amitkumar Karwar.
5) Add missing cancel of command timer in mwifiex driver, from Bing
Zhao.
6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna.
7) Thermal layer tries to use a genetlink multicast string that is
longer than the 16 character limit. Fix it and add a BUG check to
prevent this kind of thing from happening in the future.
From Masatake YAMATO.
8) Fix many bugs in the handling of the teardown of L2TP connections,
UDP encapsulation instances, and sockets. From Tom Parkin.
9) Missing socket release in IRDA, from Kees Cook.
10) Fix fec driver modular build, from Fabio Estevam.
11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop,
from Wei Yongjun.
12) Fix bugs in handling of queue numbers and steering rules in mlx4
driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz.
13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey
Vagin.
14) TCP segmentation deferral is unintentionally done too strongly,
breaking ACK clocking. Fix from Eric Dumazet.
15) net_enable_timestamp() can legitimately be invoked from software
interrupts, and in a way that is safe, so remove the WARN_ON().
Also from Eric Dumazet.
16) Fix use after free in VLANs, from Cong Wang.
17) Fix TCP slow start retransmit storms after SACK reneging, from
Yuchung Cheng.
18) Unix socket release should mark a socket dead before NULL'ing out
sock->sk, otherwise we can race. Fix from Paul Moore.
19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo.
20) Fix register mis-programming, NULL pointer derefs, and wrong PHC
clock frequency in IGB driver. From Lior LevyAlex Williamson, Jiri
Benc, and Jeff Kirsher.
21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet
forwarding. Fix from Veaceslav Falico.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
ipv4: Fix ip-header identification for gso packets.
bonding: remove already created master sysfs link on failure
af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
pch_gbe: fix ip_summed checksum reporting on rx
igb: fix PHC stopping on max freq
igb: make sensor info static
igb: SR-IOV init reordering
igb: Fix null pointer dereference
igb: fix i350 anti spoofing config
ixgbevf: don't release the soft entries
ipv6: fix bad free of addrconf_init_net
unix: fix a race condition in unix_release()
tcp: undo spurious timeout after SACK reneging
bnx2x: fix assignment of signed expression to unsigned variable
bridge: fix crash when set mac address of br interface
8021q: fix a potential use-after-free
net: remove a WARN_ON() in net_enable_timestamp()
tcp: preserve ACK clocking in TSO
net: fix *_DIAG_MAX constants
net/mlx4_core: Disallow releasing VF QPs which have steering rules
...
Linus Torvalds [Tue, 26 Mar 2013 21:23:45 +0000 (14:23 -0700)]
Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFSv4 idmapper regression
- Fix an Oops in the pNFS blocks client
- Fix up various issues with pNFS layoutcommit
- Ensure correct read ordering of variables in
rpc_wake_up_task_queue_locked
* tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
NFSv4.1: Add a helper pnfs_commit_and_return_layout
NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
NFSv4.1: Fix a race in pNFS layoutcommit
pnfs-block: removing DM device maybe cause oops when call dev_remove
NFSv4: Fix the string length returned by the idmapper
Eric W. Biederman [Thu, 21 Mar 2013 09:30:41 +0000 (02:30 -0700)]
yama: Better permission check for ptraceme
Change the permission check for yama_ptrace_ptracee to the standard
ptrace permission check, testing if the traceer has CAP_SYS_PTRACE
in the tracees user namespace.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pravin B Shelar [Sun, 24 Mar 2013 17:36:29 +0000 (17:36 +0000)]
ipv4: Fix ip-header identification for gso packets.
ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit
490ab08127cebc25e3a26
(IP_GRE: Fix IP-Identification).
Following patch fixes it so that identification is always
incremented.
Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Tue, 26 Mar 2013 16:43:28 +0000 (17:43 +0100)]
bonding: remove already created master sysfs link on failure
If slave sysfs symlink failes to be created - we end up without removing
the master sysfs symlink. Remove it in case of failure.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Mon, 25 Mar 2013 17:02:04 +0000 (17:02 +0000)]
af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
SCM_SCREDENTIALS should apply to write() syscalls only either source or destination
socket asserted SOCK_PASSCRED. The original implememtation in maybe_add_creds is wrong,
and breaks several LSB testcases ( i.e. /tset/LSB.os/netowkr/recvfrom/T.recvfrom).
Origionally-authored-by: Karel Srot <ksrot@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 26 Mar 2013 16:21:31 +0000 (12:21 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net
Jeff Kirsher says:
====================
This series contains updates to ixgbevf and igb.
The ixgbevf calls to pci_disable_msix() and to free the msix_entries
memory should not occur if device open fails. Instead they should be
called during device driver removal to balance with the call to
pci_enable_msix() and the call to allocate msix_entries memory
during the device probe and driver load.
The remaining 4 of 5 igb patches are simple 1-3 line patches to fix
several issues such as possible null pointer dereference, PHC stopping
on max frequency, make sensor info static and SR-IOV initialization
reordering.
The remaining igb patch to fix anti-spoofing config fixes a problem
in i350 where anti spoofing configuration was written into a wrong
register.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 25 Mar 2013 22:26:21 +0000 (22:26 +0000)]
pch_gbe: fix ip_summed checksum reporting on rx
skb->ip_summed should be CHECKSUM_UNNECESSARY when the driver reports that
checksums were correct and CHECKSUM_NONE in any other case. They're
currently placed vice versa, which breaks the forwarding scenario. Fix it
by placing them as described above.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 11 Mar 2013 14:21:28 +0000 (22:21 +0800)]
x86, io_apic: remove duplicated include from irq_remapping.c
Remove duplicated include.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Eric W. Biederman [Tue, 26 Mar 2013 09:27:11 +0000 (02:27 -0700)]
pid: Handle the exit of a multi-threaded init.
When a multi-threaded init exits and the initial thread is not the
last thread to exit the initial thread hangs around as a zombie
until the last thread exits. In that case zap_pid_ns_processes
needs to wait until there are only 2 hashed pids in the pid
namespace not one.
v2. Replace thread_pid_vnr(me) == 1 with the test thread_group_leader(me)
as suggested by Oleg.
Cc: stable@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Caj Larsson <caj@omnicloud.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Jiri Benc [Wed, 20 Mar 2013 09:06:34 +0000 (09:06 +0000)]
igb: fix PHC stopping on max freq
For 82576 MAC type, max_adj is reported as
1000000000 ppb. However, if
this value is passed to igb_ptp_adjfreq_82576, incvalue overflows out of
INCVALUE_82576_MASK, resulting in setting of zero TIMINCA.incvalue, stopping
the PHC (instead of going at twice the nominal speed).
Fix the advertised max_adj value to the largest value hardware can handle.
As there is no min_adj value available (-max_adj is used instead), this will
also prevent stopping the clock intentionally. It's probably not a big deal,
other igb MAC types don't support stopping the clock, either.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stephen Hemminger [Wed, 20 Mar 2013 09:06:29 +0000 (09:06 +0000)]
igb: make sensor info static
Trivial sparse warning.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alex Williamson [Wed, 13 Mar 2013 15:50:29 +0000 (15:50 +0000)]
igb: SR-IOV init reordering
igb is ineffective at setting a lower total VFs because:
int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
{
...
/* Shouldn't change if VFs already enabled */
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
return -EBUSY;
Swap init ordering.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alex Williamson [Wed, 13 Mar 2013 15:50:24 +0000 (15:50 +0000)]
igb: Fix null pointer dereference
The max_vfs= option has always been self limiting to the number of VFs
supported by the device.
fa44f2f1 added SR-IOV configuration via
sysfs, but in the process broke this self correction factor. The
failing path is:
igb_probe
igb_sw_init
if (max_vfs > 7) {
adapter->vfs_allocated_count = 7;
...
igb_probe_vfs
igb_enable_sriov(, max_vfs)
if (num_vfs > 7) {
err = -EPERM;
...
This leaves vfs_allocated_count = 7 and vf_data = NULL, so we bomb out
when igb_probe finally calls igb_reset. It seems like a really bad
idea, and somewhat pointless, to set vfs_allocated_count separate from
vf_data, but limiting max_vfs is enough to avoid the null pointer.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lior Levy [Tue, 12 Mar 2013 15:49:32 +0000 (15:49 +0000)]
igb: fix i350 anti spoofing config
Fix a problem in i350 where anti spoofing configuration was written into a
wrong register.
Signed-off-by: Lior Levy <lior.levy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
xunleer [Tue, 5 Mar 2013 07:44:20 +0000 (07:44 +0000)]
ixgbevf: don't release the soft entries
When the ixgbevf driver is opened the request to allocate MSIX irq
vectors may fail. In that case the driver will call ixgbevf_down()
which will call ixgbevf_irq_disable() to clear the HW interrupt
registers and calls synchronize_irq() using the msix_entries pointer in
the adapter structure. However, when the function to request the MSIX
irq vectors failed it had already freed the msix_entries which causes
an OOPs from using the NULL pointer in synchronize_irq().
The calls to pci_disable_msix() and to free the msix_entries memory
should not occur if device open fails. Instead they should be called
during device driver removal to balance with the call to
pci_enable_msix() and the call to allocate msix_entries memory
during the device probe and driver load.
Signed-off-by: Li Xun <xunleer.li@huawei.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Tue, 26 Mar 2013 01:03:34 +0000 (18:03 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single bugfix which prevents that a non functional timer device is
selected to provide the fallback device, which is supposed to serve
timer interrupts on behalf of non functional devices ..."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents: Don't allow dummy broadcast timers
Stuart Yoder [Fri, 22 Mar 2013 09:12:13 +0000 (09:12 +0000)]
powerpc: define the conditions where the ePAPR idle hcall can be supported
For 32-bit, CONFIG_EPAPR_PARAVIRT pulls in both epapr_paravirt.c
and epapr_hcalls.c which contains the 32-bit paravirt idle loop.
For 64-bit, the paravirt idle loop is in idle_book3e.S and that
source file is included only if CONFIG_PPC_BOOK3E_64 defined.
This patch makes that dependency for 64-bit explicit.
Fixes these build errors:
arch/powerpc/kernel/built-in.o: In function `restore_pblist_ptr':
ftrace.c:(.toc+0xdc0): undefined reference to `epapr_ev_idle_start'
ftrace.c:(.toc+0xdd0): undefined reference to `epapr_ev_idle'
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Hong Zhiguo [Mon, 25 Mar 2013 17:52:45 +0000 (01:52 +0800)]
ipv6: fix bad free of addrconf_init_net
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Mon, 25 Mar 2013 03:18:33 +0000 (03:18 +0000)]
unix: fix a race condition in unix_release()
As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.
Dave, I think this one should go on the -stable pile.
Special thanks to Jan for coming up with a reproducer for this
problem.
Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 25 Mar 2013 16:44:39 +0000 (09:44 -0700)]
Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband
Pull infiniband/rdma fixes from Roland Dreier:
"Small batch of InfiniBand/RDMA fixes for 3.9:
- Fix for TX lockup in IPoIB
- QLogic -> Intel update for qib driver
- Small static checker fix for qib
- Fix error path return value in cxgb4"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: change QLogic to Intel
IB/ipath: Silence a static checker warning
IPoIB: Fix send lockup due to missed TX completion
RDMA/cxgb4: Fix error return code in create_qp()
Linus Torvalds [Mon, 25 Mar 2013 16:26:10 +0000 (09:26 -0700)]
Merge tag 'fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC bug fixes from Arnd Bergmann:
"Four patches for arm-soc this week:
- Kevin Hilman is no longer reachable under his previous email
address. He submitted the patch earlier, but nobody felt
responsible to pick it up.
- One Tegra fix for an incorect register address in device tree.
- IMX multiplatform support exposes a configuration option that leads
to unbootable kernels on all other machines and that needs to
depend on that platform.
- A nontrivial bug fix for the setup of the mxs video output."
* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: update email address for Kevin Hilman
ARM: tegra: fix register address of slink controller
ARM: imx: add dependency check for DEBUG_IMX_UART_PORT
ARM: video: mxs: Fix mxsfb misconfiguring VDCTRL0
Linus Torvalds [Mon, 25 Mar 2013 16:25:12 +0000 (09:25 -0700)]
Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J Bruce Fields:
"Fixes for a couple mistakes in the new DRC code. And thanks to Kent
Overstreet for noticing we've been sync'ing the wrong range on stable
writes since 3.8."
* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
nfsd: fix bad offset use
nfsd: fix startup order in nfsd_reply_cache_init
nfsd: only unhash DRC entries that are in the hashtable
Trond Myklebust [Mon, 25 Mar 2013 15:23:40 +0000 (11:23 -0400)]
SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
We need to be careful when testing task->tk_waitqueue in
rpc_wake_up_task_queue_locked, because it can be changed while we
are holding the queue->lock.
By adding appropriate memory barriers, we can ensure that it is safe to
test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Linus Torvalds [Mon, 25 Mar 2013 09:57:32 +0000 (02:57 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Exynos and Intel fixes.
The intel fixes are fairly straightforward, mostly reverts due to bugs
found. The exynos one is a big larger since they found some issues
with the G2D engine and iommu interaction, and needed to verify the
operations a lot better than they were previously, otherwise a user
app can just crash the kernel with an iommu fault."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
Revert "drm/i915: write backlight harder"
drm/i915: don't disable the power well yet
Revert "drm/i915: set TRANSCODER_EDP even earlier"
drm/exynos: Check g2d cmd list for g2d restrictions
drm/exynos: Add a new function to get gem buffer size
drm/exynos: Deal with g2d buffer info more efficiently
drm/exynos: Clean up some G2D codes for readability
drm/exynos: Fix G2D core malfunctioning issue
drm/exynos: clear node object type at gem unmap
drm/exynos: Fix error routine to getting dma addr.
drm/exynos: Replaced kzalloc & memcpy with kmemdup
drm/exynos: fimd: calculate the correct address offset
drm/exynos: Make mixer_check_timing static
drm/exynos: modify the compatible string for exynos fimd
Chen Gang [Mon, 25 Mar 2013 01:31:31 +0000 (09:31 +0800)]
powerpc: make additional room in exception vector area
The FWNMI region is fixed at 0x7000 and the vector are now overflowing
that with allmodconfig. Fix that by moving slb_miss_realmode code out
of that region as it doesn't need to be that close to the call sites
(it is a _GLOBAL function)
Fixes this build error:
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1304: Error: attempt to move .org backwards
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Dave Airlie [Mon, 25 Mar 2013 02:20:00 +0000 (12:20 +1000)]
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into HEAD
Daniel writes:
"Just three revert/disable by default patches, one of them cc: stable
(since the offending commit was cc: stable, too)."
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
Revert "drm/i915: write backlight harder"
drm/i915: don't disable the power well yet
Revert "drm/i915: set TRANSCODER_EDP even earlier"
Dave Airlie [Mon, 25 Mar 2013 02:19:10 +0000 (12:19 +1000)]
Merge branch 'exynos-drm-fixes' of git://git./linux/kernel/git/daeinki/drm-exynos into HEAD
Inki writes:
Includes bug fixes and code cleanups.
And it considers some restrictions to G2D hardware.
With this, the malfunction and page fault issues to g2d driver
would be fixed.
* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: Check g2d cmd list for g2d restrictions
drm/exynos: Add a new function to get gem buffer size
drm/exynos: Deal with g2d buffer info more efficiently
drm/exynos: Clean up some G2D codes for readability
drm/exynos: Fix G2D core malfunctioning issue
drm/exynos: clear node object type at gem unmap
drm/exynos: Fix error routine to getting dma addr.
drm/exynos: Replaced kzalloc & memcpy with kmemdup
drm/exynos: fimd: calculate the correct address offset
drm/exynos: Make mixer_check_timing static
drm/exynos: modify the compatible string for exynos fimd
Yuchung Cheng [Sun, 24 Mar 2013 10:42:25 +0000 (10:42 +0000)]
tcp: undo spurious timeout after SACK reneging
On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Amit Mehta [Sat, 23 Mar 2013 20:10:25 +0000 (20:10 +0000)]
bnx2x: fix assignment of signed expression to unsigned variable
fix for incorrect assignment of signed expression to unsigned variable.
Signed-off-by: Kumar Amit Mehta <gmate.amit@gmail.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hong zhi guo [Sat, 23 Mar 2013 02:27:50 +0000 (02:27 +0000)]
bridge: fix crash when set mac address of br interface
When I tried to set mac address of a bridge interface to a mac
address which already learned on this bridge, I got system hang.
The cause is straight forward: function br_fdb_change_mac_address
calls fdb_insert with NULL source nbp. Then an fdb lookup is
performed. If an fdb entry is found and it's local, it's OK. But
if it's not local, source is dereferenced for printk without NULL
check.
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Fri, 22 Mar 2013 19:14:07 +0000 (19:14 +0000)]
8021q: fix a potential use-after-free
vlan_vid_del() could possibly free ->vlan_info after a RCU grace
period, however, we may still refer to the freed memory area
by 'grp' pointer. Found by code inspection.
This patch moves vlan_vid_del() as behind as possible.
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 Mar 2013 14:38:28 +0000 (14:38 +0000)]
net: remove a WARN_ON() in net_enable_timestamp()
The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false
positive, in socket clone path, run from softirq context :
[ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
[ 3641.668811] Call Trace:
[ 3641.671254] <IRQ> [<
ffffffff80286817>] warn_slowpath_common+0x87/0xc0
[ 3641.677871] [<
ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
[ 3641.683683] [<
ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
[ 3641.689668] [<
ffffffff80732ce5>] sk_clone_lock+0x425/0x450
[ 3641.695222] [<
ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
[ 3641.701213] [<
ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
[ 3641.707663] [<
ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
[ 3641.713354] [<
ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
[ 3641.719425] [<
ffffffff807af63a>] tcp_check_req+0x3da/0x530
[ 3641.724979] [<
ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
[ 3641.730964] [<
ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
[ 3641.736430] [<
ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
[ 3641.741985] [<
ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0
Its safe at this point because the parent socket owns a reference
on the netstamp_needed, so we cant have a 0 -> 1 transition, which
requires to lock a mutex.
Instead of refining the check, lets remove it, as all known callers
are safe. If it ever changes in the future, static_key_slow_inc()
will complain anyway.
Reported-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 24 Mar 2013 17:11:29 +0000 (10:11 -0700)]
Merge tag 'pinctrl-fixes-for-v3.9' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pinctrl fixes from Linus Walleij:
"Here are a few pinctrl fixes for the v3.9 rc series:
- Usecount bounds checking so we do not go below zero on mux
usecounts.
- Loop range checking in GPIO ranges in the DT range parser.
- Proper print in debugfs for pinconf state.
- Fix compilation bug in generic pinconf code.
- Minor bugfixes to abx500 and mvebu drivers."
* tag 'pinctrl-fixes-for-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinmux: forbid mux_usecount to be set at UINT_MAX
pinctrl: mvebu: fix checking for SoC specific controls
pinctrl: generic: Fix compilation error
pinctrl: Print the correct information in debugfs pinconf-state file
pinctrl: abx500: Fix checking if pin use AlternateFunction register
gpio: fix wrong checking condition for gpio range
Linus Torvalds [Sun, 24 Mar 2013 17:10:34 +0000 (10:10 -0700)]
Merge branch 'x86/urgent' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"A collection of minor fixes, more EFI variables paranoia
(anti-bricking) plus the ability to disable the pstore either as a
runtime default or completely, due to bricking concerns."
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Fix check for CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE
x86, microcode_intel_early: Mark apply_microcode_early() as cpuinit
efivars: Handle duplicate names from get_next_variable()
efivars: explicitly calculate length of VariableName
efivars: Add module parameter to disable use as a pstore backend
efivars: Allow disabling use as a pstore backend
x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL
x86-64: Fix the failure case in copy_user_handle_tail()
Daniel Vetter [Fri, 22 Mar 2013 14:44:46 +0000 (15:44 +0100)]
Revert "drm/i915: write backlight harder"
This reverts commit
cf0a6584aa6d382f802f2c3cacac23ccbccde0cd.
Turns out that cargo-culting breaks systems. Note that we can't revert
further, since
commit
770c12312ad617172b1a65b911d3e6564fc5aca8
Author: Takashi Iwai <tiwai@suse.de>
Date: Sat Aug 11 08:56:42 2012 +0200
drm/i915: Fix blank panel at reopening lid
fixed a regression in 3.6-rc kernels for which we've never figured out
the exact root cause. But some further inspection of the backlight
code reveals that it's seriously lacking locking. And especially the
asle backlight update is know to get fired (through some smm magic)
when writing specific backlight control registers. So the possibility
of suffering from races is rather real.
Until those races are fixed I don't think it makes sense to try
further hacks. Which sucks a bit, but sometimes that's how it is :(
References: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18788.html
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47941
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: stable@vger.kernel.org (the reverted commit was cc: stable, too)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Paulo Zanoni [Fri, 22 Mar 2013 17:07:23 +0000 (14:07 -0300)]
drm/i915: don't disable the power well yet
We're still not 100% ready to disable the power well, so don't disable
it for now. When we disable it we break the audio driver (because some
of the audio registers are on the power well) and machines with eDP on
port D (because it doesn't use TRANSCODER_EDP).
Also, instead of just reverting the code, add a Kernel option to let
us disable it if we want. This will allow us to keep developing and
testing the feature while it's not enabled.
This fixes problems caused by the following commit:
commit
d6dd9eb1d96d2b7345fe4664066c2b7ed86da898
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jan 29 16:35:20 2013 -0200
drm/i915: dynamic Haswell display power well support
References: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18788.html
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Mengdong Lin <mengdong.lin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter [Fri, 22 Mar 2013 09:53:40 +0000 (10:53 +0100)]
Revert "drm/i915: set TRANSCODER_EDP even earlier"
This reverts commit
cc464b2a17c59adedbdc02cc54341d630354edc3.
The reason is that Takashi Iwai reported a regression bisected to this
commit:
http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18788.html
His machine has eDP on port D (usual desktop all-in-on setup), which
intel_dp.c identifies as an eDP panel, but the hsw ddi code
mishandles.
Closer inspection of the code reveals that haswell_crtc_mode_set also
checks intel_encoder_is_pch_edp when setting is_cpu_edp. On haswell
that doesn't make much sense (since there's no edp on the pch), but
what this function _really_ checks is whether that edp connector is on
port A or port D. It's just that on ilk-ivb port D was on the pch ...
So that explains why this seemingly innocent change killed eDP on port
D. Furthermore it looks like everything else accidentally works, since
we've never enabled eDP on port D support for hsw intentionally (e.g.
we still register the HDMI output for port D in that case).
But in retrospective I also don't like that this leaks highly platform
specific details into common code, and the reason is that the drm
vblank layer sucks. So instead I think we should:
- move the cpu_transcoder into the dynamic pipe_config tracking (once
that's merged).
- fix up the drm vblank layer to finally deal with kms crtc objects
instead of int pipes.
v2: Pimp commit message with the better diagnosis as discussed with
Paulo on irc.
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
H. Peter Anvin [Sun, 24 Mar 2013 04:49:51 +0000 (21:49 -0700)]
Merge tag 'efi-for-3.9-rc4' into x86/urgent
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Linus Torvalds [Sat, 23 Mar 2013 23:52:44 +0000 (16:52 -0700)]
Linux 3.9-rc4
Linus Torvalds [Sat, 23 Mar 2013 23:51:55 +0000 (16:51 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"These are mostly minor fixes this time around. The iscsi-target CHAP
big-endian bugfix and bump FD_MAX_SECTORS=2048 default patch to allow
1MB sized I/Os for FILEIO backends on >= v3.5 code are both CC'ed to
stable.
Also, there is a persistent reservations regression that has recently
been reported for >= v3.8.x code, that is currently being tracked down
for v3.9."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target/pscsi: Reject cross page boundary case in pscsi_map_sg
target/file: Bump FD_MAX_SECTORS to 2048 to handle 1M sized I/Os
tcm_vhost: Flush vhost_work in vhost_scsi_flush()
tcm_vhost: Add missed lock in vhost_scsi_clear_endpoint()
target: fix possible memory leak in core_tpg_register()
target/iscsi: Fix mutual CHAP auth on big-endian arches
target_core_sbc: use noop for SYNCHRONIZE_CACHE
Linus Torvalds [Sat, 23 Mar 2013 22:49:49 +0000 (15:49 -0700)]
Merge tag 'md-3.9-fixes' of git://neil.brown.name/md
Pull md fixes from NeilBrown:
"A few bugfixes for md
- recent regressions in raid5
- recent regressions in dmraid
- a few instances of CONFIG_MULTICORE_RAID456 linger
Several tagged for -stable"
* tag 'md-3.9-fixes' of git://neil.brown.name/md:
md: remove CONFIG_MULTICORE_RAID456 entirely
md/raid5: ensure sync and DISCARD don't happen at the same time.
MD: Prevent sysfs operations on uninitialized kobjects
MD RAID5: Avoid accessing gendisk or queue structs when not available
md/raid5: schedule_construction should abort if nothing to do.
Linus Torvalds [Sat, 23 Mar 2013 19:33:36 +0000 (12:33 -0700)]
Merge tag 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
Pull libata updates from Jeff Garzik:
"Simple stuff. See one-line summaries."
* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
pata_samsung_cf: use module_platform_driver_probe()
[libata] Avoid specialized TLA's in ZPODD's Kconfig
libata-acpi.c: fix copy and paste mistake in ata_acpi_register_power_resource
sata_fsl: Remove redundant NULL check before kfree
ahci: Add Device IDs for Intel Wellsburg PCH
ata_piix: Add MODULE_PARM_DESC to prefer_ms_hyperv
Linus Torvalds [Sat, 23 Mar 2013 19:32:14 +0000 (12:32 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"One bugfix for the tegra driver. Two updates regarding email
addresses and MAINTAINERS which I like to have up-to-date so people
can be reached immediately. While we are here, there is on PCI_ID
addition."
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: add maintainer entry for atmel i2c driver
i2c: Fix my e-mail address in drivers and documentation
i2c: iSMT: add Intel Avoton DeviceIDs
i2c: tegra: check the clk_prepare_enable() return value
Linus Torvalds [Sat, 23 Mar 2013 19:30:39 +0000 (12:30 -0700)]
Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
"Fix a boot issues and correct the AcpiMmioSel bitmask in the
sp5100_tco watchdog device driver"
* git://www.linux-watchdog.org/linux-watchdog:
watchdog: sp5100_tco: Set the AcpiMmioSel bitmask value to 1 instead of 2
watchdog: sp5100_tco: Remove code that may cause a boot failure
Torsten Duwe [Sat, 23 Mar 2013 14:39:34 +0000 (15:39 +0100)]
KMS: fix EDID detailed timing frame rate
When KMS has parsed an EDID "detailed timing", it leaves the frame rate
zeroed. Consecutive (debug-) output of that mode thus yields 0 for
vsync. This simple fix also speeds up future invocations of
drm_mode_vrefresh().
While it is debatable whether this qualifies as a -stable fix I'd apply
it for consistency's sake; drm_helper_probe_single_connector_modes()
does the same thing already for all probed modes.
Cc: stable@vger.kernel.org
Signed-off-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Torsten Duwe [Sat, 23 Mar 2013 14:38:22 +0000 (15:38 +0100)]
KMS: fix EDID detailed timing vsync parsing
EDID spreads some values across multiple bytes; bit-fiddling is needed
to retrieve these. The current code to parse "detailed timings" has a
cut&paste error that results in a vsync offset of at most 15 lines
instead of 63.
See
http://en.wikipedia.org/wiki/EDID
and in the "EDID Detailed Timing Descriptor" see bytes 10+11 show why
that needs to be a left shift.
Cc: stable@vger.kernel.org
Signed-off-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland Dreier [Sat, 23 Mar 2013 01:08:03 +0000 (18:08 -0700)]
Merge branches 'cxgb4', 'ipoib' and 'qib' into for-next
Vinit Agnihotri [Thu, 14 Mar 2013 18:13:41 +0000 (18:13 +0000)]
IB/qib: change QLogic to Intel
These changes modify the qib driver as part of acquiring
the InfiniBand assets of QLogic.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Dan Carpenter [Mon, 18 Mar 2013 20:25:26 +0000 (20:25 +0000)]
IB/ipath: Silence a static checker warning
I have a static checker which complains that 0x255 is too high for
the "dev->opstats[opcode]" array. It turns out that the hardware
has already validated the opcode at this point so it can't actually
overflow.
However, silencing the warning is good and this matches how the
opcode is treated in qib_ib_rcv() as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Mike Marciniszyn [Tue, 26 Feb 2013 15:46:27 +0000 (15:46 +0000)]
IPoIB: Fix send lockup due to missed TX completion
Commit
f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.
The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.
This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq(). If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.
This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Wei Yongjun [Fri, 15 Mar 2013 09:42:12 +0000 (09:42 +0000)]
RDMA/cxgb4: Fix error return code in create_qp()
Fix to return a negative error code from the error handling case
instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Linus Torvalds [Fri, 22 Mar 2013 23:43:53 +0000 (16:43 -0700)]
Merge git://git.infradead.org/users/willy/linux-nvme
Pull NVMe driver update from Matthew Wilcox:
"These patches have mostly been baking for a few months; sorry I didn't
get them in during the merge window. They're all bug fixes, except
for the addition of the SMART log and the addition to MAINTAINERS."
* git://git.infradead.org/users/willy/linux-nvme:
NVMe: Add namespaces with no LBA range feature
MAINTAINERS: Add entry for the NVMe driver
NVMe: Initialize iod nents to 0
NVMe: Define SMART log
NVMe: Add result to nvme_get_features
NVMe: Set result from user admin command
NVMe: End queued bio requests when freeing queue
NVMe: Free cmdid on nvme_submit_bio error
Linus Torvalds [Fri, 22 Mar 2013 23:41:44 +0000 (16:41 -0700)]
Merge branch 'akpm' (fixes from Andrew)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
mm/hotplug: only free wait_table if it's allocated by vmalloc
dma-debug: update DMA debug API to better handle multiple mappings of a buffer
dma-debug: fix locking bug in check_unmap()
drivers/rtc/rtc-at91rm9200.c: use a variable for storing IMR
drivers/video/ep93xx-fb.c: include <linux/io.h> for devm_ioremap()
drivers/rtc/rtc-da9052.c: fix for rtc device registration
mm: zone_end_pfn is too small
poweroff: change orderly_poweroff() to use schedule_work()
mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting
printk: Provide a wake_up_klogd() off-case
irq_work.h: fix warning when CONFIG_IRQ_WORK=n
Vladimir Davydov [Fri, 22 Mar 2013 22:04:51 +0000 (15:04 -0700)]
mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
mnt_drop_write() must be called only if mnt_want_write() succeeded,
otherwise the mnt_writers counter will diverge.
mnt_writers counters are used to check if remounting FS as read-only is
OK, so after an extra mnt_drop_write() call, it would be impossible to
remount mqueue FS as read-only. Besides, on umount a warning would be
printed like this one:
=====================================
[ BUG: bad unlock balance detected! ]
3.9.0-rc3 #5 Not tainted
-------------------------------------
a.out/12486 is trying to release lock (sb_writers) at:
mnt_drop_write+0x1f/0x30
but there are no more locks to release!
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jianguo Wu [Fri, 22 Mar 2013 22:04:50 +0000 (15:04 -0700)]
mm/hotplug: only free wait_table if it's allocated by vmalloc
zone->wait_table may be allocated from bootmem, it can not be freed.
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Duyck [Fri, 22 Mar 2013 22:04:49 +0000 (15:04 -0700)]
dma-debug: update DMA debug API to better handle multiple mappings of a buffer
There were reports of the igb driver unmapping buffers without calling
dma_mapping_error. On closer inspection issues were found in the DMA
debug API and how it handled multiple mappings of the same buffer.
The issue I found is the fact that the debug_dma_mapping_error would
only set the map_err_type to MAP_ERR_CHECKED in the case that the was
only one match for device and device address. However in the case of
non-IOMMU, multiple addresses existed and as a result it was not setting
this field once a second mapping was instantiated. I have resolved this
by changing the search so that it instead will now set MAP_ERR_CHECKED
on the first buffer that matches the device and DMA address that is
currently in the state MAP_ERR_NOT_CHECKED.
A secondary side effect of this patch is that in the case of multiple
buffers using the same address only the last mapping will have a valid
map_err_type. The previous mappings will all end up with map_err_type
set to MAP_ERR_CHECKED because of the dma_mapping_error call in
debug_dma_map_page. However this behavior may be preferable as it means
you will likely only see one real error per multi-mapped buffer, versus
the current behavior of multiple false errors mer multi-mapped buffer.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Reviewed-by: Shuah Khan <shuah.khan@hp.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Duyck [Fri, 22 Mar 2013 22:04:48 +0000 (15:04 -0700)]
dma-debug: fix locking bug in check_unmap()
In check_unmap() it is possible to get into a dead-locked state if
dma_mapping_error is called. The problem is that the bucket is locked in
check_unmap, and locked again by debug_dma_mapping_error which is called
by dma_mapping_error. To resolve that we must release the lock on the
bucket before making the call to dma_mapping_error.
[akpm@linux-foundation.org: restore 80-col trickery to be consistent with the rest of the file]
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Reviewed-by: Shuah Khan <shuah.khan@hp.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Ferre [Fri, 22 Mar 2013 22:04:47 +0000 (15:04 -0700)]
drivers/rtc/rtc-at91rm9200.c: use a variable for storing IMR
On some revisions of AT91 SoCs, the RTC IMR register is not working.
Instead of elaborating a workaround for that specific SoC or IP version,
we simply use a software variable to store the Interrupt Mask Register
and modify it for each enabling/disabling of an interrupt. The overhead
of this is negligible anyway.
The interrupt mask register (IMR) for the RTC is broken on the AT91SAM9x5
sub-family of SoCs (good overview of the members here:
http://www.eewiki.net/display/linuxonarm/AT91SAM9x5 ). The "user visible
effect" is the RTC doesn't work.
That sub-family is less than two years old and only has devicetree (DT)
support and came online circa lk 3.7 . The dust is yet to settle on the
DT stuff at least for AT91 SoCs (translation: lots of stuff is still
broken, so much that it is hard to know where to start).
The fix in the patch is pretty simple: just shadow the silicon IMR
register with a variable in the driver. Some older SoCs (pre-DT) use the
the rtc-at91rm9200 driver (e.g. obviously the AT91RM9200) and they should
not be impacted by the change. There shouldn't be a large volume of
interrupts associated with a RTC.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
H Hartley Sweeten [Fri, 22 Mar 2013 22:04:45 +0000 (15:04 -0700)]
drivers/video/ep93xx-fb.c: include <linux/io.h> for devm_ioremap()
Commit
be8678149701 ("drivers/video/ep93xx-fb.c: use devm_ functions")
introduced a build error:
drivers/video/ep93xx-fb.c: In function 'ep93xxfb_probe':
drivers/video/ep93xx-fb.c:532: error: implicit declaration of function 'devm_ioremap'
drivers/video/ep93xx-fb.c:533: warning: assignment makes pointer from integer without a cast
Include <linux/io.h> to pickup the declaration of 'devm_ioremap'.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Acked-by: Ryan Mallon <rmallon@gmail.com>
Cc: Damien Cassou <damien.cassou@lifl.fr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ashish Jangam [Fri, 22 Mar 2013 22:04:44 +0000 (15:04 -0700)]
drivers/rtc/rtc-da9052.c: fix for rtc device registration
Add support for the virtual irq since now MFD only handles virtual irq
Without this patch rtc device will fail in registration.
(akpm: Ashish has a different version whcih will be needed for 3.8.x and
earlier kernels)
Signed-off-by: Ashish <ashish.jangam@kpitcummins.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russ Anderson [Fri, 22 Mar 2013 22:04:43 +0000 (15:04 -0700)]
mm: zone_end_pfn is too small
Booting with 32 TBytes memory hits BUG at mm/page_alloc.c:552! (output
below).
The key hint is "page
4294967296 outside zone".
4294967296 = 0x100000000 (bit 32 is set).
The problem is in include/linux/mmzone.h:
530 static inline unsigned zone_end_pfn(const struct zone *zone)
531 {
532 return zone->zone_start_pfn + zone->spanned_pages;
533 }
zone_end_pfn is "unsigned" (32 bits). Changing it to "unsigned long"
(64 bits) fixes the problem.
zone_end_pfn() was added recently in commit
108bcc96ef70 ("mm: add & use
zone_end_pfn() and zone_spans_pfn()")
Output from the failure.
No AGP bridge found
page
4294967296 outside zone [
4294967296 -
4327469056 ]
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:552!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU 0
Pid: 0, comm: swapper Not tainted 3.9.0-rc2.dtp+ #10
RIP: free_one_page+0x382/0x430
Process swapper (pid: 0, threadinfo
ffffffff81942000, task
ffffffff81955420)
Call Trace:
__free_pages_ok+0x96/0xb0
__free_pages+0x25/0x50
__free_pages_bootmem+0x8a/0x8c
__free_memory_core+0xea/0x131
free_low_memory_core_early+0x4a/0x98
free_all_bootmem+0x45/0x47
mem_init+0x7b/0x14c
start_kernel+0x216/0x433
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x144/0x153
Code: 89 f1 ba 01 00 00 00 31 f6 d3 e2 4c 89 ef e8 66 a4 01 00 e9 2c fe ff ff 0f 0b eb fe 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 eb f3 <0f> 0b eb fe 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 0f 0b eb fe 49
Signed-off-by: Russ Anderson <rja@sgi.com>
Reported-by: George Beshers <gbeshers@sgi.com>
Acked-by: Hedi Berriche <hedi@sgi.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 22 Mar 2013 22:04:41 +0000 (15:04 -0700)]
poweroff: change orderly_poweroff() to use schedule_work()
David said:
Commit
6c0c0d4d1080 ("poweroff: fix bug in orderly_poweroff()")
apparently fixes one bug in orderly_poweroff(), but introduces
another. The comments on orderly_poweroff() claim it can be called
from any context - and indeed we call it from interrupt context in
arch/powerpc/platforms/pseries/ras.c for example. But since that
commit this is no longer safe, since call_usermodehelper_fns() is not
safe in interrupt context without the UMH_NO_WAIT option.
orderly_poweroff() can be used from any context but UMH_WAIT_EXEC is
sleepable. Move the "force" logic into __orderly_poweroff() and change
orderly_poweroff() to use the global poweroff_work which simply calls
__orderly_poweroff().
While at it, remove the unneeded "int argc" and change argv_split() to
use GFP_KERNEL.
We use the global "bool poweroff_force" to pass the argument, this can
obviously affect the previous request if it is pending/running. So we
only allow the "false => true" transition assuming that the pending
"true" should succeed anyway. If schedule_work() fails after that we
know that work->func() was not called yet, it must see the new value.
This means that orderly_poweroff() becomes async even if we do not run
the command and always succeeds, schedule_work() can only fail if the
work is already pending. We can export __orderly_poweroff() and change
the non-atomic callers which want the old semantics.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Feng Hong <hongfeng@marvell.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 22 Mar 2013 22:04:40 +0000 (15:04 -0700)]
mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting
hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).
If the system is configured for more than one hugepage size, which is
possible since commit
a137e1cc6d6e ("hugetlbfs: per mount huge page
sizes") then the overcommit estimation done by __vm_enough_memory()
(resp. shown by meminfo_proc_show) is not precise - there is an
impression of more available/allowed memory. This can lead to an
unexpected ENOMEM/EFAULT resp. SIGSEGV when memory is accounted.
Testcase:
boot: hugepagesz=1G hugepages=1
the default overcommit ratio is 50
before patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit:
55434168 kB
after patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit:
54909880 kB
[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Fri, 22 Mar 2013 22:04:39 +0000 (15:04 -0700)]
printk: Provide a wake_up_klogd() off-case
wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
nor printk_sched() are in use and there are actually no waiter on
log_wait waitqueue. It should be a stub in this case for users like
bust_spinlocks().
Otherwise this results in this warning when CONFIG_PRINTK=n and
CONFIG_IRQ_WORK=n:
kernel/built-in.o In function `wake_up_klogd':
(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'
To fix this, provide an off-case for wake_up_klogd() when
CONFIG_PRINTK=n.
There is much more from console_unlock() and other console related code
in printk.c that should be moved under CONFIG_PRINTK. But for now,
focus on a minimal fix as we passed the merged window already.
[akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Hogan [Fri, 22 Mar 2013 22:04:37 +0000 (15:04 -0700)]
irq_work.h: fix warning when CONFIG_IRQ_WORK=n
A randconfig caught repeated compiler warnings when CONFIG_IRQ_WORK=n
due to the definition of a non-inline static function in
<linux/irq_work.h>:
include/linux/irq_work.h +40 : warning: 'irq_work_needs_cpu' defined but not used
Make it inline to supress the warning. This is caused commit
00b42959106a ("irq_work: Don't stop the tick with pending works") merged
in v3.9-rc1.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takahisa Tanaka [Sun, 3 Mar 2013 05:48:00 +0000 (14:48 +0900)]
watchdog: sp5100_tco: Set the AcpiMmioSel bitmask value to 1 instead of 2
The AcpiMmioSel bit is bit 1 in the AcpiMmioEn register, but the current
sp5100_tco driver is using bit 2.
See 2.3.3 Power Management (PM) Registers page 150 of the
AMD SB800-Series Southbridges Register Reference Guide [1].
AcpiMmioEn - RW – 8/16/32 bits - [PM_Reg: 24h]
Field Name Bits Default Description
AcpiMMioDecodeEn 0 0b Set to 1 to enable AcpiMMio space.
AcpiMMIoSel 1 0b Set AcpiMMio registers to be memory-mapped or IO-mapped space.
0: Memory-mapped space
1: I/O-mapped space
The sp5100_tco driver expects zero as a value of AcpiMmioSel (bit 1).
Fortunately, no problems were caused by this typo, because the default
value of the undocumented misused bit 2 seems to be zero.
However, the sp5100_tco driver should use the correct bitmask value.
[1] http://support.amd.com/us/Embedded_TechDocs/45482.pdf
Signed-off-by: Takahisa Tanaka <mc74hc00@gmail.com>
Signed-off-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: stable <stable@vger.kernel.org>
Takahisa Tanaka [Sun, 3 Mar 2013 05:52:07 +0000 (14:52 +0900)]
watchdog: sp5100_tco: Remove code that may cause a boot failure
A problem was found on PC's with the SB700 chipset: The PC fails to
load BIOS after running the 3.8.x kernel until the power is completely
cut off. It occurs in all 3.8.x versions and the mainline version as of
2/4. The issue does not occur with the 3.7.x builds.
There are two methods for accessing the watchdog registers.
1. Re-programming a resource address obtained by allocate_resource()
to chipset.
2. Use the direct memory-mapped IO access.
The method 1 can be used by all the chipsets (SP5100, SB7x0, SB8x0 or
later). However, experience shows that only PC with the SB8x0 (or
later) chipsets can use the method 2.
This patch removes the method 1, because the critical problem was found.
That's why the watchdog timer was able to be used on SP5100 and SB7x0
chipsets until now.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1116835
Link: https://lkml.org/lkml/2013/2/14/271
Signed-off-by: Takahisa Tanaka <mc74hc00@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: stable <stable@vger.kernel.org>
Kevin Hilman [Wed, 28 Nov 2012 23:46:45 +0000 (15:46 -0800)]
MAINTAINERS: update email address for Kevin Hilman
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Kent Overstreet [Fri, 22 Mar 2013 18:18:24 +0000 (11:18 -0700)]
nfsd: fix bad offset use
vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended
Introduced by
face15025ffdf664de95e86ae831544154d26c9c "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Laxman Dewangan [Fri, 22 Mar 2013 18:35:06 +0000 (12:35 -0600)]
ARM: tegra: fix register address of slink controller
Fix typo on register address of slink3 controller where register
address is wrongly set as 0x7000d480 but it is 0x7000d800.
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Ben Hutchings [Fri, 22 Mar 2013 19:56:51 +0000 (19:56 +0000)]
efivars: Fix check for CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE
The 'CONFIG_' prefix is not implicit in IS_ENABLED().
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Linus Torvalds [Fri, 22 Mar 2013 19:57:30 +0000 (12:57 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Marcelo Tosatti:
"Fix compilation on PPC with !CONFIG_KVM"
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
Revert "KVM: allow host header to be included even for !CONFIG_KVM"
Linus Torvalds [Fri, 22 Mar 2013 19:45:55 +0000 (12:45 -0700)]
Merge tag 'usb-3.9-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are a number of USB fixes that resolve issues that have been
reported against 3.9-rc3."
* tag 'usb-3.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (37 commits)
USB: ti_usb_3410_5052: fix use-after-free in TIOCMIWAIT
USB: ssu100: fix use-after-free in TIOCMIWAIT
USB: spcp8x5: fix use-after-free in TIOCMIWAIT
USB: quatech2: fix use-after-free in TIOCMIWAIT
USB: pl2303: fix use-after-free in TIOCMIWAIT
USB: oti6858: fix use-after-free in TIOCMIWAIT
USB: mos7840: fix use-after-free in TIOCMIWAIT
USB: mos7840: fix broken TIOCMIWAIT
USB: mct_u232: fix use-after-free in TIOCMIWAIT
USB: io_ti: fix use-after-free in TIOCMIWAIT
USB: io_edgeport: fix use-after-free in TIOCMIWAIT
USB: ftdi_sio: fix use-after-free in TIOCMIWAIT
USB: f81232: fix use-after-free in TIOCMIWAIT
USB: cypress_m8: fix use-after-free in TIOCMIWAIT
USB: ch341: fix use-after-free in TIOCMIWAIT
USB: ark3116: fix use-after-free in TIOCMIWAIT
USB: serial: add modem-status-change wait queue
USB: serial: fix interface refcounting
USB: io_ti: fix get_icount for two port adapters
USB: garmin_gps: fix memory leak on disconnect
...
Linus Torvalds [Fri, 22 Mar 2013 19:45:08 +0000 (12:45 -0700)]
Merge tag 'sound-3.9' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Mostly HD-audio and USB-audio regression fixes:
- Oops fix at unloading of snd-hda-codec-conexant module
- A few trivial regression fixes for Cirrus and Conexant HD-audio
codecs
- Relax the USB-audio descriptor parse errors as non-fatal
- Fix locking of HD-audio CA0132 DSP loader
- Fix the generic HD-audio parser for VIA codecs"
* tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix DAC assignment for independent HP
ALSA: hda - Fix abuse of snd_hda_lock_devices() for DSP loader
ALSA: hda - Fix typo in checking IEC958 emphasis bit
ALSA: snd-usb: mixer: ignore -EINVAL in snd_usb_mixer_controls()
ALSA: snd-usb: mixer: propagate errors up the call chain
ALSA: usb: Parse UAC2 extension unit like for UAC1
ALSA: hda - Fix yet missing GPIO/EAPD setup in cirrus driver
ALSA: hda/cirrus - Fix the digital beep registration
ALSA: hda - Fix missing beep detach in patch_conexant.c
ALSA: documentation: Fix typo in Documentation/sound
Linus Torvalds [Fri, 22 Mar 2013 19:44:22 +0000 (12:44 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/bp/bp
Pull EDAC fixes from Borislav Petkov:
"A fix from Mauro to correct csrow size accounting in sysfs and a
sparse fix from Stephen Hemminger."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Merge mci.mem_is_per_rank with mci.csbased
amd64_edac: Correct DIMM sizes
EDAC: Make sysfs functions static
Keith Busch [Thu, 31 Jan 2013 21:40:38 +0000 (14:40 -0700)]
NVMe: Add namespaces with no LBA range feature
The LBA Range Type feature is optional in the NVMe specification,
so we should continue with adding namespaces for controllers that do
not implement this feature.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Linus Torvalds [Fri, 22 Mar 2013 18:44:04 +0000 (11:44 -0700)]
vfs,proc: guarantee unique inodes in /proc
Dave Jones found another /proc issue with his Trinity tool: thanks to
the namespace model, we can have multiple /proc dentries that point to
the same inode, aliasing directories in /proc/<pid>/net/ for example.
This ends up being a total disaster, because it acts like hardlinked
directories, and causes locking problems. We rely on the topological
sort of the inodes pointed to by dentries, and if we have aliased
directories, that odering becomes unreliable.
In short: don't do this. Multiple dentries with the same (directory)
inode is just a bad idea, and the namespace code should never have
exposed things this way. But we're kind of stuck with it.
This solves things by just always allocating a new inode during /proc
dentry lookup, instead of using "iget_locked()" to look up existing
inodes by superblock and number. That actually simplies the code a bit,
at the cost of potentially doing more inode [de]allocations.
That said, the inode lookup wasn't free either (and did a lot of locking
of inodes), so it is probably not that noticeable. We could easily keep
the old lookup model for non-directory entries, but rather than try to
be excessively clever this just implements the minimal and simplest
workaround for the problem.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Analyzed-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Thu, 21 Mar 2013 17:36:09 +0000 (17:36 +0000)]
tcp: preserve ACK clocking in TSO
A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.
Current code leads to following bad bursty behavior :
20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
While the expected behavior is more like :
20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Beulich [Tue, 12 Mar 2013 15:06:23 +0000 (15:06 +0000)]
xen-pciback: notify hypervisor about devices intended to be assigned to guests
For MSI-X capable devices the hypervisor wants to write protect the
MSI-X table and PBA, yet it can't assume that resources have been
assigned to their final values at device enumeration time. Thus have
pciback do that notification, as having the device controlled by it is
a prerequisite to assigning the device to guests anyway.
This is the kernel part of hypervisor side commit
4245d33 ("x86/MSI:
add mechanism to fully protect MSI-X table from PV guest accesses") on
the master branch of git://xenbits.xen.org/xen.git.
CC: stable@vger.kernel.org
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 19 Mar 2013 18:35:30 +0000 (14:35 -0400)]
xen/acpi-processor: Don't dereference struct acpi_processor on all CPUs.
With git commit
c705c78c0d0835a4aa5d0d9a3422e3218462030c
"acpi: Export the acpi_processor_get_performance_info" we are now
using a different mechanism to access the P-states.
The acpi_processor per-cpu structure is set and filtered by the
core ACPI code which shrinks the per_cpu contents to only online CPUs.
In the past we would call acpi_processor_register_performance()
which would have not tried to dereference offline cpus.
With the new patch and the fact that the loop we take is for
for_all_possible_cpus we end up crashing on some machines.
We could modify the loop to be for online_cpus - but all the other
loops in the code use possible_cpus (for a good reason) - so lets
leave it as so and just check if per_cpu(processor) is NULL.
With this patch we will bypass the !online but possible CPUs.
This fixes:
IP: [<
ffffffffa00d13b5>] xen_acpi_processor_init+0x1b6/0xe01 [xen_acpi_processor]
PGD
4126e6067 PUD
4126e3067 PMD 0
Oops: 0002 [#1] SMP
Pid: 432, comm: modprobe Not tainted 3.9.0-rc3+ #28 To be filled by O.E.M. To be filled by O.E.M./M5A97
RIP: e030:[<
ffffffffa00d13b5>] [<
ffffffffa00d13b5>] xen_acpi_processor_init+0x1b6/0xe01 [xen_acpi_processor]
RSP: e02b:
ffff88040c8a3ce8 EFLAGS:
00010282
.. snip..
Call Trace:
[<
ffffffffa00d11ff>] ? read_acpi_id+0x12b/0x12b [xen_acpi_processor]
[<
ffffffff8100215a>] do_one_initcall+0x12a/0x180
[<
ffffffff810c42c3>] load_module+0x1cd3/0x2870
[<
ffffffff81319b70>] ? ddebug_proc_open+0xc0/0xc0
[<
ffffffff810c4f37>] sys_init_module+0xd7/0x120
[<
ffffffff8166ce19>] system_call_fastpath+0x16/0x1b
on some machines.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>