Paolo Bonzini [Thu, 8 Sep 2016 13:35:44 +0000 (15:35 +0200)]
Merge tag 'kvm-s390-next-4.9-1' of git://git./linux/kernel/git/kvms390/linux into HEAD
KVM: s390: features and fixes for 4.9
- lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check deliver
- cleanups/fixes
Christian Borntraeger [Thu, 8 Sep 2016 11:41:08 +0000 (13:41 +0200)]
Merge remote-tracking branch 'kvms390/s390forkvm' into kvms390next
Markus Elfring [Wed, 24 Aug 2016 18:10:09 +0000 (20:10 +0200)]
KVM: s390: Use memdup_user() rather than duplicating code
* Reuse existing functionality from memdup_user() instead of keeping
duplicate source code.
This issue was detected by using the Coccinelle software.
* Return directly if this copy operation failed.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Message-Id: <
c86f7520-885e-2829-ae9c-
b81caa898e84@users.sourceforge.net>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Markus Elfring [Wed, 24 Aug 2016 17:45:23 +0000 (19:45 +0200)]
KVM: s390: Improve determination of sizes in kvm_s390_import_bp_data()
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus reuse the corresponding function "kmalloc_array".
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
This issue was detected also by using the Coccinelle software.
* Replace the specification of data structures by pointer dereferences
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
* Delete the local variable "size" which became unnecessary with
this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <
c3323f6b-4af2-0bfb-9399-
e529952e378e@users.sourceforge.net>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Mon, 8 Aug 2016 20:39:32 +0000 (22:39 +0200)]
KVM: s390: allow 255 VCPUs when sca entries aren't used
If the SCA entries aren't used by the hardware (no SIGPIF), we
can simply not set the entries, stick to the basic sca and allow more
than 64 VCPUs.
To hinder any other facility from using these entries, let's properly
provoke intercepts by not setting the MCN and keeping the entries
unset.
This effectively allows when running KVM under KVM (vSIE) or under z/VM to
provide more than 64 VCPUs to a guest. Let's limit it to 255 for now, to
not run into problems if the CPU numbers are limited somewhere else.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fan Zhang [Mon, 15 Aug 2016 02:53:22 +0000 (04:53 +0200)]
KVM: s390: lazy enable RI
Only enable runtime instrumentation if the guest issues an RI related
instruction or if userspace changes the riccb to a valid state.
This makes entry/exit a tiny bit faster.
Initial patch by Christian Borntraeger
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:43 +0000 (13:52 -0500)]
svm: Implements update_pi_irte hook to setup posted interrupt
This patch implements update_pi_irte function hook to allow SVM
communicate to IOMMU driver regarding how to set up IRTE for handling
posted interrupt.
In case AVIC is enabled, during vcpu_load/unload, SVM needs to update
IOMMU IRTE with appropriate host physical APIC ID. Also, when
vcpu_blocking/unblocking, SVM needs to update the is-running bit in
the IOMMU IRTE. Both are achieved via calling amd_iommu_update_ga().
However, if GA mode is not enabled for the pass-through device,
IOMMU driver will simply just return when calling amd_iommu_update_ga.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:42 +0000 (13:52 -0500)]
svm: Introduce AMD IOMMU avic_ga_log_notifier
This patch introduces avic_ga_log_notifier, which will be called
by IOMMU driver whenever it handles the Guest vAPIC (GA) log entry.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:41 +0000 (13:52 -0500)]
svm: Introduces AVIC per-VM ID
Introduces per-VM AVIC ID and helper functions to manage the IDs.
Currently, the ID will be used to implement 32-bit AVIC IOMMU GA tag.
The ID is 24-bit one-based indexing value, and is managed via helper
functions to get the next ID, or to free an ID once a VM is destroyed.
There should be no ID conflict for any active VMs.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Janosch Frank [Fri, 29 Jul 2016 09:36:04 +0000 (11:36 +0200)]
KVM: s390: gaccess: simplify translation exception handling
The payload data for protection exceptions is a superset of the
payload of other translation exceptions. Let's set the additional
flags and use a fall through to minimize code duplication.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Fri, 27 May 2016 13:24:33 +0000 (15:24 +0200)]
KVM: s390: guestdbg: separate defines for per code
Let's avoid working with the PER_EVENT* defines, used for control register
manipulation, when checking the u8 PER code. Introduce separate defines
based on the existing defines.
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Wed, 3 Aug 2016 10:25:08 +0000 (12:25 +0200)]
KVM: s390: write external damage code on machine checks
Let's also write the external damage code already provided by
struct kvm_s390_mchk_info.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Wed, 14 Oct 2015 14:57:56 +0000 (16:57 +0200)]
KVM: s390: fix delivery of vector regs during machine checks
Vector registers are only to be stored if the facility is available
and if the guest has set up the machine check extended save area.
If anything goes wrong while writing the vector registers, the vector
registers are to be marked as invalid. Please note that we are allowed
to write the registers although they are marked as invalid.
Machine checks and "store status" SIGP orders are two different concepts,
let's correctly separate these. As the SIGP part is completely handled in
user space, we can drop it.
This patch is based on a patch from Cornelia Huck.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Wed, 3 Aug 2016 09:18:57 +0000 (11:18 +0200)]
KVM: s390: split store status and machine check handling
Store status writes the prefix which is not to be done by a machine check.
Also, the psw is stored and later on overwritten by the failing-storage
address, which looks strange at first sight.
Store status and machine check handling look similar, but they are actually
two different things.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Wed, 14 Oct 2015 14:47:36 +0000 (16:47 +0200)]
KVM: s390: factor out actual delivery of machine checks
Let's factor this out to prepare for bigger changes. Reorder to calls to
match the logical order given in the PoP.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Jan Dakinevich [Sun, 4 Sep 2016 18:23:15 +0000 (21:23 +0300)]
KVM: nVMX: expose INS/OUTS information support
Expose the feature to L1 hypervisor if host CPU supports it, since
certain hypervisors requires it for own purposes.
According to Intel SDM A.1, if CPU supports the feature,
VMX_INSTRUCTION_INFO field of VMCS will contain detailed information
about INS/OUTS instructions handling. This field is already copied to
VMCS12 for L1 hypervisor (see prepare_vmcs12 routine) independently
feature presence.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 5 Sep 2016 13:57:00 +0000 (15:57 +0200)]
KVM: VMX: not use vmcs_config in setup_vmcs_config
setup_vmcs_config takes a pointer to the vmcs_config global. The
indirection is somewhat pointless, but just keep things consistent
for now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 19 Aug 2016 15:51:29 +0000 (17:51 +0200)]
KVM: x86: remove stale comments
handle_external_intr does not enable interrupts anymore, vcpu_enter_guest
does it after calling guest_exit_irqoff.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 19 Aug 2016 15:51:20 +0000 (17:51 +0200)]
KVM: x86: ratelimit and decrease severity for guest-triggered printk
These are mostly related to nested VMX. They needn't have
a loglevel as high as KERN_WARN, and mustn't be allowed to
pollute the host logs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan Dakinevich [Sun, 4 Sep 2016 18:22:47 +0000 (21:22 +0300)]
KVM: nVMX: pass valid guest linear-address to the L1
If EPT support is exposed to L1 hypervisor, guest linear-address field
of VMCS should contain GVA of L2, the access to which caused EPT violation.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bhaktipriya Shridhar [Tue, 30 Aug 2016 17:59:51 +0000 (23:29 +0530)]
KVM: Remove deprecated create_singlethread_workqueue
The workqueue "irqfd_cleanup_wq" queues a single work item
&irqfd->shutdown and hence doesn't require ordering. It is a host-wide
workqueue for issuing deferred shutdown requests aggregated from all
vm* instances. It is not being used on a memory reclaim path.
Hence, it has been converted to use system_wq.
The work item has been flushed in kvm_irqfd_release().
The workqueue "wqueue" queues a single work item &timer->expired
and hence doesn't require ordering. Also, it is not being used on
a memory reclaim path. Hence, it has been converted to use system_wq.
System workqueues have been able to handle high level of concurrency
for a long time now and hence it's not required to have a singlethreaded
workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
created with create_singlethread_workqueue(), system_wq allows multiple
work items to overlap executions even on the same CPU; however, a
per-cpu workqueue doesn't have any CPU locality or global ordering
guarantee unless the target CPU is explicitly specified and thus the
increase of local concurrency shouldn't make any difference.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Tue, 30 Aug 2016 08:14:01 +0000 (16:14 +0800)]
KVM: nVMX: make emulated nested preemption timer pinned
Commit
61abdbe0bc ("kvm: x86: make lapic hrtimer pinned") pins the emulated
lapic timer. This patch does the same for the emulated nested preemption
timer to avoid vmexit an unrelated vCPU and the latency of kicking IPI to
another vCPU.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Liang Li [Thu, 18 Aug 2016 07:49:19 +0000 (15:49 +0800)]
vmx: refine validity check for guest linear address
The validity check for the guest line address is inefficient,
check the invalid value instead of enumerating the valid ones.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 5 Sep 2016 14:00:24 +0000 (16:00 +0200)]
Merge branch 'x86/amd-avic' of git://git./linux/kernel/git/joro/iommu into HEAD
Merge IOMMU bits for virtualization of interrupt injection into
virtual machines.
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:40 +0000 (13:52 -0500)]
iommu/amd: Enable vAPIC interrupt remapping mode by default
Introduce struct iommu_dev_data.use_vapic flag, which IOMMU driver
uses to determine if it should enable vAPIC support, by setting
the ga_mode bit in the device's interrupt remapping table entry.
Currently, it is enabled for all pass-through device if vAPIC mode
is enabled.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:39 +0000 (13:52 -0500)]
iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices
This patch implements irq_set_vcpu_affinity() function to set up interrupt
remapping table entry with vapic mode for pass-through devices.
In case requirements for vapic mode are not met, it falls back to set up
the IRTE in legacy mode.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:38 +0000 (13:52 -0500)]
iommu/amd: Introduce amd_iommu_update_ga()
Introduces a new IOMMU API, amd_iommu_update_ga(), which allows
KVM (SVM) to update existing posted interrupt IOMMU IRTE when
load/unload vcpu.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:37 +0000 (13:52 -0500)]
iommu/amd: Adding GALOG interrupt handler
This patch adds AMD IOMMU guest virtual APIC log (GALOG) handler.
When IOMMU hardware receives an interrupt targeting a blocking vcpu,
it creates an entry in the GALOG, and generates an interrupt to notify
the AMD IOMMU driver.
At this point, the driver processes the log entry, and notify the SVM
driver via the registered iommu_ga_log_notifier function.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:36 +0000 (13:52 -0500)]
iommu/amd: Detect and initialize guest vAPIC log
This patch adds support to detect and initialize IOMMU Guest vAPIC log
(GALOG). By default, it also enable GALog interrupt to notify IOMMU driver
when GA Log entry is created.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:35 +0000 (13:52 -0500)]
iommu/amd: Add support for multiple IRTE formats
This patch enables support for the new 128-bit IOMMU IRTE format,
which can be used for both legacy and vapic interrupt remapping modes.
It replaces the existing operations on IRTE, which can only support
the older 32-bit IRTE format, with calls to the new struct amd_irt_ops.
It also provides helper functions for setting up, accessing, and
updating interrupt remapping table entries in different mode.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:34 +0000 (13:52 -0500)]
iommu/amd: Introduce interrupt remapping ops structure
Currently, IOMMU support two interrupt remapping table entry formats,
32-bit (legacy) and 128-bit (GA). The spec also implies that it might
support additional modes/formats in the future.
So, this patch introduces the new struct amd_irte_ops, which allows
the same code to work with different irte formats by providing hooks
for various operations on an interrupt remapping table entry.
Suggested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:33 +0000 (13:52 -0500)]
iommu/amd: Move and introduce new IRTE-related unions and structures
Move existing unions and structs for accessing/managing IRTE to a proper
header file. This is mainly to simplify variable declarations in subsequent
patches.
Besides, this patch also introduces new struct irte_ga for the new
128-bit IRTE format.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Suravee Suthikulpanit [Tue, 23 Aug 2016 18:52:32 +0000 (13:52 -0500)]
iommu/amd: Detect and enable guest vAPIC support
This patch introduces a new IOMMU driver parameter, amd_iommu_guest_ir,
which can be used to specify different interrupt remapping mode for
passthrough devices to VM guest:
* legacy: Legacy interrupt remapping (w/ 32-bit IRTE)
* vapic : Guest vAPIC interrupt remapping (w/ GA mode 128-bit IRTE)
Note that in vapic mode, it can also supports legacy interrupt remapping
for non-passthrough devices with the 128-bit IRTE.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Heiko Carstens [Tue, 16 Aug 2016 08:31:10 +0000 (10:31 +0200)]
KVM: s390: generate facility mask from readable list
Automatically generate the KVM facility mask out of a readable list.
Manually changing the masks is very error prone, especially if the
special IBM bit numbering has to be considered.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Linus Torvalds [Sun, 21 Aug 2016 23:14:10 +0000 (16:14 -0700)]
Linux 4.8-rc3
Linus Torvalds [Sun, 21 Aug 2016 21:28:24 +0000 (14:28 -0700)]
Merge branch 'parisc-4.8-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull two parisc fixes from Helge Deller:
"The first patch ensures that the high-res cr16 clocksource (which was
added in kernel 4.7) gets choosen as default clocksource for parisc.
The second patch moves the #define of EREFUSED down inside errno.h and
thus unbreaks building the gccgo compiler"
* 'parisc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix order of EREFUSED define in errno.h
parisc: Fix automatic selection of cr16 clocksource
Tony Luck [Sat, 20 Aug 2016 23:27:58 +0000 (16:27 -0700)]
EDAC, skx_edac: Add EDAC driver for Skylake
This is an entirely new driver instead of yet another set of patches
to sb_edac.c because:
1) Mapping from PCI devices to socket/memory controller is significantly
different. Skylake scatters devices on a socket across a number of
PCI buses.
2) There is an extra level of interleaving via the "mcroute" register
that would be a little messy to squeeze into the old driver.
3) Validation is getting too expensive. Changes to sb_edac need to
be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and
Knights Landing.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helge Deller [Sat, 20 Aug 2016 09:51:38 +0000 (11:51 +0200)]
parisc: Fix order of EREFUSED define in errno.h
When building gccgo in userspace, errno.h gets parsed and the go include file
sysinfo.go is generated.
Since EREFUSED is defined to the same value as ECONNREFUSED, and ECONNREFUSED
is defined later on in errno.h, this leads to go complaining that EREFUSED
isn't defined yet.
Fix this trivial problem by moving the define of EREFUSED down after
ECONNREFUSED in errno.h (and clean up the indenting while touching this line).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Helge Deller [Fri, 19 Aug 2016 20:39:02 +0000 (22:39 +0200)]
parisc: Fix automatic selection of cr16 clocksource
Commit
54b66800907 (parisc: Add native high-resolution sched_clock()
implementation) added support to use the CPU-internal cr16 counters as reliable
clocksource with the help of HAVE_UNSTABLE_SCHED_CLOCK.
Sadly the commit missed to remove the hack which prevented cr16 to become the
default clocksource even on SMP systems.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.7+
Linus Torvalds [Fri, 19 Aug 2016 19:47:01 +0000 (12:47 -0700)]
Make the hardened user-copy code depend on having a hardened allocator
The kernel test robot reported a usercopy failure in the new hardened
sanity checks, due to a page-crossing copy of the FPU state into the
task structure.
This happened because the kernel test robot was testing with SLOB, which
doesn't actually do the required book-keeping for slab allocations, and
as a result the hardening code didn't realize that the task struct
allocation was one single allocation - and the sanity checks fail.
Since SLOB doesn't even claim to support hardening (and you really
shouldn't use it), the straightforward solution is to just make the
usercopy hardening code depend on the allocator supporting it.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 19 Aug 2016 19:10:06 +0000 (12:10 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has some pretty standard driver bugfixes and one minor cleanup"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: meson: Use complete() instead of complete_all()
i2c: brcmstb: Use complete() instead of complete_all()
i2c: bcm-kona: Use complete() instead of complete_all()
i2c: bcm-iproc: Use complete() instead of complete_all()
i2c: at91: fix support of the "alternative command" feature
i2c: ocores: add missed clk_disable_unprepare() on failure paths
i2c: cros-ec-tunnel: Fix usage of cros_ec_cmd_xfer()
i2c: mux: demux-pinctrl: properly roll back when adding adapter fails
Linus Torvalds [Fri, 19 Aug 2016 16:32:48 +0000 (09:32 -0700)]
Merge tag 'dm-4.8-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a stable fix for DM round robin multipath path selector to disable
preemption before using this_cpu_ptr()
- a slight increase in DM crypt's mempool reserves to make swap ontop
of DM crypt more performant
- a few DM raid fixes to issues found while testing changes that were
merged in v4.8-rc1
* tag 'dm-4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm raid: support raid0 with missing metadata devices
dm raid: enhance attempt_restore_of_faulty_devices() to support more devices
dm raid: fix restoring of failed devices regression
dm raid: fix frozen recovery regression
dm crypt: increase mempool reserve to better support swapping
dm round robin: do not use this_cpu_ptr() without having preemption disabled
Linus Torvalds [Fri, 19 Aug 2016 16:22:50 +0000 (09:22 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fairly small fixes. The ipr, mpt3sas and ses ones all trigger
oopses. The megaraid one fixes an attach failure on io mapped only
cards, the fcoe one is an obvious problem in the error path and the
aacraid one is a theoretical security issue (ability to trick the
kernel into a buffer overrun)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
ses: Fix racy cleanup of /sys in remove_dev()
mpt3sas: Fix resume on WarpDrive flash cards
ipr: Fix sync scsi scan
megaraid_sas: Fix probing cards without io port
aacraid: Check size values after double-fetch from user
fcoe: Use kfree_skb() instead of kfree()
Linus Torvalds [Fri, 19 Aug 2016 16:21:24 +0000 (09:21 -0700)]
Merge tag 'usb-4.8-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of USB fixes for reported issues for your tree.
The normal amount of gadget fixes, xhci fixes, new device ids, and a
few other minor things. All of them have been in linux-next for a
while, the full details are in the shortlog below"
* tag 'usb-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (43 commits)
xhci: don't dereference a xhci member after removing xhci
usb: xhci: Fix panic if disconnect
xhci: really enqueue zero length TRBs.
xhci: always handle "Command Ring Stopped" events
cdc-acm: fix wrong pipe type on rx interrupt xfers
usb: misc: usbtest: add fix for driver hang
usb: dwc3: gadget: stop processing on HWO set
usb: dwc3: don't set last bit for ISOC endpoints
usb: gadget: rndis: free response queue during REMOTE_NDIS_RESET_MSG
usb: udc: core: fix error handling
usb: gadget: fsl_qe_udc: off by one in setup_received_handle()
usb/gadget: fix gadgetfs aio support.
usb: gadget: composite: Fix return value in case of error
usb: gadget: uvc: Fix return value in case of error
usb: gadget: fix check in sync read from ep in gadgetfs
usb: misc: usbtest: usbtest_do_ioctl may return positive integer
usb: dwc3: fix missing platform_set_drvdata() in dwc3_of_simple_probe()
usb: phy: omap-otg: Fix missing platform_set_drvdata() in omap_otg_probe()
usb: gadget: configfs: add mutex lock before unregister gadget
usb: gadget: u_ether: fix dereference after null check coverify warning
...
Linus Torvalds [Fri, 19 Aug 2016 16:06:41 +0000 (09:06 -0700)]
Merge tag 'xfs-iomap-for-linus-4.8-rc3' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs and iomap fixes from Dave Chinner:
"Changes in this update:
Regression fixes for XFS changes introduce in 4.8-rc1:
- buffer IO accounting assert failure
- ENOSPC block accounting reservation issue
- DAX IO path page cache invalidation fix
- rmapbt on-disk block count in agf
- correct classification of rmap block type when updating AGFL.
- iomap support for attribute fork mapping
Regression fixes for iomap infrastructure in 4.8-rc1:
- fiemap: honor FIEMAP_FLAG_SYNC
- fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
- make mark_page_accessed and pagefault_disable usage consistent with
other IO paths"
* tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: remove OWN_AG rmap when allocating a block from the AGFL
xfs: (re-)implement FIEMAP_FLAG_XATTR
xfs: simplify xfs_file_iomap_begin
iomap: mark ->iomap_end as optional
iomap: prepare iomap_fiemap for attribute mappings
iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
iomap: remove superflous pagefault_disable from iomap_write_actor
iomap: remove superflous mark_page_accessed from iomap_write_actor
xfs: store rmapbt block count in the AGF
xfs: don't invalidate whole file on DAX read/write
xfs: fix bogus space reservation in xfs_iomap_write_allocate
xfs: don't assert fail on non-async buffers on ioacct decrement
Linus Torvalds [Fri, 19 Aug 2016 15:52:17 +0000 (08:52 -0700)]
Merge tag 'hwmon-for-linus-v4.8-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix a bug in it87 driver and URLs in ftsteutates driver"
* tag 'hwmon-for-linus-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ftsteutates) Correct ftp urls in driver documentation
hwmon: (it87) Features mask must be 32 bit wide
Luwei Kang [Tue, 2 Aug 2016 08:01:18 +0000 (16:01 +0800)]
KVM: x86: Expose more Intel AVX512 feature to guest
Expose AVX512DQ, AVX512BW, AVX512VL feature to guest.
Its spec can be found at:
https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
[Resolved a trivial conflict with removed F(PCOMMIT).]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Bandan Das [Tue, 2 Aug 2016 20:32:37 +0000 (16:32 -0400)]
mmu: don't pass *kvm to spte_write_protect and spte_*_dirty
That parameter isn't used in these functions,
it's probably a historical artifact.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Wed, 3 Aug 2016 04:04:13 +0000 (12:04 +0800)]
KVM: lapic: don't recalculate apic map table twice when enabling LAPIC
APIC map table is recalculated during reset APIC ID to the initial value
when enabling LAPIC. This patch move the recalculate_apic_map() to the
next branch since we don't need to recalculate apic map twice in current
codes.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
James Hogan [Fri, 19 Aug 2016 13:30:29 +0000 (14:30 +0100)]
MIPS: KVM: Check for pfn noslot case
When mapping a page into the guest we error check using is_error_pfn(),
however this doesn't detect a value of KVM_PFN_NOSLOT, indicating an
error HVA for the page. This can only happen on MIPS right now due to
unusual memslot management (e.g. being moved / removed / resized), or
with an Enhanced Virtual Memory (EVA) configuration where the default
KVM_HVA_ERR_* and kvm_is_error_hva() definitions are unsuitable (fixed
in a later patch). This case will be treated as a pfn of zero, mapping
the first page of physical memory into the guest.
It would appear the MIPS KVM port wasn't updated prior to being merged
(in v3.10) to take commit
81c52c56e2b4 ("KVM: do not treat noslot pfn as
a error pfn") into account (merged v3.8), which converted a bunch of
is_error_pfn() calls to is_error_noslot_pfn(). Switch to using
is_error_noslot_pfn() instead to catch this case properly.
Fixes:
858dd5d45733 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.y-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 19 Aug 2016 02:38:18 +0000 (19:38 -0700)]
Merge tag 'drm-fixes-for-4.8-rc3-2' of git://people.freedesktop.org/~airlied/linux
Pull more drm fixes from Dave Airlie:
"Daniel pointed out I'd missed some i915 fixes, and I also found a
single etnaviv fix I missed.
So here they are"
* tag 'drm-fixes-for-4.8-rc3-2' of git://people.freedesktop.org/~airlied/linux:
drm/etnaviv: take GPU lock later in the submit process
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Linus Torvalds [Fri, 19 Aug 2016 02:31:08 +0000 (19:31 -0700)]
Merge tag 'devicetree-fixes-for-4.8' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- a couple of DT node ref counting fixes
- fix __unflatten_device_tree for PPC PCI hotplug case
- rework marking irq controllers as OF_POPULATED in cases where real
driver is used.
- disable of_platform_default_populate_init on PPC. The change in
initcall order causes problems which need to be sorted out later.
* tag 'devicetree-fixes-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fix reference counting in of_graph_get_endpoint_by_regs
of/platform: disable the of_platform_default_populate_init() for all the ppc boards
ARM: imx6: mark GPC node as not populated after irq init to probe pm domain driver
of/irq: Mark interrupt controllers as populated before initialisation
drivers/of: Validate device node in __unflatten_device_tree()
of: Delete an unnecessary check before the function call "of_node_put"
Linus Torvalds [Fri, 19 Aug 2016 01:54:40 +0000 (18:54 -0700)]
Merge tag '4.8-doc-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Three small fixes for Sphinx-formatted documentation generation"
* tag '4.8-doc-fixes' of git://git.lwn.net/linux:
doc-rst: customize RTD theme, drop padding of inline literal
docs: kernel-documentation: remove some highlight directives
docs: Set the Sphinx default highlight language to "guess"
Dave Airlie [Thu, 18 Aug 2016 22:51:13 +0000 (08:51 +1000)]
Merge tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Collection of i915 fixes.
* tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Dave Airlie [Thu, 18 Aug 2016 22:50:42 +0000 (08:50 +1000)]
Merge branch 'drm-etnaviv-fixes' of git://git.pengutronix.de/git/lst/linux into drm-fixes
Single GPU recovery fix
* 'drm-etnaviv-fixes' of git://git.pengutronix.de/git/lst/linux:
drm/etnaviv: take GPU lock later in the submit process
Linus Torvalds [Thu, 18 Aug 2016 22:09:41 +0000 (15:09 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"An initrd microcode loading fix, and an SMP bootup topology setup fix
to resolve crashes on SGI/UV systems if the BIOS is configured in a
certain way"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Fix __max_logical_packages value setup
x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Linus Torvalds [Thu, 18 Aug 2016 22:08:31 +0000 (15:08 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Three clocksource driver fixes"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/mips-gic-timer: Make gic_clocksource_of_init() return int
clocksource/drivers/kona: Fix get_counter() error handling
clocksource/drivers/time-armada-370-xp: Fix the clock reference
Linus Torvalds [Thu, 18 Aug 2016 22:07:21 +0000 (15:07 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two cputime fixes - hopefully the last ones"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Resync steal time when guest & host lose sync
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
Linus Torvalds [Thu, 18 Aug 2016 22:04:53 +0000 (15:04 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also start/stop filter related fixes, a perf
event read() fix, a fix uncovered by fuzzing, and an uprobes leak fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Check return value of the perf_event_read() IPI
perf/core: Enable mapping of the stop filters
perf/core: Update filters only on executable mmap
perf/core: Fix file name handling for start/stop filters
perf/core: Fix event_function_local()
uprobes: Fix the memcg accounting
perf intel-pt: Fix occasional decoding errors when tracing system-wide
tools: Sync kvm related header files for arm64 and s390
perf probe: Release resources on error when handling exit paths
perf probe: Check for dup and fdopen failures
perf symbols: Fix annotation of objects with debuginfo files
perf script: Don't disable use_callchain if input is pipe
perf script: Show proper message when failed list scripts
perf jitdump: Add the right header to get the major()/minor() definitions
perf ppc64le: Fix build failure when libelf is not present
perf tools mem: Fix -t store option for record command
perf intel-pt: Fix ip compression
Linus Torvalds [Thu, 18 Aug 2016 20:45:48 +0000 (13:45 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two lockless_dereference() related fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/barriers: Suppress sparse warnings in lockless_dereference()
Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
Linus Torvalds [Thu, 18 Aug 2016 18:17:13 +0000 (11:17 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Avoid a literal load with the MMU off on the CPU resume path
(potential inconsistency between cache and RAM)
- Build error with CONFIG_ACPI=n fixed
- Compiler warning in the arch/arm64/mm/dump.c code fixed
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix shift warning in arch/arm64/mm/dump.c
arm64: kernel: avoid literal load of virtual address with MMU off
arm64: Fix NUMA build error when !CONFIG_ACPI
Linus Torvalds [Thu, 18 Aug 2016 18:13:20 +0000 (11:13 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Only three fixes this time:
- Emil found an overflow problem with the memory layout sanity check.
- Ard Biesheuvel noticed that late-allocated page tables (for EFI)
weren't being properly constructed.
- Guenter Roeck reported a problem found on qemu caused by the recent
addr_limit changes"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: fix address limit restoration for undefined instructions
ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations
ARM: 8590/1: sanity_check_meminfo(): avoid overflow on vmalloc_limit
Linus Torvalds [Thu, 18 Aug 2016 18:09:43 +0000 (11:09 -0700)]
Merge tag 'pm-4.8-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"More hibernation-related material: one fix for a recent regression in
the core, one small cleanup of the x86-64 resume code and a
documentation update.
Specifics:
- Fix a hibernate core regression resulting from uncovering a latent
bug in its implementation of memory bitmaps by a recent commit
(James Morse).
- Use __pa() to compute a physical address in the x86-64 code
finalizing resume from hibernation (Rafael Wysocki).
- Update power management documentation related to system sleep
states to remove outdated information from it and to add a
description of a recently introduced hibernation debug feature to
it (Rafael Wysocki)"
* tag 'pm-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
x86/power/64: Use __pa() for physical address computation
PM / sleep: Update some system sleep documentation
Linus Torvalds [Thu, 18 Aug 2016 17:58:50 +0000 (10:58 -0700)]
Merge tag 'drm-fixes-for-4.8-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Pretty quiet so far:
- a few amdgpu/radeon fixup for pcie pm changes
- a couple of amdgpu fixes
- some build fixes
- printk fix"
* tag 'drm-fixes-for-4.8-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: Change GART offset to 64-bit
drm/mediatek: add ARM_SMCCC dependency
drm/mediatek: add CONFIG_OF dependency
drm/mediatek: add COMMON_CLK dependency
drm/amdgpu: Fix memory trashing if UVD ring test fails
drm/amdgpu: fix vm init error path
drm/amdkfd: print doorbell offset as a hex value
Revert "drm/radeon: work around lack of upstream ACPI support for D3cold"
Revert "drm/amdgpu: work around lack of upstream ACPI support for D3cold"
Johannes Berg [Thu, 11 Aug 2016 09:50:22 +0000 (11:50 +0200)]
locking/barriers: Suppress sparse warnings in lockless_dereference()
After Peter's commit:
331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type")
... we get a lot of sparse warnings (one for every rcu_dereference, and more)
since the expression here is assigning to the wrong address space.
Instead of validating that 'p' is a pointer this way, instead make
it fail compilation when it's not by using sizeof(*(p)). This will
not cause any sparse warnings (tested, likely since the address
space is irrelevant for sizeof), and will fail compilation when
'p' isn't a pointer type.
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type")
Link: http://lkml.kernel.org/r/1470909022-687-2-git-send-email-johannes@sipsolutions.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Johannes Berg [Thu, 11 Aug 2016 09:50:21 +0000 (11:50 +0200)]
Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
This reverts commit:
fa7d81bb3c269 ("drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference")
As Peter explained:
[...] lockless_dereference() is _stronger_ than READ_ONCE(), not weaker.
[...]
Also, clue is in the name: 'dereference', you don't actually dereference
the pointer here, only load it.
My next patch breaks the compile without this revert, because it assumes
you want to deference and thus also need the struct type visible (which
it isn't here), so revert it.
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1470909022-687-1-git-send-email-johannes@sipsolutions.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Catalin Marinas [Fri, 5 Dec 2014 12:34:54 +0000 (12:34 +0000)]
arm64: Fix shift warning in arch/arm64/mm/dump.c
When building with 48-bit VAs and 16K page configuration, it's possible
to get the following warning when building the arm64 page table dumping
code:
arch/arm64/mm/dump.c: In function ‘walk_pud’:
arch/arm64/mm/dump.c:274:102: warning: right shift count >= width of type [-Wshift-count-overflow]
This is because pud_offset(pgd, 0) performs a shift to the right by 36
while the value 0 has the type 'int' by default, therefore 32-bit.
This patch modifies all the p*_offset() uses in arch/arm64/mm/dump.c to
use 0UL for the address argument.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Paolo Bonzini [Thu, 18 Aug 2016 10:19:19 +0000 (12:19 +0200)]
Merge tag 'kvm-arm-for-v4.8-rc3' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Fixes for v4.8-rc3
This tag contains the following fixes on top of v4.8-rc1:
- ITS init issues
- ITS error handling issues
- ITS IRQ leakage fix
- Plug a couple of ITS race conditions
- An erratum workaround for timers
- Some removal of misleading use of errors and comments
- A fix for GICv3 on 32-bit guests
Peter Feiner [Wed, 17 Aug 2016 16:36:47 +0000 (09:36 -0700)]
kvm: nVMX: fix nested tsc scaling
When the host supported TSC scaling, L2 would use a TSC multiplier of
0, which causes a VM entry failure. Now L2's TSC uses the same
multiplier as L1.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Radim Krčmář [Mon, 8 Aug 2016 18:16:23 +0000 (20:16 +0200)]
KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
write with vmcs02 as the current VMCS.
This will incorrectly apply modifications intended for vmcs01 to vmcs02
and L2 can use it to gain access to L0's x2APIC registers by disabling
virtualized x2APIC while using msr bitmap that assumes enabled.
Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
current VMCS. An alternative solution would temporarily make vmcs01 the
current VMCS, but it requires more care.
Fixes:
8d14695f9542 ("x86, apicv: add virtual x2apic support")
Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Radim Krčmář [Mon, 8 Aug 2016 18:16:22 +0000 (20:16 +0200)]
KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
msr bitmap can be used to avoid a VM exit (interception) on guest MSR
accesses. In some configurations of VMX controls, the guest can even
directly access host's x2APIC MSRs. See SDM 29.5 VIRTUALIZING MSR-BASED
APIC ACCESSES.
L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
To do so, L1 would first trick KVM to disable all possible interceptions
by enabling APICv features and then would turn those features off;
nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would
not intercept previously enabled MSRs even though they were not safe
with the new configuration.
Correctly re-enabling interceptions is not enough as a second bug would
still allow L1+L2 to access host's MSRs: msr bitmap was shared for all
VMCSs, so L1 could trigger a race to get the desired combination of msr
bitmap and VMX controls.
This fix allocates a msr bitmap for every L1 VCPU, allows only safe
x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would
have to intercept everything anyway.
Fixes:
3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Wanpeng Li [Wed, 17 Aug 2016 02:05:46 +0000 (10:05 +0800)]
sched/cputime: Resync steal time when guest & host lose sync
Commit:
57430218317e ("sched/cputime: Count actually elapsed irq & softirq time")
... fixed a bug but also triggered a regression:
On an i5 laptop, 4 pCPUs, 4vCPUs for one full dynticks guest, there are four
CPU hog processes(for loop) running in the guest, I hot-unplug the pCPUs
on host one by one until there is only one left, then observe CPU utilization
via 'top' in the guest, it shows:
100% st for cpu0(housekeeping)
75% st for other CPUs (nohz full mode)
However, w/o this commit it shows the correct 75% for all four CPUs.
When a guest is interrupted for a longer amount of time, missed clock ticks
are not redelivered later. Because of that, we should not limit the amount
of steal time accounted to the amount of time that the calling functions
think have passed.
However, the interval returned by account_other_time() is NOT rounded down
to the nearest jiffy, while the base interval in get_vtime_delta() it is
subtracted from is, so the max cputime limit is required to avoid underflow.
This patch fixes the regression by limiting the account_other_time() from
get_vtime_delta() to avoid underflow, and lets the other three call sites
(in account_other_time() and steal_account_process_time()) account however
much steal time the host told us elapsed.
Suggested-by: Rik van Riel <riel@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 15 Aug 2016 16:38:42 +0000 (18:38 +0200)]
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
Mike reports:
Roughly 10% of the time, ltp testcase getrusage04 fails:
getrusage04 0 TINFO : Expected timers granularity is 4000 us
getrusage04 0 TINFO : Using 1 as multiply factor for max [us]time increment (1000+4000us)!
getrusage04 0 TINFO : utime: 0us; stime: 179us
getrusage04 0 TINFO : utime: 3751us; stime: 0us
getrusage04 1 TFAIL : getrusage04.c:133: stime increased > 5000us:
And tracked it down to the case where the task simply doesn't get
_any_ [us]time ticks.
Update the code to assume all rtime is utime when we lack information,
thus ensuring a task that elides the tick gets time accounted.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: stable@vger.kernel.org # 4.3+
Fixes:
9d7fb0427648 ("sched/cputime: Guarantee stime + utime == rtime")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
David Carrillo-Cisneros [Wed, 17 Aug 2016 20:55:04 +0000 (13:55 -0700)]
perf/core: Check return value of the perf_event_read() IPI
The call to smp_call_function_single in perf_event_read() may fail if
an invalid or not online CPU index is passed. Warn user if such bug is
present and return error.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1471467307-61171-2-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:07 +0000 (10:43 -0600)]
perf/core: Enable mapping of the stop filters
At this time the perf_addr_filter_needs_mmap() function will _not_
return true on a user space 'stop' filter. But stop filters need
exactly the same kind of mapping that range and start filters get.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-4-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:06 +0000 (10:43 -0600)]
perf/core: Update filters only on executable mmap
Function perf_event_mmap() is called by the MM subsystem each time
part of a binary is loaded in memory. There can be several mapping
for a binary, many times unrelated to the code section.
Each time a section of a binary is mapped address filters are
updated, event when the map doesn't pertain to the code section.
The end result is that filters are configured based on the last map
event that was received rather than the last mapping of the code
segment.
For example if we have an executable 'main' that calls library
'libcstest.so.1.0', and that we want to collect traces on code
that is in that library. The perf cmd line for this scenario
would be:
perf record -e cs_etm// --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
Resulting in binaries being mapped this way:
root@linaro-nano:~# cat /proc/1950/maps
00400000-
00401000 r-xp
00000000 08:02 33169 /home/linaro/main
00410000-
00411000 r--p
00000000 08:02 33169 /home/linaro/main
00411000-
00412000 rw-p
00001000 08:02 33169 /home/linaro/main
7fa2464000-
7fa2474000 rw-p
00000000 00:00 0
7fa2474000-
7fa25a4000 r-xp
00000000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25a4000-
7fa25b3000 ---p
00130000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b3000-
7fa25b7000 r--p
0012f000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b7000-
7fa25b9000 rw-p
00133000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b9000-
7fa25bd000 rw-p
00000000 00:00 0
7fa25bd000-
7fa25be000 r-xp
00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25be000-
7fa25cd000 ---p
00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cd000-
7fa25ce000 r--p
00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25ce000-
7fa25cf000 rw-p
00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cf000-
7fa25eb000 r-xp
00000000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25ef000-
7fa25f2000 rw-p
00000000 00:00 0
7fa25f7000-
7fa25f9000 rw-p
00000000 00:00 0
7fa25f9000-
7fa25fa000 r--p
00000000 00:00 0 [vvar]
7fa25fa000-
7fa25fb000 r-xp
00000000 00:00 0 [vdso]
7fa25fb000-
7fa25fc000 r--p
0001c000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25fc000-
7fa25fe000 rw-p
0001d000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7ff2ea8000-
7ff2ec9000 rw-p
00000000 00:00 0 [stack]
root@linaro-nano:~#
Before 'main()' can execute 'libcstest.so.1.0' has to be loaded in
memory. Once that has been done perf_event_mmap() has been called
4 times, with the last map starting at address 0x7fa25ce000 and
the address filter configured to start filtering when the
IP has passed over address 0x0x7fa25ce72c (0x7fa25ce000 + 0x72c).
But that is wrong since the code segment for library 'libcstest.so.1.0'
as been mapped at 0x7fa25bd000, resulting in traces not being
collected.
This patch corrects the situation by requesting that address
filters be updated only if the mapped event is for a code
segment.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-3-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:05 +0000 (10:43 -0600)]
perf/core: Fix file name handling for start/stop filters
Binary file names have to be supplied for both range and start/stop
filters but the current code only processes the filename if an
address range filter is specified. This code adds processing of
the filename for start/stop filters.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-2-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 16 Aug 2016 11:33:26 +0000 (13:33 +0200)]
perf/core: Fix event_function_local()
Vincent reported triggering the WARN_ON_ONCE() in event_function_local().
While thinking through cases I noticed that by using event_function()
directly, we miss the inactive case usually handled by
event_function_call().
Therefore construct a blend of event_function_call() and
event_function() that handles the cases relevant to
event_function_local().
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 4.5+
Fixes:
fae3fde65138 ("perf: Collapse and fix event_function_call() users")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Mon, 15 Aug 2016 10:17:00 +0000 (12:17 +0200)]
x86/smp: Fix __max_logical_packages value setup
Frank reported kernel panic when he disabled several cores in BIOS
via following option:
Core Disable Bitmap(Hex) [0]
with number 0xFFE, which leaves 16 CPUs in system (out of 48).
The kernel panic below goes along with following messages:
smpboot: Max logical packages: 2^M
smpboot: APIC(0) Converting physical 0 to logical package 0^M
smpboot: APIC(20) Converting physical 1 to logical package 1^M
smpboot: APIC(40) Package 2 exceeds logical package map^M
smpboot: CPU 8 APICId 40 disabled^M
smpboot: APIC(60) Package 3 exceeds logical package map^M
smpboot: CPU 12 APICId 60 disabled^M
...
general protection fault: 0000 [#1] SMP^M
Modules linked in:^M
CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M
Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M
task:
ffff8801673e0000 ti:
ffff8801673ac000 task.ti:
ffff8801673ac000^M
RIP: 0010:[<
ffffffff81014d54>] [<
ffffffff81014d54>] uncore_change_context+0xd4/0x180^M
...
[<
ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M
[<
ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M
[<
ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M
[<
ffffffff81002190>] do_one_initcall+0x50/0x190^M
[<
ffffffff810ab193>] ? parse_args+0x293/0x480^M
[<
ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M
[<
ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M
[<
ffffffff816dc19e>] kernel_init+0xe/0x110^M
[<
ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M
[<
ffffffff816dc190>] ? rest_init+0x80/0x80^M
The reason for the panic is wrong value of __max_logical_packages,
which lets logical_package_map uninitialized and the uncore code
relying on this map being properly initialized (maybe we should
add some safety checks there as well).
The __max_logical_packages is computed as:
DIV_ROUND_UP(total_cpus, ncpus);
- ncpus being number of cores
With above BIOS setup we get total_cpus == 16 which set
__max_logical_packages to 2 (ncpus is 12).
Once topology_update_package_map processes CPU with logical
pkg over 2 we display above messages and fail to initialize
the physical_to_logical_pkg map, which makes the uncore code
crash.
The fix is to remove logical_package_map bitmap completely
and keep and update the logical_packages number instead.
After we enumerate all the present CPUs, we check if the
enumerated logical packages count is within its computed
maximum from BIOS data.
If it's not the case, we set this maximum to the new enumerated
value and freeze any new addition of logical packages.
The freeze is because lot of init code like uncore/rapl/cqm
depends on having maximum logical package value set to allocate
their data, so we can't change it later on.
Prarit Bhargava tested the patch and confirms that it solves
the problem:
From dmidecode:
Core Count: 24
Core Enabled: 24
Thread Count: 48
Orig kernel boot log:
[ 0.464981] smpboot: Max logical packages: 19
[ 0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3
1. nr_cpus=8, should stop enumerating in package 0:
[ 0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.539596] smpboot: Max logical packages: 19
2. max_cpus=8, should still enumerate all packages:
[ 0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.550524] smpboot: Max logical packages: 19
3. nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
package 2:
[ 0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.539368] smpboot: Max logical packages: 19
4. maxcpus=49, should still enumerate all packages:
[ 0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.549624] smpboot: Max logical packages: 19
5. kdump (nr_cpus=1) works as well.
Reported-by: Frank Ramsay <framsay@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Wed, 17 Aug 2016 11:33:14 +0000 (13:33 +0200)]
x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Similar to:
efaad554b4ff ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y")
... fix microcode loading from the initrd on AMD by adding the
randomization offset to the microcode patch container within the initrd.
Reported-and-tested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Wed, 17 Aug 2016 15:36:29 +0000 (17:36 +0200)]
uprobes: Fix the memcg accounting
__replace_page() wronlgy calls mem_cgroup_cancel_charge() in "success" path,
it should only do this if page_check_address() fails.
This means that every enable/disable leads to unbalanced mem_cgroup_uncharge()
from put_page(old_page), it is trivial to underflow the page_counter->count
and trigger OOM.
Reported-and-tested-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: stable@vger.kernel.org # 3.17+
Fixes:
00501b531c47 ("mm: memcontrol: rewrite charge API")
Link: http://lkml.kernel.org/r/20160817153629.GB29724@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Thu, 18 Aug 2016 02:51:27 +0000 (12:51 +1000)]
Merge branch 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Single 64-bit gart size fix.
* 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: Change GART offset to 64-bit
Rafael J. Wysocki [Thu, 18 Aug 2016 01:27:08 +0000 (03:27 +0200)]
Merge branch 'pm-sleep'
* pm-sleep:
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
x86/power/64: Use __pa() for physical address computation
PM / sleep: Update some system sleep documentation
Linus Torvalds [Thu, 18 Aug 2016 00:26:58 +0000 (17:26 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.
2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.
3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.
4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.
5) Fix fib_trie logic when walking the trie during /proc/net/route
output than can access a stale node pointer. From David Forster.
6) Several sctp_diag fixes from Phil Sutter.
7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.
8) Checksum fixup fixes in bpf from Daniel Borkmann.
9) Memork leaks in nfnetlink, from Liping Zhang.
10) Use after free in rxrpc, from David Howells.
11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.
12) Calipso resource leak, from Colin Ian King.
13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.
14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.
15) Fix lockdep splats in macsec, from Sabrina Dubroca.
16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.
17) Various tc-action bug fixes, from CONG Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...
David S. Miller [Wed, 17 Aug 2016 23:27:58 +0000 (19:27 -0400)]
Merge branch 'tc_action-fixes'
Cong Wang says:
====================
net_sched: tc action fixes and updates
This patchset fixes a few regressions caused by the previous
code refactor and more. Thanks to Jamal for catching them!
Note, patch 3/7 and 4/7 are not strictly necessary for this patchset,
I just want to carry them together.
---
v4: adjust an indention for Jamal
add two more patches
v3: avoid list for fast path, suggested by Jamal
v2: replace flex_array with regular dynamic array
keep tcf_action_stats_update() in act_api.h
fix macro typos found by Amir
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Sun, 14 Aug 2016 05:35:02 +0000 (22:35 -0700)]
net_sched: allow flushing tc police actions
The act_police uses its own code to walk the
action hashtable, which leads to that we could
not flush standalone tc police actions, so just
switch to tcf_generic_walker() like other actions.
(Joint work from Roman and Cong.)
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:35:01 +0000 (22:35 -0700)]
net_sched: unify the init logic for act_police
Jamal reported a crash when we create a police action
with a specific index, this is because the init logic
is not correct, we should always create one for this
case. Just unify the logic with other tc actions.
Fixes:
a03e6fe56971 ("act_police: fix a crash during removal")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:35:00 +0000 (22:35 -0700)]
net_sched: convert tcf_exts from list to pointer array
As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.
The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.
Fixes:
a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:59 +0000 (22:34 -0700)]
net_sched: move tc offload macros to pkt_cls.h
struct tcf_exts belongs to filters, should not be visible
to plain tc actions.
Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:58 +0000 (22:34 -0700)]
net_sched: fix a typo in tc_for_each_action()
It is harmless because all users pass 'a' to this macro.
Fixes:
00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef")
Cc: Amir Vadai <amir@vadai.me>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:57 +0000 (22:34 -0700)]
net_sched: remove an unnecessary list_del()
This list_del() for tc action is not needed actually,
because we only use this list to chain bulk operations,
therefore should not be carried for latter operations.
Fixes:
ec0595cc4495 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:56 +0000 (22:34 -0700)]
net_sched: remove the leftover cleanup_a()
After refactoring tc_action into tcf_common, we no
longer need to cleanup temporary "actions" in list,
they are permanently stored in the hashtable.
Fixes:
a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Aug 2016 23:20:24 +0000 (19:20 -0400)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2016-08-16
This series contains fixes to e1000e, igb, ixgbe and i40e.
Kshitiz Gupta provides a fix for igb to resolve the PHY delay compensation
math in several functions.
Jarod Wilson provides a fix for e1000e which had to broken up into 2
patches, first is prepares the driver for expanding the list of NICs
that have occasional ~10 hour clock jumps when being used for PTP.
Second patch actually fixes i218 silicon which has been experiencing
the clock jumps while using PTP.
Alex provides 2 patches for ixgbe now that he is back at Intel. First
fixes setting VLNCTRL.VFE bit, which was left unchanged in earlier patches
which resulted in disabling VLAN filtering for all the VFs. Second
corrects the support for disabling the VLAN tag filtering via the
feature bit.
Lastly, David fixes i40e which was causing a kernel panic when
non-contiguous traffic classes or traffic classes not starting with TC0,
were configured on a link partner switch. To fix this, changed the
logic when determining the total number of TCs enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Aug 2016 23:18:34 +0000 (19:18 -0400)]
Merge branch 'mlxsw-fixes'
Jiri Pirko says:
====================
mlxsw: IPv4 UC router fixes
Ido says:
Patches 1-3 fix a long standing problem in the driver's init sequence,
which manifests itself quite often when routing daemons try to configure
an IP address on registered netdevs that don't yet have an associated
vPort.
Patches 4-9 add missing packet traps for the router to work properly and
also fix ordering issue following the recent changes to the driver's init
sequence.
The last patch isn't related to the router, but fixes a general problem
in which under certain conditions packets aren't trapped to CPU.
v1->v2:
- Change order of patch 7
- Add patch 6 following Ilan's comment
- Add patchset name and cover letter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:37 +0000 (16:39 +0200)]
mlxsw: spectrum: Allow packets to be trapped from any PG
When packets enter the device they are classified to a priority group
(PG) buffer based on their PCP value. After their egress port and
traffic class are determined they are moved to the switch's shared
buffer and await transmission, if:
(Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres &&
Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres)
||
(Ingress{Port}.Usage < Min || Ingress{Port,PG} < Min ||
Egress{Port}.Usage < Min || Egress{Port,TC}.Usage < Min)
Packets scheduled to transmission through CPU port (trapped to CPU) use
traffic class 7, which has a zero maximum and minimum quotas. However,
when such packets arrive from PG 0 they are admitted to the shared
buffer as PG 0 has a non-zero minimum quota.
Allow all packets to be trapped to the CPU - regardless of the PG they
were classified to - by assigning a 10KB minimum quota for CPU port and
TC7.
Fixes:
8e8dfe9fdf06 ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:36 +0000 (16:39 +0200)]
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
Before destroying the 802.1Q FID we should first remove the VID-to-FID
mapping. This makes mlxsw_sp_fid_destroy() symmetric with regards to
mlxsw_sp_fid_create().
Fixes:
14d39461b3f4 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:35 +0000 (16:39 +0200)]
mlxsw: spectrum: Add missing rollbacks in error path
While going over the code I noticed we are missing two rollbacks in the
port's creation error path. Add them and adjust the place of one of them
in the port's removal sequence so that both are symmetric.
Fixes:
56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 17 Aug 2016 14:39:34 +0000 (16:39 +0200)]
mlxsw: reg: Fix missing op field fill-up
Ralue pack function needs to set op, otherwise it is 0 for add always.
Fixes:
d5a1c749d22 ("mlxsw: reg: Add Router Algorithmic LPM Unicast Entry Register definition")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:33 +0000 (16:39 +0200)]
mlxsw: spectrum: Trap loop-backed packets
One of the conditions to generate an ICMP Redirect Message is that "the
packet is being forwarded out the same physical interface that it was
received from" (RFC 1812).
Therefore, we need to be able to trap such packets and let the kernel
decide what to do with them.
For each RIF, enable the loop-back filter, which will raise the LBERROR
trap whenever the ingress RIF equals the egress RIF.
Fixes:
99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces")
Reported-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 17 Aug 2016 14:39:32 +0000 (16:39 +0200)]
mlxsw: spectrum: Add missing packet traps
Add the following traps:
1) MTU Error: Trap packets whose size is bigger than the egress RIF's
MTU. If DF bit isn't set, traffic will continue to be routed in slow
path.
2) TTL Error: Trap packets whose TTL expired. This allows traceroute to
work properly.
3) OSPF packets.
Fixes:
7b27ce7bb9cd ("mlxsw: spectrum: Add traps needed for router implementation")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>