Konrad Rzeszutek Wilk [Mon, 6 Jan 2014 15:40:36 +0000 (10:40 -0500)]
xen/grant: Implement an grant frame array struct (v3).
The 'xen_hvm_resume_frames' used to be an 'unsigned long'
and contain the virtual address of the grants. That was OK
for most architectures (PVHVM, ARM) were the grants are contiguous
in memory. That however is not the case for PVH - in which case
we will have to do a lookup for each virtual address for the PFN.
Instead of doing that, lets make it a structure which will contain
the array of PFNs, the virtual address and the count of said PFNs.
Also provide a generic functions: gnttab_setup_auto_xlat_frames and
gnttab_free_auto_xlat_frames to populate said structure with
appropriate values for PVHVM and ARM.
To round it off, change the name from 'xen_hvm_resume_frames' to
a more descriptive one - 'xen_auto_xlat_grant_frames'.
For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
v2 moves the xen_remap in the gnttab_setup_auto_xlat_frames
and also introduces xen_unmap for gnttab_free_auto_xlat_frames.
Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v3: Based on top of 'asm/xen/page.h: remove redundant semicolon']
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Konrad Rzeszutek Wilk [Tue, 31 Dec 2013 21:33:31 +0000 (16:33 -0500)]
xen/grant-table: Refactor gnttab_init
We have this odd scenario of where for PV paths we take a shortcut
but for the HVM paths we first ioremap xen_hvm_resume_frames, then
assign it to gnttab_shared.addr. This is needed because gnttab_map
uses gnttab_shared.addr.
Instead of having:
if (pv)
return gnttab_map
if (hvm)
...
gnttab_map
Lets move the HVM part before the gnttab_map and remove the
first call to gnttab_map.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Konrad Rzeszutek Wilk [Tue, 31 Dec 2013 20:55:39 +0000 (15:55 -0500)]
xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
The function gnttab_max_grant_frames() returns the maximum amount
of frames (pages) of grants we can have. Unfortunatly it was
dependent on gnttab_init() having been run before to initialize
the boot max value (boot_max_nr_grant_frames).
This meant that users of gnttab_max_grant_frames would always
get a zero value if they called before gnttab_init() - such as
'platform_pci_init' (drivers/xen/platform-pci.c).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Mukesh Rathor [Wed, 11 Dec 2013 20:36:51 +0000 (15:36 -0500)]
xen/pvh: Piggyback on PVHVM for event channels (v2)
PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. There is
a similar mode - PVHVM where we run in HVM mode with
PV code enabled - and this patch explores that.
The most notable PV interfaces are the XenBus and event channels.
We will piggyback on how the event channel mechanism is
used in PVHVM - that is we want the normal native IRQ mechanism
and we will install a vector (hvm callback) for which we
will call the event channel mechanism.
This means that from a pvops perspective, we can use
native_irq_ops instead of the Xen PV specific. Albeit in the
future we could support pirq_eoi_map. But that is
a feature request that can be shared with PVHVM.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Mukesh Rathor [Fri, 13 Dec 2013 18:02:58 +0000 (13:02 -0500)]
xen/pvh: Update E820 to work with PVH (v2)
In xen_add_extra_mem() we can skip updating P2M as it's managed
by Xen. PVH maps the entire IO space, but only RAM pages need
to be repopulated.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Mukesh Rathor [Fri, 13 Dec 2013 16:48:08 +0000 (11:48 -0500)]
xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
The VCPU bringup protocol follows the PV with certain twists.
From xen/include/public/arch-x86/xen.h:
Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:
- For HVM guests, the structures read include: fpu_ctxt (if
VGCT_I387_VALID is set), flags, user_regs, debugreg[*]
- PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
set cr3. All other fields not used should be set to 0.
This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load per-cpu data-structures. It has no effect on the VCPU0.
We also piggyback on the %rdi register to pass in the CPU number - so
that when we bootup a new CPU, the cpu_bringup_and_idle will have
passed as the first parameter the CPU number (via %rdi for 64-bit).
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mukesh Rathor [Fri, 13 Dec 2013 18:03:37 +0000 (13:03 -0500)]
xen/pvh: Load GDT/GS in early PV bootup code for BSP.
During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).
But for PVH we want to be use HVM type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with the __KERNEL_CS:eip
to run in the proper GDT and segment.
For HVM this is usually done in 'secondary_startup_64' in
(head_64.S) but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
that in the early PV bootup paths.
For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Mukesh Rathor [Tue, 31 Dec 2013 19:02:44 +0000 (14:02 -0500)]
xen/pvh: Setup up shared_info.
For PVHVM the shared_info structure is provided via the same way
as for normal PV guests (see include/xen/interface/xen.h).
That is during bootup we get 'xen_start_info' via the %esi register
in startup_xen. Then later we extract the 'shared_info' from said
structure (in xen_setup_shared_info) and start using it.
The 'xen_setup_shared_info' is all setup to work with auto-xlat
guests, but there are two functions which it calls that are not:
xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
This patch modifies the P2M code (xen_setup_mfn_list_list)
while the "Piggyback on PVHVM for event channels" modifies
the xen_setup_vcpu_info_placement.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mukesh Rathor [Fri, 3 Jan 2014 14:48:08 +0000 (09:48 -0500)]
xen/pvh/mmu: Use PV TLB instead of native.
We also optimize one - the TLB flush. The native operation would
needlessly IPI offline VCPUs causing extra wakeups. Using the
Xen one avoids that and lets the hypervisor determine which
VCPU needs the TLB flush.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mukesh Rathor [Tue, 31 Dec 2013 17:41:27 +0000 (12:41 -0500)]
xen/pvh: MMU changes for PVH (v2)
.. which are surprisingly small compared to the amount for PV code.
PVH uses mostly native mmu ops, we leave the generic (native_*) for
the majority and just overwrite the baremetal with the ones we need.
At startup, we are running with pre-allocated page-tables
courtesy of the tool-stack. But we still need to graft them
in the Linux initial pagetables. However there is no need to
unpin/pin and change them to R/O or R/W.
Note that the xen_pagetable_init due to
7836fec9d0994cc9c9150c5a33f0eb0eb08a335a
"xen/mmu/p2m: Refactor the xen_pagetable_init code." does not
need any changes - we just need to make sure that xen_post_allocator_init
does not alter the pvops from the default native one.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Konrad Rzeszutek Wilk [Fri, 3 Jan 2014 19:08:39 +0000 (14:08 -0500)]
xen/mmu: Cleanup xen_pagetable_p2m_copy a bit.
Stefano noticed that the code runs only under 64-bit so
the comments about 32-bit are pointless.
Also we change the condition for xen_revector_p2m_tree
returning the same value (because it could not allocate
a swath of space to put the new P2M in) or it had been
called once already. In such we return early from the
function.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Konrad Rzeszutek Wilk [Tue, 31 Dec 2013 17:37:52 +0000 (12:37 -0500)]
xen/mmu/p2m: Refactor the xen_pagetable_init code (v2).
The revectoring and copying of the P2M only happens when
!auto-xlat and on 64-bit builds. It is not obvious from
the code, so lets have seperate 32 and 64-bit functions.
We also invert the check for auto-xlat to make the code
flow simpler.
Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Sun, 15 Dec 2013 17:37:46 +0000 (12:37 -0500)]
xen/pvh: Don't setup P2M tree.
P2M is not available for PVH. Fortunatly for us the
P2M code already has mostly the support for auto-xlat guest thanks to
commit
3d24bbd7dddbea54358a9795abaf051b0f18973c
"grant-table: call set_phys_to_machine after mapping grant refs"
which: "
introduces set_phys_to_machine calls for auto_translated guests
(even on x86) in gnttab_map_refs and gnttab_unmap_refs.
translated by swiotlb-xen... " so we don't need to muck much.
with above mentioned "commit you'll get set_phys_to_machine calls
from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
anything with them " (Stefano Stabellini) which is OK - we want
them to be NOPs.
This is because we assume that an "IOMMU is always present on the
plaform and Xen is going to make the appropriate IOMMU pagetable
changes in the hypercall implementation of GNTTABOP_map_grant_ref
and GNTTABOP_unmap_grant_ref, then eveything should be transparent
from PVH priviligied point of view and DMA transfers involving
foreign pages keep working with no issues[sp]
Otherwise we would need a P2M (and an M2P) for PVH priviligied to
track these foreign pages .. (see arch/arm/xen/p2m.c)."
(Stefano Stabellini).
We still have to inhibit the building of the P2M tree.
That had been done in the past by not calling
xen_build_dynamic_phys_to_machine (which setups the P2M tree
and gives us virtual address to access them). But we are missing
a check for xen_build_mfn_list_list - which was continuing to setup
the P2M tree and would blow up at trying to get the virtual
address of p2m_missing (which would have been setup by
xen_build_dynamic_phys_to_machine).
Hence a check is needed to not call xen_build_mfn_list_list when
running in auto-xlat mode.
Instead of replicating the check for auto-xlat in enlighten.c
do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
is called also in xen_arch_post_suspend without any checks for
auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
allocate space for an P2M tree.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Mukesh Rathor [Fri, 13 Dec 2013 17:45:31 +0000 (12:45 -0500)]
xen/pvh: Early bootup changes in PV code (v4).
We don't use the filtering that 'xen_cpuid' is doing
because the hypervisor treats 'XEN_EMULATE_PREFIX' as
an invalid instruction. This means that all of the filtering
will have to be done in the hypervisor/toolstack.
Without the filtering we expose to the guest the:
- cpu topology (sockets, cores, etc);
- the APERF (which the generic scheduler likes to
use), see
5e626254206a709c6e937f3dda69bf26c7344f6f
"xen/setup: filter APERFMPERF cpuid feature out"
- and the inability to figure out whether MWAIT_LEAF
should be exposed or not. See
df88b2d96e36d9a9e325bfcd12eb45671cbbc937
"xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded."
- x2apic, see
4ea9b9aca90cfc71e6872ed3522356755162932c
"xen: mask x2APIC feature in PV"
We also check for vector callback early on, as it is a required
feature. PVH also runs at default kernel IOPL.
Finally, pure PV settings are moved to a separate function that are
only called for pure PV, ie, pv with pvmmu. They are also #ifdef
with CONFIG_XEN_PVMMU.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Mukesh Rathor [Fri, 13 Dec 2013 17:39:56 +0000 (12:39 -0500)]
xen/pvh/x86: Define what an PVH guest is (v3).
Which is a PV guest with auto page translation enabled
and with vector callback. It is a cross between PVHVM and PV.
The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):
"* the guest uses auto translate:
- p2m is managed by Xen
- pagetables are owned by the guest
- mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen. From the ABI prespective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."
Also we use the PV cpuid, albeit we can use the HVM (native) cpuid.
However, we do have a fair bit of filtering in the xen_cpuid and
we can piggyback on that until the hypervisor/toolstack filters
the appropiate cpuids. Once that is done we can swap over to
use the native one.
We setup a Kconfig entry that is disabled by default and
cannot be enabled.
Note that on ARM the concept of PVH is non-existent. As Ian
put it: "an ARM guest is neither PV nor HVM nor PVHVM.
It's a bit like PVH but is different also (it's further towards
the H end of the spectrum than even PVH).". As such these
options (PVHVM, PVH) are never enabled nor seen on ARM
compilations.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mukesh Rathor [Fri, 13 Dec 2013 17:09:28 +0000 (12:09 -0500)]
xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
Most of the functions in page.h are prefaced with
if (xen_feature(XENFEAT_auto_translated_physmap))
return mfn;
Except the mfn_to_local_pfn. At a first sight, the function
should work without this patch - as the 'mfn_to_mfn' has
a similar check. But there are no such check in the
'get_phys_to_machine' function - so we would crash in there.
This fixes it by following the convention of having the
check for auto-xlat in these static functions.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
David Vrabel [Fri, 15 Mar 2013 13:02:35 +0000 (13:02 +0000)]
xen/events: use the FIFO-based ABI if available
Implement all the event channel port ops for the FIFO-based ABI.
If the hypervisor supports the FIFO-based ABI, enable it by
initializing the control block for the boot VCPU and subsequent VCPUs
as they are brought up and on resume. The event array is expanded as
required when event ports are setup.
The 'xen.fifo_events=0' command line option may be used to disable use
of the FIFO-based ABI.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Mon, 23 Sep 2013 11:52:21 +0000 (12:52 +0100)]
xen/x86: set VIRQ_TIMER priority to maximum
Commit
bee980d9e (xen/events: Handle VIRQ_TIMER before any other hardirq
in event loop) effectively made the VIRQ_TIMER the highest priority event
when using the 2-level ABI.
Set the VIRQ_TIMER priority to the highest so this behaviour is retained
when using the FIFO-based ABI.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Mon, 23 Sep 2013 11:47:26 +0000 (12:47 +0100)]
xen/events: allow event channel priority to be set
Add xen_irq_set_priority() to set an event channels priority. This function
will only work with event channel ABIs that support priority (i.e., the
FIFO-based ABI).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Fri, 15 Mar 2013 10:55:41 +0000 (10:55 +0000)]
xen/events: Add the hypervisor interface for the FIFO-based event channels
Add the hypercall sub-ops and the structures for the shared data used
in the FIFO-based event channel ABI.
The design document for this new ABI is available here:
http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf
In summary, events are reported using a per-domain shared event array
of event words. Each event word has PENDING, LINKED and MASKED bits
and a LINK field for pointing to the next event in the event queue.
There are 16 event queues (with different priorities) per-VCPU.
Key advantages of this new ABI include:
- Support for over 100,000 events (2^17).
- 16 different event priorities.
- Improved fairness in event latency through the use of FIFOs.
The ABI is available in Xen 4.4 and later.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Mon, 23 Sep 2013 20:03:38 +0000 (21:03 +0100)]
xen/evtchn: support more than 4096 ports
Remove the check during unbind for NR_EVENT_CHANNELS as this limits
support to less than 4096 ports.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Thu, 5 Sep 2013 17:11:38 +0000 (18:11 +0100)]
xen/events: add xen_evtchn_mask_all()
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Thu, 17 Oct 2013 14:23:15 +0000 (15:23 +0100)]
xen/events: Refactor evtchn_to_irq array to be dynamically allocated
Refactor static array evtchn_to_irq array to be dynamically allocated by
implementing get and set functions for accesses to the array.
Two new port ops are added: max_channels (maximum supported number of
event channels) and nr_channels (number of currently usable event
channels). For the 2-level ABI, these numbers are both the same as
the shared data structure is a fixed size. For the FIFO ABI, these
will be different as the event array is expanded dynamically.
This allows more than 65000 event channels so an unsigned short is no
longer sufficient for an event channel port number and unsigned int is
used instead.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Mon, 18 Mar 2013 16:54:57 +0000 (16:54 +0000)]
xen/events: add a evtchn_op for port setup
Add a hook for port-specific setup and call it from
xen_irq_info_common_setup().
The FIFO-based ABIs may need to perform additional setup (expanding
the event array) before a bound event channel can start to receive
events.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Mon, 18 Mar 2013 15:50:17 +0000 (15:50 +0000)]
xen/events: allow setup of irq_info to fail
The FIFO-based event ABI requires additional setup of newly bound
events (it may need to expand the event array) and this setup may
fail.
xen_irq_info_common_init() is a useful place to put this setup so
allow this call to fail. This call and the other similar calls are
renamed to be *_setup() to reflect that they may now fail.
This failure can only occur with new event channels not on rebind.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Thu, 14 Mar 2013 12:49:19 +0000 (12:49 +0000)]
xen/events: add struct evtchn_ops for the low-level port operations
evtchn_ops contains the low-level operations that access the shared
data structures. This allows alternate ABIs to be supported.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Wed, 13 Mar 2013 15:29:25 +0000 (15:29 +0000)]
xen/events: move 2-level specific code into its own file
In preparation for alternative event channel ABIs, move all the
functions accessing the shared data structures into their own file.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Wed, 7 Aug 2013 13:32:12 +0000 (14:32 +0100)]
xen/events: move drivers/xen/events.c into drivers/xen/events/
events.c will be split into multiple files so move it into its own
directory.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Wei Liu [Thu, 7 Mar 2013 15:50:28 +0000 (15:50 +0000)]
xen/events: replace raw bit ops with functions
In preparation for adding event channel port ops, use set_evtchn()
instead of sync_set_bit().
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Wei Liu [Thu, 7 Mar 2013 15:50:27 +0000 (15:50 +0000)]
xen/events: introduce test_and_set_mask()
In preparation for adding event channel port ops, add
test_and_set_mask().
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Wed, 13 Mar 2013 13:20:52 +0000 (13:20 +0000)]
xen/events: remove unnecessary init_evtchn_cpu_bindings()
Because the guest-side binding of an event to a VCPU (i.e., setting
the local per-cpu masks) is always explicitly done after an event
channel is bound to a port, there is no need to initialize all
possible events as bound to VCPU 0 at start of day or after a resume.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
David Vrabel [Tue, 12 Mar 2013 18:28:04 +0000 (18:28 +0000)]
xen/events: refactor retrigger_dynirq() and resend_irq_on_evtchn()
These two function did the same thing with different parameters, put
the common bits in retrigger_evtchn().
This changes the return value of resend_irq_on_evtchn() but the only
caller (in arch/ia64/xen/irq_xen.c) ignored the return value so this
is fine.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Ben Hutchings [Tue, 31 Dec 2013 19:46:27 +0000 (20:46 +0100)]
xen/pci: Fix build on non-x86
We can't include <asm/pci_x86.h> if this isn't x86, and we only need
it if CONFIG_PCI_MMCONFIG is enabled.
Fixes:
8deb3eb1461e ('xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.')
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jie Liu [Wed, 13 Nov 2013 12:59:58 +0000 (20:59 +0800)]
xen: simplify balloon_first_page() with list_first_entry_or_null()
Replace the code logic at balloon_first_page() by calling
list_first_entry_or_null() directly. since here is only
one user of that routine, therefore we can just remove it.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Wei Liu [Fri, 3 Jan 2014 14:03:35 +0000 (14:03 +0000)]
asm/xen/page.h: remove redundant semicolon
Signed-off-by: Wei Liu <liuw@liuw.name>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Ian Campbell [Wed, 11 Dec 2013 12:03:17 +0000 (12:03 +0000)]
xen: balloon: enable for ARM
Since
c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of
existing RAM pages" the balloon driver appears to work fine on ARM as far as I
can tell. Prior to that commit it was broken because on ARM RAM doesn't
typically start at zero, effectively leaving a big MMIO hole at the start.
This would cause the balloon driver to give away all of RAM at start of day,
which is rather inconvenient.
It was already enabled (or rather not excluded) on ARM64. The
c1d15f5c8bc1170dafe16e988e55437245966dfe
"xen/balloon: Seperate the auto-translate logic properly (v2)"
added in the proper plumbing to work with ARM and PVH type guests.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v2: Added the bit about PVH]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 11 Dec 2013 20:22:01 +0000 (15:22 -0500)]
xen/pvhvm: Remove the xen_platform_pci int.
Since we have xen_has_pv_devices,xen_has_pv_disk_devices,
xen_has_pv_nic_devices, and xen_has_pv_and_legacy_disk_devices
to figure out the different 'unplug' behaviors - lets
use those instead of this single int.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 26 Nov 2013 20:05:40 +0000 (15:05 -0500)]
xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).
The user has the option of disabling the platform driver:
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
which is used to unplug the emulated drivers (IDE, Realtek 8169, etc)
and allow the PV drivers to take over. If the user wishes
to disable that they can set:
xen_platform_pci=0
(in the guest config file)
or
xen_emul_unplug=never
(on the Linux command line)
except it does not work properly. The PV drivers still try to
load and since the Xen platform driver is not run - and it
has not initialized the grant tables, most of the PV drivers
stumble upon:
input: Xen Virtual Keyboard as /devices/virtual/input/input5
input: Xen Virtual Pointer as /devices/virtual/input/input6M
------------[ cut here ]------------
kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206!
invalid opcode: 0000 [#1] SMP
Modules linked in: xen_kbdfront(+) xenfs xen_privcmd
CPU: 6 PID: 1389 Comm: modprobe Not tainted
3.13.0-rc1upstream-00021-ga6c892b-dirty #1
Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013
RIP: 0010:[<
ffffffff813ddc40>] [<
ffffffff813ddc40>] get_free_entries+0x2e0/0x300
Call Trace:
[<
ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240
[<
ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70
[<
ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront]
[<
ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront]
[<
ffffffff813e5757>] xenbus_dev_probe+0x77/0x130
[<
ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50
[<
ffffffff8145e9a9>] driver_probe_device+0x89/0x230
[<
ffffffff8145ebeb>] __driver_attach+0x9b/0xa0
[<
ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
[<
ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
[<
ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0
[<
ffffffff8145e7d9>] driver_attach+0x19/0x20
[<
ffffffff8145e260>] bus_add_driver+0x1a0/0x220
[<
ffffffff8145f1ff>] driver_register+0x5f/0xf0
[<
ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20
[<
ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40
[<
ffffffffa0015000>] ? 0xffffffffa0014fff
[<
ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront]
[<
ffffffff81002049>] do_one_initcall+0x49/0x170
.. snip..
which is hardly nice. This patch fixes this by having each
PV driver check for:
- if running in PV, then it is fine to execute (as that is their
native environment).
- if running in HVM, check if user wanted 'xen_emul_unplug=never',
in which case bail out and don't load any PV drivers.
- if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
does not exist, then bail out and not load PV drivers.
- (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
then bail out for all PV devices _except_ the block one.
Ditto for the network one ('nics').
- (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
then load block PV driver, and also setup the legacy IDE paths.
In (v3) make it actually load PV drivers.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it
Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Add extra logic to handle the myrid ways 'xen_emul_unplug'
can be used per Ian and Stefano suggestion]
[v3: Make the unnecessary case work properly]
[v4: s/disks/ide-disks/ spotted by Fabio]
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts]
CC: stable@vger.kernel.org
Linus Torvalds [Mon, 30 Dec 2013 00:01:33 +0000 (16:01 -0800)]
Linux 3.13-rc6
Linus Torvalds [Sun, 29 Dec 2013 21:49:51 +0000 (13:49 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Another smallish batch of fixes, it's been quiet due to the holidays.
Nothing controversial here, a handful of things across the board"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: pxa: fix USB gadget driver compilation regression
ARM: OMAP2+: Fix LCD panel backlight regression for LDP legacy booting
ARM: OMAP2+: hwmod_data: fix missing OMAP_INTC_START in irq data
ARM: DRA7: hwmod: Fix boot crash with DEBUG_LL
ARM: shmobile: r8a7790: fix shdi resource sizes
ARM: shmobile: bockw: fixup DMA mask
ARM: shmobile: armadillo: Add PWM backlight power supply
Linus Torvalds [Sun, 29 Dec 2013 21:35:04 +0000 (13:35 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"There is a small EFI fix and a big power regression fix in this batch.
My queue also had a fix for downing a CPU when there are insufficient
number of IRQ vectors available, but I'm holding that one for now due
to recent bug reports"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Don't select EFI from certain special ACPI drivers
x86 idle: Repair large-server 50-watt idle-power regression
Linus Torvalds [Sun, 29 Dec 2013 21:27:51 +0000 (13:27 -0800)]
Merge tag 'pm+acpi-3.13-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes and new device IDs from Rafael Wysocki:
- Fix for a cpufreq regression causing stale sysfs files to be left
behind during system resume if cpufreq_add_dev() fails for one or
more CPUs from Viresh Kumar.
- Fix for a bug in cpufreq causing CONFIG_CPU_FREQ_DEFAULT_* to be
ignored when the intel_pstate driver is used from Jason Baron.
- System suspend fix for a memory leak in pm_vt_switch_unregister()
that forgot to release objects after removing them from
pm_vt_switch_list. From Masami Ichikawa.
- Intel Valley View device ID and energy unit encoding update for the
(recently added) Intel RAPL (Running Average Power Limit) driver from
Jacob Pan.
- Intel Bay Trail SoC GPIO and ACPI device IDs for the Low Power
Subsystem (LPSS) ACPI driver from Paul Drews.
* tag 'pm+acpi-3.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap / RAPL: add support for ValleyView Soc
PM / sleep: Fix memory leak in pm_vt_switch_unregister().
cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers
cpufreq: remove sysfs files for CPUs which failed to come back after resume
ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs
Olof Johansson [Sat, 28 Dec 2013 23:38:32 +0000 (15:38 -0800)]
Merge tag 'omap-for-v3.13/intc-ldp-fix' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
From Tony Lindgren:
Fix a regression for wrong interrupt numbers for some devices after
the sparse IRQ conversion, fix DRA7 console output for earlyprintk,
and fix the LDP LCD backlight when DSS is built into the kernel and
not as a loadable module.
* tag 'omap-for-v3.13/intc-ldp-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: Fix LCD panel backlight regression for LDP legacy booting
ARM: OMAP2+: hwmod_data: fix missing OMAP_INTC_START in irq data
ARM: DRA7: hwmod: Fix boot crash with DEBUG_LL
+ v3.13-rc5
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Sat, 28 Dec 2013 23:20:35 +0000 (15:20 -0800)]
Merge tag 'renesas-fixes2-for-v3.13' of git://git./linux/kernel/git/horms/renesas into fixes
From Simon Horman:
Second Round of Renesas ARM based SoC Fixes for v3.13
* r8a7790 (R-Car H2) based Lager board
- Correct SHDI resource sizes
This bug has been present since sdhi resources were added to the r8a7790 by
8c9b1aa41853272a ("ARM: shmobile: r8a7790: add MMCIF and SDHI DT
templates") in v3.11-rc2.
* r8a7778 (R-Car M1) based Bock-W board
- Correct DMA mask
This resolves a regression introduced by
4dcfa60071b3d23f
("ARM: DMA-API: better handing of DMA masks for coherent allocations")
in v3.12-rc1.
* r8a7740 (R-Mobile A1) based Armadillo board
- Add PWM backlight power supply
This resolves a regression introduced by
22ceeee16eb8f0d0
("pwm-backlight: Add power supply support") in v3.12.
* tag 'renesas-fixes2-for-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: r8a7790: fix shdi resource sizes
ARM: shmobile: bockw: fixup DMA mask
ARM: shmobile: armadillo: Add PWM backlight power supply
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Walleij [Wed, 11 Dec 2013 08:48:58 +0000 (09:48 +0100)]
ARM: pxa: fix USB gadget driver compilation regression
After commit
88f718e3fa4d67f3a8dbe79a2f97d722323e4051
"ARM: pxa: delete the custom GPIO header" a compilation
error was introduced in the PXA25x gadget driver.
An attempt to fix the problem was made in
commit
b144e4ab1ef130e8bf30bcd3e529b7f35112c503
"usb: gadget: fix pxa25x compilation problems"
by explictly stating the driver needs the <mach/hardware.h>
header, which solved the compilation for a few boards,
such as the pxa255-idp and its defconfig.
However the Lubbock board has this special clause in
drivers/usb/gadget/pxa25x_udc.c:
This include file has an implicit dependency on
<mach/irqs.h> having been included before <mach/lubbock.h>
was included.
Before commit
88f718e3fa4d67f3a8dbe79a2f97d722323e4051
"ARM: pxa: delete the custom GPIO header" this implicit
dependency for the pxa25x_udc compile on the Lubbock was
satisfied by <linux/gpio.h> implicitly including
<mach/gpio.h> which was in turn including <mach/irqs.h>,
apart from the earlier added <mach/hardware.h>.
Fix this by having the PXA25x <mach/lubbock.h> explicitly
include <mach/irqs.h>.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Greg Kroah-Hartmann <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Tony Lindgren [Fri, 27 Dec 2013 17:51:25 +0000 (09:51 -0800)]
Merge tag 'for-v3.13-rc/hwmod-fixes-b' of git://git./linux/kernel/git/pjw/omap-pending into debug-ll-and-ldp-backlight-fix
A few OMAP hwmod fixes for v3.13-rc. One patch fixes some IRQ
problems with GPMC, RNG, and ISP/IVA MMUs on OMAP2/3. The other fixes
some problems with DEBUG_LL on DRA7xx.
Basic build, boot, and PM test logs are available here:
http://www.pwsan.com/omap/testlogs/hwmod_fixes_b_v3.13-rc/
20131226021920/
Tony Lindgren [Fri, 27 Dec 2013 17:33:27 +0000 (09:33 -0800)]
ARM: OMAP2+: Fix LCD panel backlight regression for LDP legacy booting
Looks like the LCD panel on LDP has been broken quite a while, and
recently got fixed by commit
0b2aa8bed3e1 (gpio: twl4030: Fix regression
for twl gpio output). However, there's still an issue left where the panel
backlight does not come on if the LCD drivers are built into the
kernel.
Fix the issue by registering the DPI LCD panel only after the twl4030
GPIO has probed.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
[tony@atomide.com: updated per Tomi's comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Rafael J. Wysocki [Thu, 26 Dec 2013 23:43:24 +0000 (00:43 +0100)]
Merge branches 'powercap' and 'acpi-lpss' with new device IDs
* powercap:
powercap / RAPL: add support for ValleyView Soc
* acpi-lpss:
ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs
Rafael J. Wysocki [Thu, 26 Dec 2013 23:42:27 +0000 (00:42 +0100)]
Merge branches 'pm-cpufreq' and 'pm-sleep' containing PM fixes
* pm-cpufreq:
cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers
cpufreq: remove sysfs files for CPUs which failed to come back after resume
* pm-sleep:
PM / sleep: Fix memory leak in pm_vt_switch_unregister().
Linus Torvalds [Thu, 26 Dec 2013 17:26:12 +0000 (09:26 -0800)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A collection of bug fixes destined for stable and some printk cleanups
and a patch so that instead of BUG'ing we use the ext4_error()
framework to mark the file system is corrupted"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add explicit casts when masking cluster sizes
ext4: fix deadlock when writing in ENOSPC conditions
jbd2: rename obsoleted msg JBD->JBD2
jbd2: revise KERN_EMERG error messages
jbd2: don't BUG but return ENOSPC if a handle runs out of space
ext4: Do not reserve clusters when fs doesn't support extents
ext4: fix del_timer() misuse for ->s_err_report
ext4: check for overlapping extents in ext4_valid_extent_entries()
ext4: fix use-after-free in ext4_mb_new_blocks
ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails
Suman Anna [Mon, 23 Dec 2013 22:53:11 +0000 (16:53 -0600)]
ARM: OMAP2+: hwmod_data: fix missing OMAP_INTC_START in irq data
Commit
7d7e1eb (ARM: OMAP2+: Prepare for irqs.h removal) and commit
ec2c082 (ARM: OMAP2+: Remove hardcoded IRQs and enable SPARSE_IRQ)
updated the way interrupts for OMAP2/3 devices are defined in the
HWMOD data structures to being an index plus a fixed offset (defined
by OMAP_INTC_START).
Couple of irqs in the OMAP2/3 hwmod data were misconfigured completely
as they were missing this OMAP_INTC_START relative offset. Add this
offset back to fix the incorrect irq data for the following modules:
OMAP2 - GPMC, RNG
OMAP3 - GPMC, ISP MMU & IVA MMU
Signed-off-by: Suman Anna <s-anna@ti.com>
Fixes:
7d7e1eba7e92 ("ARM: OMAP2+: Prepare for irqs.h removal")
Fixes:
ec2c0825ca31 ("ARM: OMAP2+: Remove hardcoded IRQs and enable SPARSE_IRQ")
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Rajendra Nayak [Thu, 12 Dec 2013 09:52:49 +0000 (15:22 +0530)]
ARM: DRA7: hwmod: Fix boot crash with DEBUG_LL
With commit '
7dedd34: ARM: OMAP2+: hwmod: Fix a crash in _setup_reset() with
DEBUG_LL' we moved from parsing cmdline to identify uart used for earlycon
to using the requsite hwmod CONFIG_DEBUG_OMAPxUARTy FLAGS.
On DRA7 though, we seem to be missing this flag, and atleast on the DRA7 EVM
where we use uart1 for console, boot fails with DEBUG_LL enabled.
Reported-by: Lokesh Vutla <lokeshvutla@ti.com>
Tested-by: Lokesh Vutla <lokeshvutla@ti.com> # on a different base
Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Fixes:
7dedd346941d ("ARM: OMAP2+: hwmod: Fix a crash in _setup_reset() with DEBUG_LL")
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Linus Torvalds [Tue, 24 Dec 2013 18:06:03 +0000 (10:06 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- fix for a memory leak on certain unplug events
- a collection of bcache fixes from Kent and Nicolas
- a few null_blk fixes and updates form Matias
- a marking of static of functions in the stec pci-e driver
* 'for-linus' of git://git.kernel.dk/linux-block:
null_blk: support submit_queues on use_per_node_hctx
null_blk: set use_per_node_hctx param to false
null_blk: corrections to documentation
null_blk: warning on ignored submit_queues param
null_blk: refactor init and init errors code paths
null_blk: documentation
null_blk: mem garbage on NUMA systems during init
drivers: block: Mark the functions as static in skd_main.c
bcache: New writeback PD controller
bcache: bugfix for race between moving_gc and bucket_invalidate
bcache: fix for gc and writeback race
bcache: bugfix - moving_gc now moves only correct buckets
bcache: fix for gc crashing when no sectors are used
bcache: Fix heap_peek() macro
bcache: Fix for can_attach_cache()
bcache: Fix dirty_data accounting
bcache: Use uninterruptible sleep in writeback
bcache: kthread don't set writeback task to INTERUPTIBLE
block: fix memory leaks on unplugging block device
bcache: fix sparse non static symbol warning
Linus Torvalds [Tue, 24 Dec 2013 17:49:20 +0000 (09:49 -0800)]
Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Two fixes. One fixes a bug in the error path of cgroup_create(). The
other changes cgrp->id lifetime rule so that the id doesn't get
recycled before all controller states are destroyed. This premature
id recycling made memcg malfunction"
* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: don't recycle cgroup id until all csses' have been destroyed
cgroup: fix cgroup_create() error handling path
Linus Torvalds [Tue, 24 Dec 2013 17:48:43 +0000 (09:48 -0800)]
Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/percpu
Pull percpu fix from Tejun Heo:
"A single commit to fix a spurious sparse warning coming from
DEFINE_PER_CPU()'s hack to support the use of weak symbols. Shouldn't
cause observable behavior change"
* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: fix spurious sparse warnings from DEFINE_PER_CPU()
Linus Torvalds [Tue, 24 Dec 2013 17:35:58 +0000 (09:35 -0800)]
Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"There's one interseting commit - "libata, freezer: avoid block device
removal while system is frozen". It's an ugly hack working around a
deadlock condition between driver core resume and block layer device
removal paths through freezer which was made more reproducible by
writeback being converted to workqueue some releases ago. The bug has
nothing to do with libata but it's just an workaround which is easy to
backport. After discussion, Rafael and I seem to agree that we don't
really need kernel freezables - both kthread and workqueue. There are
few specific workqueues which constitute PM operations and require
freezing, which will be converted to use workqueue_set_max_active()
instead. All other kernel freezer uses are planned to be removed,
followed by the removal of kthread and workqueue freezer support,
hopefully.
Others are device-specific fixes. The most notable is the addition of
NO_NCQ_TRIM which is used to disable queued TRIM commands to Micro
M500 SSDs which otherwise suffers data corruption"
* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata, freezer: avoid block device removal while system is frozen
libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs
libata: disable a disk via libata.force params
ahci: bail out on ICH6 before using AHCI BAR
ahci: imx: Explicitly clear IMX6Q_GPR13_SATA_MPLL_CLK_EN
libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8
Ard Biesheuvel [Mon, 23 Dec 2013 17:49:30 +0000 (18:49 +0100)]
auxvec.h: account for AT_HWCAP2 in AT_VECTOR_SIZE_BASE
Commit
2171364d1a92 ("powerpc: Add HWCAP2 aux entry") introduced a new
AT_ auxv entry type AT_HWCAP2 but failed to update AT_VECTOR_SIZE_BASE
accordingly.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes:
2171364d1a92 (powerpc: Add HWCAP2 aux entry)
Cc: stable@vger.kernel.org
Acked-by: Michael Neuling <michael@neuling.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 24 Dec 2013 01:37:20 +0000 (17:37 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull SELinux fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: selinux_setprocattr()->ptrace_parent() needs rcu_read_lock()
selinux: fix broken peer recv check
Linus Torvalds [Tue, 24 Dec 2013 01:24:38 +0000 (17:24 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull ext2 fix from Jan Kara:
"One simple fix of oops in ext2 which was recently hit by Christoph"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
Linus Torvalds [Tue, 24 Dec 2013 01:23:42 +0000 (17:23 -0800)]
Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband
Pull infiniband fixes from Roland Dreier:
"Last batch of InfiniBand/RDMA changes for 3.13 / 2014:
- Additional checks for uverbs to ensure forward compatibility,
handle malformed input better.
- Fix potential use-after-free in iWARP connection manager.
- Make a function static"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/uverbs: Check access to userspace response buffer in extended command
IB/uverbs: Check input length in flow steering uverbs
IB/uverbs: Set error code when fail to consume all flow_spec items
IB/uverbs: Check reserved fields in create_flow
IB/uverbs: Check comp_mask in destroy_flow
IB/uverbs: Check reserved field in extended command header
IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()
IB/core: const'ify inbuf in struct ib_udata
RDMA/iwcm: Don't touch cm_id after deref in rem_ref
RDMA/cxgb4: Make _c4iw_write_mem_dma() static
Oleg Nesterov [Mon, 23 Dec 2013 22:45:01 +0000 (17:45 -0500)]
selinux: selinux_setprocattr()->ptrace_parent() needs rcu_read_lock()
selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().
And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Chad Hanson [Mon, 23 Dec 2013 22:45:01 +0000 (17:45 -0500)]
selinux: fix broken peer recv check
Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.
Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Linus Torvalds [Mon, 23 Dec 2013 19:49:16 +0000 (11:49 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Xmas fixes pull, all small nothing major, intel, radeon, one ttm
regression, and one build fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/ttm: Fix swapin regression
gpu: fix qxl missing crc32_le
drm/radeon: fix asic gfx values for scrapper asics
drm/i915: Use the correct GMCH_CTRL register for Sandybridge+
drm/radeon: check for 0 count in speaker allocation and SAD code
drm/radeon/dpm: disable ss on Cayman
drm/radeon/dce6: set correct number of audio pins
drm/i915: get a PC8 reference when enabling the power well
drm/i915: change CRTC assertion on LCPLL disable
drm/i915: Fix erroneous dereference of batch_obj inside reset_status
drm/i915: Prevent double unref following alloc failure during execbuffer
Linus Torvalds [Mon, 23 Dec 2013 18:49:44 +0000 (10:49 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull virtio balloon driver fixes from Rusty Russell:
"Refactoring broke the balloon driver, and fixing kallsyms on ARM broke
some (non-ARM) MMUless setups, so we're making that fix ARM-only for
now.
Unfortunately, the ARM refactoring which broke kallsyms/perf was
CC:stable, so the fix (which broken non-ARM) was also CC:stable, so
now the partial reversion is also CC:stable..."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
scripts/link-vmlinux.sh: only filter kernel symbols for arm
virtio_balloon: update_balloon_size(): update correct field
Roland Dreier [Mon, 23 Dec 2013 17:19:02 +0000 (09:19 -0800)]
Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus
Dave Airlie [Mon, 23 Dec 2013 00:35:57 +0000 (10:35 +1000)]
Merge tag 'drm-intel-fixes-2013-12-18' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Besides the 2 fixes for tricky corner cases in gem from Chris I've
promised already two patche from Paulo to fix pc8 warnings (both ported
from -next, bug report from Dave Jones) and one patch from to fix vga
enable/disable on snb+. That one is a really old bug, but apparently it
can cause machine hangs if you try hard enough with vgacon/efifb handover.
* tag 'drm-intel-fixes-2013-12-18' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: Use the correct GMCH_CTRL register for Sandybridge+
drm/i915: get a PC8 reference when enabling the power well
drm/i915: change CRTC assertion on LCPLL disable
drm/i915: Fix erroneous dereference of batch_obj inside reset_status
drm/i915: Prevent double unref following alloc failure during execbuffer
Dave Airlie [Mon, 23 Dec 2013 00:34:18 +0000 (10:34 +1000)]
Merge branch 'drm-fixes-3.13' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
- fix for a long standing corruption bug on some Trinity/Richland parts.
- Stability fix for cayman dpm
- audio fixes for dce6+
* 'drm-fixes-3.13' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix asic gfx values for scrapper asics
drm/radeon: check for 0 count in speaker allocation and SAD code
drm/radeon/dpm: disable ss on Cayman
drm/radeon/dce6: set correct number of audio pins
Thomas Hellstrom [Sat, 21 Dec 2013 21:23:02 +0000 (22:23 +0100)]
drm/ttm: Fix swapin regression
Commit "drm/ttm: Don't move non-existing data" didn't take the
swapped-out corner case into account. This patch corrects that.
Fixes blank screen after attempted suspend / hibernate on vmwgfx.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Randy Dunlap [Fri, 20 Dec 2013 18:58:15 +0000 (10:58 -0800)]
gpu: fix qxl missing crc32_le
Fix build error: qxl uses crc32 functions so it needs to select
CRC32.
Also use angle quotes around a kernel header file name.
drivers/built-in.o: In function `qxl_display_read_client_monitors_config':
(.text+0x19d754): undefined reference to `crc32_le'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Sun, 22 Dec 2013 21:08:32 +0000 (13:08 -0800)]
Linux 3.13-rc5
Linus Torvalds [Sun, 22 Dec 2013 19:13:02 +0000 (11:13 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Much smaller batch of fixes this week.
Biggest one is a revert of an OMAP display change that removed some
non-DT pinmux code that was still needed for 3.13 to get DSI displays
to work.
There's also a fix that resolves some misdescribed GPIO controller
resources on shmobile. The rest are mostly smaller fixes, a couple of
MAINTAINERS updates, etc"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
Revert "ARM: OMAP2+: Remove legacy mux code for display.c"
MAINTAINERS: Add keystone clock drivers
MAINTAINERS: Add keystone git tree information
ARM: s3c64xx: dt: Fix boot failure due to double clock initialization
ARM: shmobile: r8a7790: Fix GPIO resources in DTS
irqchip: renesas-intc-irqpin: Fix register bitfield shift calculation
ARM: shmobile: lager: phy fixup needs CONFIG_PHYLIB
Linus Torvalds [Sun, 22 Dec 2013 19:11:57 +0000 (11:11 -0800)]
Merge tag 'firewire-fix' of git://git./linux/kernel/git/ieee1394/linux1394
Pull firewire fixlet from Stefan Richter:
"A one-liner to reenable WRITE SAME over SBP-2 like in v3.8...v3.12.
Buggy targets which could malfunction when being subjected to this
command are already sufficiently protected by a scsi_level check in sd
+ SCSI core"
* tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: sbp2: bring back WRITE SAME support
Linus Torvalds [Sun, 22 Dec 2013 19:11:20 +0000 (11:11 -0800)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"Mostly minor items this time around, the most notable being a FILEIO
backend change to enforce hw_max_sectors based upon the current
block_size to address a bug where large sized I/Os (> 1M) where being
rejected"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
target: Remove extra percpu_ref_init
target/file: Update hw_max_sectors based on current block_size
iser-target: Move INIT_WORK setup into isert_create_device_ib_res
iscsi-target: Fix incorrect np->np_thread NULL assignment
qla2xxx: Fix schedule_delayed_work() for target timeout calculations
iser-target: fix error return code in isert_create_device_ib_res()
iscsi-target: Fix-up all zero data-length CDBs with R/W_BIT set
target: Remove write-only stats fields and lock from struct se_node_acl
iscsi-target: return -EINVAL on oversized configfs parameter
Linus Torvalds [Sun, 22 Dec 2013 19:03:49 +0000 (11:03 -0800)]
Merge git://git.kvack.org/~bcrl/aio-next
Pull AIO leak fixes from Ben LaHaise:
"I've put these two patches plus Linus's change through a round of
tests, and it passes millions of iterations of the aio numa
migratepage test, as well as a number of repetitions of a few simple
read and write tests.
The first patch fixes the memory leak Kent introduced, while the
second patch makes aio_migratepage() much more paranoid and robust"
* git://git.kvack.org/~bcrl/aio-next:
aio/migratepages: make aio migrate pages sane
aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
Linus Torvalds [Thu, 19 Dec 2013 20:11:12 +0000 (05:11 +0900)]
aio: clean up and fix aio_setup_ring page mapping
Since commit
36bc08cc01709 ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.
However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them. And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.
Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).
So clean it all up, making it much more straightforward. Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit
d6c355c7dabc ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.
Tested-and-acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jacob Pan [Wed, 11 Dec 2013 22:39:27 +0000 (14:39 -0800)]
powercap / RAPL: add support for ValleyView Soc
This patch adds support for RAPL on Intel ValleyView based SoC
platforms, such as Baytrail.
Besides adding CPU ID, special energy unit encoding is handled
for ValleyView.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Masami Ichikawa [Thu, 19 Dec 2013 11:00:47 +0000 (20:00 +0900)]
PM / sleep: Fix memory leak in pm_vt_switch_unregister().
kmemleak reported a memory leak as below.
unreferenced object 0xffff880118f14700 (size 32):
comm "swapper/0", pid 1, jiffies
4294877401 (age 123.283s)
hex dump (first 32 bytes):
00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de .......... .....
00 d4 d2 18 01 88 ff ff 01 00 00 00 00 04 00 00 ................
backtrace:
[<
ffffffff814edb1e>] kmemleak_alloc+0x4e/0xb0
[<
ffffffff811889dc>] kmem_cache_alloc_trace+0x1ec/0x260
[<
ffffffff810aba66>] pm_vt_switch_required+0x76/0xb0
[<
ffffffff812f39f5>] register_framebuffer+0x195/0x320
[<
ffffffff8130af18>] efifb_probe+0x718/0x780
[<
ffffffff81391495>] platform_drv_probe+0x45/0xb0
[<
ffffffff8138f407>] driver_probe_device+0x87/0x3a0
[<
ffffffff8138f7f3>] __driver_attach+0x93/0xa0
[<
ffffffff8138d413>] bus_for_each_dev+0x63/0xa0
[<
ffffffff8138ee5e>] driver_attach+0x1e/0x20
[<
ffffffff8138ea40>] bus_add_driver+0x180/0x250
[<
ffffffff8138fe74>] driver_register+0x64/0xf0
[<
ffffffff813913ba>] __platform_driver_register+0x4a/0x50
[<
ffffffff8191e028>] efifb_driver_init+0x12/0x14
[<
ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
[<
ffffffff818e40e0>] kernel_init_freeable+0x17b/0x201
In pm_vt_switch_required(), "entry" variable is allocated via kmalloc().
So, in pm_vt_switch_unregister(), it needs to call kfree() when object
is deleted from list.
Signed-off-by: Masami Ichikawa <masami256@gmail.com>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jason Baron [Thu, 19 Dec 2013 22:50:50 +0000 (22:50 +0000)]
cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers
When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the
intel_pstate driver, the desired default policy is not properly set. For
example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the
'powersave' policy being set.
Fix by configuring the correct default policy, if either 'powersave' or
'performance' are requested. Otherwise, fallback to what the driver originally
set via its 'init' routine.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Fri, 20 Dec 2013 15:56:02 +0000 (21:26 +0530)]
cpufreq: remove sysfs files for CPUs which failed to come back after resume
There are cases where cpufreq_add_dev() may fail for some CPUs
during system resume. With the current code we will still have
sysfs cpufreq files for those CPUs and struct cpufreq_policy
would be already freed for them. Hence any operation on those
sysfs files would result in kernel warnings.
Example of problems resulting from resume errors (from Bjørn Mork):
WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
missing sysfs attribute operations for kobject: (null)
Modules linked in: [stripped as irrelevant]
CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
Call Trace:
[<
ffffffff81380b0e>] dump_stack+0x55/0x76
[<
ffffffff81038635>] warn_slowpath_common+0x7c/0x96
[<
ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
[<
ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
[<
ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
[<
ffffffff81182382>] ? sysfs_open_file+0x32/0x212
[<
ffffffff811823c7>] sysfs_open_file+0x77/0x212
[<
ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
[<
ffffffff81122562>] do_dentry_open+0x17c/0x257
[<
ffffffff8112267e>] finish_open+0x41/0x4f
[<
ffffffff81130225>] do_last+0x80c/0x9ba
[<
ffffffff8112dbbd>] ? inode_permission+0x40/0x42
[<
ffffffff81130606>] path_openat+0x233/0x4a1
[<
ffffffff81130b7e>] do_filp_open+0x35/0x85
[<
ffffffff8113b787>] ? __alloc_fd+0x172/0x184
[<
ffffffff811232ea>] do_sys_open+0x6b/0xfa
[<
ffffffff811233a7>] SyS_openat+0xf/0x11
[<
ffffffff8138c812>] system_call_fastpath+0x16/0x1b
To fix this, remove those sysfs files or put the associated kobject
in case of such errors. Also, to make it simple, remove the cpufreq
sysfs links from all the CPUs (except for the policy->cpu) during
suspend, as that operation won't result in a loss of sysfs file
permissions and we can create those links during resume just fine.
Fixes:
5302c3fb2e62 ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Benjamin LaHaise [Sat, 21 Dec 2013 22:56:08 +0000 (17:56 -0500)]
aio/migratepages: make aio migrate pages sane
The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.
While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Benjamin LaHaise [Sat, 21 Dec 2013 20:49:28 +0000 (15:49 -0500)]
aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference
counting to correct a bug trinity found. Unfortunately, the change lead
to kioctxes being leaked because there was no final reference count to
put. Add that reference count back in to fix things.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
Matias Bjørling [Fri, 20 Dec 2013 23:11:01 +0000 (00:11 +0100)]
null_blk: support submit_queues on use_per_node_hctx
In the case of both the submit_queues param and use_per_node_hctx param
are used. We limit the number af submit_queues to the number of online
nodes.
If the submit_queues is a multiple of nr_online_nodes, its trivial. Simply map
them to the nodes. For example: 8 submit queues are mapped as node0[0,1],
node1[2,3], ...
If uneven, we are left with an uneven number of submit_queues that must be
mapped. These are mapped toward the first node and onward. E.g. 5
submit queues mapped onto 4 nodes are mapped as node0[0,1], node1[2], ...
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Matias Bjørling [Fri, 20 Dec 2013 23:11:00 +0000 (00:11 +0100)]
null_blk: set use_per_node_hctx param to false
The defaults for the module is to instantiate itself with blk-mq and a
submit queue for each CPU node in the system.
To save resources, initialize instead with a single submit queue.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Matias Bjørling [Fri, 20 Dec 2013 23:10:59 +0000 (00:10 +0100)]
null_blk: corrections to documentation
Randy Dunlap reported a couple of grammar errors and unfortunate usages of
socket/node/core.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 21 Dec 2013 00:52:45 +0000 (16:52 -0800)]
Don't set the INITRD_COMPRESS environment variable automatically
Commit
1bf49dd4be0b ("./Makefile: export initial ramdisk compression
config option") started setting the INITRD_COMPRESS environment variable
depending on which decompression models the kernel had available.
That is completely broken.
For example, we by default have CONFIG_RD_LZ4 enabled, and are able to
decompress such an initrd, but the user tools to *create* such an initrd
may not be availble. So trying to tell dracut to generate an
lz4-compressed image just because we can decode such an image is
completely inappropriate.
Cc: J P <ppandit@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Dec 2013 23:48:45 +0000 (15:48 -0800)]
Merge tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
"This contains fixes for some asserts
related to project quotas, a memory leak, a hang when disabling group or
project quotas before disabling user quotas, Dave's email address, several
fixes for the alignment of file allocation to stripe unit/width geometry, a
fix for an assertion with xfs_zero_remaining_bytes, and the behavior of
metadata writeback in the face of IO errors.
Details:
- fix memory leak in xfs_dir2_node_removename
- fix quota assertion in xfs_setattr_size
- fix quota assertions in xfs_qm_vop_create_dqattach
- fix for hang when disabling group and project quotas before
disabling user quotas
- fix Dave Chinner's email address in MAINTAINERS
- fix for file allocation alignment
- fix for assertion in xfs_buf_stale by removing xfsbdstrat
- fix for alignment with swalloc mount option
- fix for "retry forever" semantics on IO errors"
* tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: abort metadata writeback on permanent errors
xfs: swalloc doesn't align allocations properly
xfs: remove xfsbdstrat error
xfs: align initial file allocations correctly
MAINTAINERS: fix incorrect mail address of XFS maintainer
xfs: fix infinite loop by detaching the group/project hints from user dquot
xfs: fix assertion failure at xfs_setattr_nonsize
xfs: fix false assertion at xfs_qm_vop_create_dqattach
xfs: fix memory leak in xfs_dir2_node_removename
Olof Johansson [Fri, 20 Dec 2013 22:28:05 +0000 (14:28 -0800)]
mm: fix build of split ptlock code
Commit
597d795a2a78 ('mm: do not allocate page->ptl dynamically, if
spinlock_t fits to long') restructures some allocators that are compiled
even if USE_SPLIT_PTLOCKS arn't used. It results in compilation
failure:
mm/memory.c:4282:6: error: 'struct page' has no member named 'ptl'
mm/memory.c:4288:12: error: 'struct page' has no member named 'ptl'
Add in the missing ifdef.
Fixes:
597d795a2a78 ('mm: do not allocate page->ptl dynamically, if spinlock_t fits to long')
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Dec 2013 21:50:42 +0000 (13:50 -0800)]
Merge tag 'arc-fixes-for-3.13-rc5' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fix from Vineet Gupta:
"Fix busted syscall table due to unistd header inclusion issue"
* tag 'arc-fixes-for-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Allow conditional multiple inclusion of uapi/asm/unistd.h
Linus Torvalds [Fri, 20 Dec 2013 21:50:08 +0000 (13:50 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 ptrace fix from Catalin Marinas.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ptrace: avoid using HW_BREAKPOINT_EMPTY for disabled events
Luck, Tony [Wed, 18 Dec 2013 23:17:10 +0000 (15:17 -0800)]
pstore: Don't allow high traffic options on fragile devices
Some pstore backing devices use on board flash as persistent
storage. These have limited numbers of write cycles so it
is a poor idea to use them from high frequency operations.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Dec 2013 20:27:41 +0000 (12:27 -0800)]
Merge tag 'dmaengine-fixes-3.13-rc4' of git://git./linux/kernel/git/djbw/dmaengine
Pull dmaengine fixes from Dan Williams:
- deprecation of net_dma to be removed in 3.14
- crash regression fix in pl330 from the dmaengine_unmap rework
- crash regression fix for any channel running raid ops without
CONFIG_ASYNC_TX_DMA from dmaengine_unmap
- memory leak regression in mv_xor from dmaengine_unmap
- build warning regressions in mv_xor, fsldma, ppc4xx, txx9, and
at_hdmac from dmaengine_unmap
- sleep in atomic regression in dma_async_memcpy_pg_to_pg
- new fix in mv_xor for handling channel initialization failures
* tag 'dmaengine-fixes-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
net_dma: mark broken
dma: pl330: ensure DMA descriptors are zero-initialised
dmaengine: fix sleep in atomic
dmaengine: mv_xor: fix oops when channels fail to initialise
dma: mv_xor: Use dmaengine_unmap_data for the self-tests
dmaengine: fix enable for high order unmap pools
dma: fix build warnings in txx9
dmatest: fix build warning on mips
dma: fix fsldma build warnings
dma: fix build warnings in ppc4xx
dmaengine: at_hdmac: remove unused function
dma: mv_xor: remove mv_desc_get_dest_addr()
Linus Torvalds [Fri, 20 Dec 2013 20:26:54 +0000 (12:26 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"The PPC folks had a large amount of changes queued for 3.13, and now
they are fixing the bugs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Don't drop low-order page address bits
powerpc: book3s: kvm: Don't abuse host r2 in exit path
powerpc/kvm/booke: Fix build break due to stack frame size warning
KVM: PPC: Book3S: PR: Enable interrupts earlier
KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
KVM: PPC: Book3S: PR: Don't clobber our exit handler id
powerpc: kvm: fix rare but potential deadlock scene
KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
KVM: PPC: Book3S HV: Fix physical address calculations
Kirill A. Shutemov [Fri, 20 Dec 2013 11:35:58 +0000 (13:35 +0200)]
mm: do not allocate page->ptl dynamically, if spinlock_t fits to long
In struct page we have enough space to fit long-size page->ptl there,
but we use dynamically-allocated page->ptl if size(spinlock_t) is larger
than sizeof(int).
It hurts 64-bit architectures with CONFIG_GENERIC_LOCKBREAK, where
sizeof(spinlock_t) == 8, but it easily fits into struct page.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 20 Dec 2013 14:54:12 +0000 (14:54 +0000)]
mm: page_alloc: revert NUMA aspect of fair allocation policy
Commit
81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") meant
to bring aging fairness among zones in system, but it was overzealous
and badly regressed basic workloads on NUMA systems.
Due to the way kswapd and page allocator interacts, we still want to
make sure that all zones in any given node are used equally for all
allocations to maximize memory utilization and prevent thrashing on the
highest zone in the node.
While the same principle applies to NUMA nodes - memory utilization is
obviously improved by spreading allocations throughout all nodes -
remote references can be costly and so many workloads prefer locality
over memory utilization. The original change assumed that
zone_reclaim_mode would be a good enough predictor for that, but it
turned out to be as indicative as a coin flip.
Revert the NUMA aspect of the fairness until we can find a proper way to
make it configurable and agree on a sane default.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@kernel.org> # 3.12
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 20 Dec 2013 14:54:11 +0000 (14:54 +0000)]
Revert "mm: page_alloc: exclude unreclaimable allocations from zone fairness policy"
This reverts commit
73f038b863df. The NUMA behaviour of this patch is
less than ideal. An alternative approch is to interleave allocations
only within local zones which is implemented in the next patch.
Cc: stable@vger.kernel.org
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 20 Dec 2013 13:10:03 +0000 (15:10 +0200)]
mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support
Sasha Levin found a NULL pointer dereference that is due to a missing
page table lock, which in turn is due to the pmd entry in question being
a transparent huge-table entry.
The code - introduced in commit
1998cc048901 ("mm: make
madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks
for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it
turns out that that function doesn't work correctly.
pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would
trigger if the transparent hugepage bit was set, but it doesn't do that
if pmd_numa() is also set. Note that the NUMA bit only gets set on real
NUMA machines, so people trying to reproduce this on most normal
development systems would never actually trigger this.
Fix it by removing the very subtle (and subtly incorrect) expectation,
and instead just checking pmd_trans_huge() explicitly.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
[ Additionally remove the now stale test for pmd_trans_huge() inside the
pmd_bad() case - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin Hilman [Fri, 20 Dec 2013 19:27:12 +0000 (11:27 -0800)]
Merge tag 'renesas-fixes-for-v3.13' of git://git./linux/kernel/git/horms/renesas into fixes
From Simon Horman:
Renesas ARM based SoC fixes for v3.13
* r8a7790 (R-Car H1) SoC
- Correct GPIO resources in DT.
This problem has been present since GPIOs were added to the r8a7790 SoC
by
f98e10c88aa95bf7 ("ARM: shmobile: r8a7790: Add GPIO controller
devices to device tree") in v3.12-rc1.
* irqchip renesas-intc-irqpin
- Correct register bitfield shift calculation
This bug has been present since the renesas-intc-irqpin driver was
introduced by
443580486e3b9657 ("irqchip: Renesas INTC External IRQ pin
driver") in v3.10-rc1
* Lager board
- Do not build the phy fixup unless CONFIG_PHYLIB is enabled
This problem was introduced by
48c8b96f21817aad
* tag 'renesas-fixes-for-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: r8a7790: Fix GPIO resources in DTS
irqchip: renesas-intc-irqpin: Fix register bitfield shift calculation
ARM: shmobile: lager: phy fixup needs CONFIG_PHYLIB
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Yann Droneaud [Wed, 11 Dec 2013 22:01:51 +0000 (23:01 +0100)]
IB/uverbs: Check access to userspace response buffer in extended command
This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...)
to ensure the whole buffer is in userspace memory before using the
pointer in uverbs functions. If the buffer or a subset of it is not
valid, returns -EFAULT to the caller.
This will also catch invalid buffer before the final call to
copy_to_user() which happen late in most uverb functions.
Just like the check in read(2) syscall, it's a sanity check to detect
invalid parameters provided by userspace. This particular check was added
in vfs_read() by Linus Torvalds for v2.6.12 with following commit message:
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=
fd770e66c9a65b14ce114e171266cf6f393df502
Make read/write always do the full "access_ok()" tests.
The actual user copy will do them too, but only for the
range that ends up being actually copied. That hides
bugs when the range has been clamped by file size or other
issues.
Note: there's no need to check input buffer since vfs_write() already does
access_ok(VERIFY_READ, ...) as part of write() syscall.
Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Yann Droneaud [Wed, 11 Dec 2013 22:01:52 +0000 (23:01 +0100)]
IB/uverbs: Check input length in flow steering uverbs
Since ib_copy_from_udata() doesn't check yet the available input data
length before accessing userspace memory, an explicit check of this
length is required to prevent:
- reading past the user provided buffer,
- underflow when subtracting the expected command size from the input
length.
This will ensure the newly added flow steering uverbs don't try to
process truncated commands.
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Yann Droneaud [Wed, 11 Dec 2013 22:01:50 +0000 (23:01 +0100)]
IB/uverbs: Set error code when fail to consume all flow_spec items
If the flow_spec items parsed count does not match the number of items
declared in the flow_attr command, or if not all bytes are used for
flow_spec items (eg. trailing garbage), a log message is reported and
the function leave through the error path. Unfortunately the error
code is currently not set.
This patch set error code to -EINVAL in such cases, so that the error
is reported to userspace instead of silently fail.
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>