Konrad Rzeszutek Wilk [Fri, 6 Apr 2012 20:10:20 +0000 (16:10 -0400)]
xen/setup: Combine the two hypercall functions - since they are quite similar.
They use the same set of arguments, so it is just the matter
of using the proper hypercall.
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 6 Apr 2012 14:07:11 +0000 (10:07 -0400)]
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
When the Xen hypervisor boots a PV kernel it hands it two pieces
of information: nr_pages and a made up E820 entry.
The nr_pages value defines the range from zero to nr_pages of PFNs
which have a valid Machine Frame Number (MFN) underneath it. The
E820 mirrors that (with the VGA hole):
BIOS-provided physical RAM map:
Xen:
0000000000000000 -
00000000000a0000 (usable)
Xen:
00000000000a0000 -
0000000000100000 (reserved)
Xen:
0000000000100000 -
0000000080800000 (usable)
The fun comes when a PV guest that is run with a machine E820 - that
can either be the initial domain or a PCI PV guest, where the E820
looks like the normal thing:
BIOS-provided physical RAM map:
Xen:
0000000000000000 -
000000000009e000 (usable)
Xen:
000000000009ec00 -
0000000000100000 (reserved)
Xen:
0000000000100000 -
0000000020000000 (usable)
Xen:
0000000020000000 -
0000000020200000 (reserved)
Xen:
0000000020200000 -
0000000040000000 (usable)
Xen:
0000000040000000 -
0000000040200000 (reserved)
Xen:
0000000040200000 -
00000000bad80000 (usable)
Xen:
00000000bad80000 -
00000000badc9000 (ACPI NVS)
..
With that overlaying the nr_pages directly on the E820 does not
work as there are gaps and non-RAM regions that won't be used
by the memory allocator. The 'xen_release_chunk' helps with that
by punching holes in the P2M (PFN to MFN lookup tree) for those
regions and tells us that:
Freeing 20000-20200 pfn range: 512 pages freed
Freeing 40000-40200 pfn range: 512 pages freed
Freeing bad80-badf4 pfn range: 116 pages freed
Freeing badf6-bae7f pfn range: 137 pages freed
Freeing bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory
Those 283999 pages are subtracted from the nr_pages and are returned
to the hypervisor. The end result is that the initial domain
boots with 1GB less memory as the nr_pages has been subtracted by
the amount of pages residing within the PCI hole. It can balloon up
to that if desired using 'xl mem-set 0 8092', but the balloon driver
is not always compiled in for the initial domain.
This patch, implements the populate hypercall (XENMEM_populate_physmap)
which increases the the domain with the same amount of pages that
were released.
The other solution (that did not work) was to transplant the MFN in
the P2M tree - the ones that were going to be freed were put in
the E820_RAM regions past the nr_pages. But the modifications to the
M2P array (the other side of creating PTEs) were not carried away.
As the hypervisor is the only one capable of modifying that and the
only two hypercalls that would do this are: the update_va_mapping
(which won't work, as during initial bootup only PFNs up to nr_pages
are mapped in the guest) or via the populate hypercall.
The end result is that the kernel can now boot with the
nr_pages without having to subtract the 283999 pages.
On a 8GB machine, with various dom0_mem= parameters this is what we get:
no dom0_mem
-Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
+Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
dom0_mem=3G
-Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
+Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)
dom0_mem=max:3G
-Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
+Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)
And the 'xm list' or 'xl list' now reflect what the dom0_mem=
argument is.
Acked-by: David Vrabel <david.vrabel@citrix.com>
[v2: Use populate hypercall]
[v3: Remove debug printks]
[v4: Simplify code]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 30 Mar 2012 19:37:07 +0000 (15:37 -0400)]
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
Otherwise we can get these meaningless:
Freeing bad80-badf4 pfn range: 0 pages freed
We also can do this for the summary ones - no point of printing
"Set 0 page(s) to 1-1 mapping"
Acked-by: David Vrabel <david.vrabel@citrix.com>
[v1: Extended to the summary printks]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 30 Mar 2012 18:33:14 +0000 (14:33 -0400)]
xen/p2m: An early bootup variant of set_phys_to_machine
During early bootup we can't use alloc_page, so to allocate
leaf pages in the P2M we need to use extend_brk. For that
we are utilizing the early_alloc_p2m and early_alloc_p2m_middle
functions to do the job for us. This function follows the
same logic as set_phys_to_machine.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 30 Mar 2012 18:16:49 +0000 (14:16 -0400)]
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
At the start of the function we were checking for idx != 0
and bailing out. And later calling extend_brk if idx != 0.
That is unnecessary so remove that checks.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 30 Mar 2012 18:15:14 +0000 (14:15 -0400)]
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
For identity cases we want to call reserve_brk only on the boundary
conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is
to work around identify regions (PCI spaces, gaps in E820) which are not
aligned on 2MB regions.
However for the case were we want to allocate P2M middle leafs at the
early bootup stage, irregardless of this alignment check we need some
means of doing that. For that we provide the new argument.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 30 Mar 2012 15:45:01 +0000 (11:45 -0400)]
xen/p2m: Move code around to allow for better re-usage.
We are going to be using the early_alloc_p2m (and
early_alloc_p2m_middle) code in follow up patches which
are not related to setting identity pages.
Hence lets move the code out in its own function and
rename them as appropiate.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Mon, 2 Apr 2012 14:22:39 +0000 (15:22 +0100)]
xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
The original XenoLinux code has always had things this way, and for
compatibility reasons (in particular with a subsequent pciback
adjustment) upstream Linux should behave the same way (allowing for two
distinct error indications to be returned by the backend).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Mon, 2 Apr 2012 14:32:22 +0000 (15:32 +0100)]
xen/pciback: fix XEN_PCI_OP_enable_msix result
Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a
positive value to indicate the number of vectors (less than the amount
requested) that can be set up for a given device. Returning this as an
operation value (secondary result) is fine, but (primary) operation
results are expected to be negative (error) or zero (success) according
to the protocol. With the frontend fixed to match the XenoLinux
behavior, the backend can now validly return zero (success) here,
passing the upper limit on the number of vectors in op->value.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Srivatsa S. Bhat [Thu, 22 Mar 2012 12:59:24 +0000 (18:29 +0530)]
xen/smp: Remove unnecessary call to smp_processor_id()
There is an extra and unnecessary call to smp_processor_id()
in cpu_bringup(). Remove it.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 20 Mar 2012 19:04:18 +0000 (15:04 -0400)]
xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
The above mentioned patch checks the IOAPIC and if it contains
-1, then it unmaps said IOAPIC. But under Xen we get this:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000040
IP: [<
ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
PGD 0
Oops: 0002 [#1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
1525 /0U990C
RIP: e030:[<
ffffffff8134e51f>] [<
ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
RSP: e02b:
ffff8800d42cbb70 EFLAGS:
00010202
RAX:
0000000000000000 RBX:
00000000ffffffef RCX:
0000000000000001
RDX:
0000000000000040 RSI:
00000000ffffffef RDI:
0000000000000001
RBP:
ffff8800d42cbb80 R08:
ffff8800d6400000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000000 R12:
00000000ffffffef
R13:
0000000000000001 R14:
0000000000000001 R15:
0000000000000010
FS:
0000000000000000(0000) GS:
ffff8800df5fe000(0000) knlGS:
0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
0000000000000040 CR3:
0000000001a05000 CR4:
0000000000002660
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process swapper/0 (pid: 1, threadinfo
ffff8800d42ca000, task
ffff8800d42d0000)
Stack:
00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
Call Trace:
[<
ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
[<
ffffffff8100a9b2>] ? check_events+0x12+0x20
[<
ffffffff814bab42>] xen_register_pirq+0x82/0xe0
[<
ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
[<
ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
[<
ffffffff8103036f>] acpi_register_gsi+0xf/0x20
[<
ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
[<
ffffffff814bc849>] pcibios_enable_device+0x39/0x40
[<
ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
[<
ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
[<
ffffffff812dc8d3>] pci_enable_device+0x13/0x20
The reason we are dying is b/c the call acpi_get_override_irq() is used,
which returns the polarity and trigger for the IRQs. That function calls
mp_find_ioapics to get the 'struct ioapic' structure - which along with the
mp_irq[x] is used to figure out the default values and the polarity/trigger
overrides. Since the mp_find_ioapics now returns -1 [b/c the IOAPIC is filled
with 0xffffffff], the acpi_get_override_irq() stops trying to lookup in the
mp_irq[x] the proper INT_SRV_OVR and we can't install the SCI interrupt.
The proper fix for this is going in v3.5 and adds an x86_io_apic_ops
struct so that platforms can override it. But for v3.4 lets carry this
work-around. This patch does that by providing a slightly different variant
of the fake IOAPIC entries.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Igor Mammedov [Tue, 27 Mar 2012 17:31:08 +0000 (19:31 +0200)]
xen: only check xen_platform_pci_unplug if hvm
commit
b9136d207f08
xen: initialize platform-pci even if xen_emul_unplug=never
breaks blkfront/netfront by not loading them because of
xen_platform_pci_unplug=0 and it is never set for PV guest.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Sat, 24 Mar 2012 13:18:57 +0000 (09:18 -0400)]
xen/acpi: Fix Kconfig dependency on CPU_FREQ
The functions: "acpi_processor_*" sound like they depend on CONFIG_ACPI_PROCESSOR
but in reality they are exposed when CONFIG_CPU_FREQ=[y|m]. As such
update the Kconfig to have this dependency and fix compile issues:
ERROR: "acpi_processor_unregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_notify_smm" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_register_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_preregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
Note: We still need the CONFIG_ACPI
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Igor Mammedov [Wed, 21 Mar 2012 14:08:38 +0000 (15:08 +0100)]
xen: initialize platform-pci even if xen_emul_unplug=never
When xen_emul_unplug=never is specified on kernel command line
reading files from /sys/hypervisor is broken (returns -EBUSY).
It is caused by xen_bus dependency on platform-pci and
platform-pci isn't initialized when xen_emul_unplug=never is
specified.
Fix it by allowing platform-pci to ignore xen_emul_unplug=never,
and do not intialize xen_[blk|net]front instead.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 21 Mar 2012 17:03:45 +0000 (13:03 -0400)]
xen/smp: Fix bringup bug in AP code.
The CPU hotplug code has now a callback to help bring up the CPU.
Without the call we end up getting:
BUG: soft lockup - CPU#0 stuck for 29s! [migration/0:6]
Modules linked in:
CPU ] Pid: 6, comm: migration/0 Not tainted
3.3.0upstream-01180-ged378a5 #1 Dell Inc. PowerEdge T105 /0RR825
RIP: e030:[<
ffffffff810d3b8b>] [<
ffffffff810d3b8b>] stop_machine_cpu_stop+0x7b/0xf0
RSP: e02b:
ffff8800ceaabdb0 EFLAGS:
00000293
.. snip..
Call Trace:
[<
ffffffff810d3b10>] ? stop_one_cpu_nowait+0x50/0x50
[<
ffffffff810d3841>] cpu_stopper_thread+0xf1/0x1c0
[<
ffffffff815a9776>] ? __schedule+0x3c6/0x760
[<
ffffffff815aa749>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[<
ffffffff810d3750>] ? res_counter_charge+0x150/0x150
[<
ffffffff8108dc76>] kthread+0x96/0xa0
[<
ffffffff815b27e4>] kernel_thread_helper+0x4/0x10
[<
ffffffff815aacbc>] ? retint_restore_ar
Thix fixes it.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 21 Mar 2012 15:43:32 +0000 (11:43 -0400)]
xen/acpi: Remove the WARN's as they just create noise.
When booting the kernel under machines that do not have P-states
we would end up with:
------------[ cut here ]------------
WARNING: at drivers/xen/xen-acpi-processor.c:504
xen_acpi_processor_init+0x286/0
x2e0()
Hardware name: ProLiant BL460c G6
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.39-200.0.3.el5uek #1
Call Trace:
[<
ffffffff8191d056>] ? xen_acpi_processor_init+0x286/0x2e0
[<
ffffffff81068300>] warn_slowpath_common+0x90/0xc0
[<
ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0
[<
ffffffff8106834a>] warn_slowpath_null+0x1a/0x20
[<
ffffffff8191d056>] xen_acpi_processor_init+0x286/0x2e0
[<
ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0
[<
ffffffff81002168>] do_one_initcall+0xe8/0x130
.. snip..
Which is OK - the machines do not have P-states, so we fail to register
to process the _PXX states. But there is no need to WARN the user
of it.
Oracle BZ#
13871288
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Fri, 3 Feb 2012 15:09:04 +0000 (15:09 +0000)]
xen/tmem: cleanup
Use 'bool' for boolean variables. Do proper section placement.
Eliminate an unnecessary export.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Mon, 30 Jan 2012 16:21:48 +0000 (16:21 +0000)]
xen: support pirq_eoi_map
The pirq_eoi_map is a bitmap offered by Xen to check which pirqs need to
be EOI'd without having to issue an hypercall every time.
We use PHYSDEVOP_pirq_eoi_gmfn_v2 to map the bitmap, then if we
succeed we use pirq_eoi_map to check whether pirqs need eoi.
Changes in v3:
- explicitly use PHYSDEVOP_pirq_eoi_gmfn_v2 rather than
PHYSDEVOP_pirq_eoi_gmfn;
- introduce pirq_check_eoi_map, a function to check if a pirq needs an
eoi using the map;
-rename pirq_needs_eoi into pirq_needs_eoi_flag;
- introduce a function pointer called pirq_needs_eoi that is going to be
set to the right implementation depending on the availability of
PHYSDEVOP_pirq_eoi_gmfn_v2.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 13 Mar 2012 17:28:12 +0000 (13:28 -0400)]
xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
With patch "xen/cpufreq: Disable the cpu frequency scaling drivers
from loading." we do not have to worry about said drivers loading
themselves before the xen-acpi-processor driver. Hence we can remove
the default selection (=y if CPU frequency drivers were built-in, or
=m if CPU frequency drivers were built as modules), and just
select =m for the default case.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 14 Mar 2012 00:06:57 +0000 (20:06 -0400)]
xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
By using the functionality provided by "[CPUFREQ]: provide
disable_cpuidle() function to disable the API."
Under the Xen hypervisor we do not want the initial domain to exercise
the cpufreq scaling drivers. This is b/c the Xen hypervisor is
in charge of doing this as well and we can end up with both the
Linux kernel and the hypervisor trying to change the P-states
leading to weird performance issues.
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fix compile error spotted by Benjamin Schweikert <b.schweikert@googlemail.com>]
Konrad Rzeszutek Wilk [Tue, 13 Mar 2012 23:18:39 +0000 (19:18 -0400)]
provide disable_cpufreq() function to disable the API.
useful for disabling cpufreq altogether. The cpu frequency
scaling drivers and cpu frequency governors will fail to register.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Andrew Jones [Fri, 6 Jan 2012 09:43:09 +0000 (10:43 +0100)]
xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
PV-on-HVM guests may want to use the xen keyboard/mouse frontend, but
they don't use the xen frame buffer frontend. For this case it doesn't
make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
XEN_FBDEV_FRONTEND. The opposite direction always makes more sense, i.e.
if you're using xenfb, then you'll want xenkbd. Switch the dependencies.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 3 Feb 2012 21:03:20 +0000 (16:03 -0500)]
xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.
This driver solves three problems:
1). Parse and upload ACPI0007 (or PROCESSOR_TYPE) information to the
hypervisor - aka P-states (cpufreq data).
2). Upload the the Cx state information (cpuidle data).
3). Inhibit CPU frequency scaling drivers from loading.
The reason for wanting to solve 1) and 2) is such that the Xen hypervisor
is the only one that knows the CPU usage of different guests and can
make the proper decision of when to put CPUs and packages in proper states.
Unfortunately the hypervisor has no support to parse ACPI DSDT tables, hence it
needs help from the initial domain to provide this information. The reason
for 3) is that we do not want the initial domain to change P-states while the
hypervisor is doing it as well - it causes rather some funny cases of P-states
transitions.
For this to work, the driver parses the Power Management data and uploads said
information to the Xen hypervisor. It also calls acpi_processor_notify_smm()
to inhibit the other CPU frequency scaling drivers from being loaded.
Everything revolves around the 'struct acpi_processor' structure which
gets updated during the bootup cycle in different stages. At the startup, when
the ACPI parser starts, the C-state information is processed (processor_idle)
and saved in said structure as 'power' element. Later on, the CPU frequency
scaling driver (powernow-k8 or acpi_cpufreq), would call the the
acpi_processor_* (processor_perflib functions) to parse P-states information
and populate in the said structure the 'performance' element.
Since we do not want the CPU frequency scaling drivers from loading
we have to call the acpi_processor_* functions to parse the P-states and
call "acpi_processor_notify_smm" to stop them from loading.
There is also one oddity in this driver which is that under Xen, the
physical online CPU count can be different from the virtual online CPU count.
Meaning that the macros 'for_[online|possible]_cpu' would process only
up to virtual online CPU count. We on the other hand want to process
the full amount of physical CPUs. For that, the driver checks if the ACPI IDs
count is different from the APIC ID count - which can happen if the user
choose to use dom0_max_vcpu argument. In such a case a backup of the PM
structure is used and uploaded to the hypervisor.
[v1-v2: Initial RFC implementations that were posted]
[v3: Changed the name to passthru suggested by Pasi Kärkkäinen <pasik@iki.fi>]
[v4: Added vCPU != pCPU support - aka dom0_max_vcpus support]
[v5: Cleaned up the driver, fix bug under Athlon XP]
[v6: Changed the driver to a CPU frequency governor]
[v7: Jan Beulich <jbeulich@suse.com> suggestion to make it a cpufreq scaling driver
made me rework it as driver that inhibits cpufreq scaling driver]
[v8: Per Jan's review comments, fixed up the driver]
[v9: Allow to continue even if acpi_processor_preregister_perf.. fails]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Wed, 14 Mar 2012 16:34:19 +0000 (12:34 -0400)]
xen: constify all instances of "struct attribute_group"
The functions these get passed to have been taking pointers to const
since at least 2.6.16.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Tue, 13 Mar 2012 18:30:44 +0000 (18:30 +0000)]
xen/xenbus: ignore console/0
Unfortunately xend creates a bogus console/0 frotend/backend entry pair
on xenstore that console backends cannot properly cope with.
Any guest behavior that is not completely ignoring console/0 is going
to either cause problems with xenconsoled or qemu.
Returning 0 or -ENODEV from xencons_probe is not enough because it is
going to cause the frontend state to become 4 or 6 respectively.
The best possible thing we can do here is just ignore the entry from
xenbus_probe_frontend.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Tue, 21 Feb 2012 11:30:42 +0000 (11:30 +0000)]
hvc_xen: introduce HVC_XEN_FRONTEND
Introduce a new config option HVC_XEN_FRONTEND to enable/disable the
xenbus based pv console frontend.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Mon, 30 Jan 2012 16:02:31 +0000 (16:02 +0000)]
hvc_xen: implement multiconsole support
This patch implements support for multiple consoles:
consoles other than the first one are setup using the traditional xenbus
and grant-table based mechanism.
We use a list to keep track of the allocated consoles, we don't
expect too many of them anyway.
Changes in v3:
- call hvc_remove before removing the console from xenconsoles;
- do not lock xencons_lock twice in the destruction path;
- use the DEFINE_XENBUS_DRIVER macro.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Fri, 27 Jan 2012 18:31:36 +0000 (18:31 +0000)]
hvc_xen: support PV on HVM consoles
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Mon, 5 Mar 2012 17:11:31 +0000 (17:11 +0000)]
xenbus: don't free other end details too early
The individual drivers' remove functions could legitimately attempt to
access this information (for logging messages if nothing else). Note
that I did not in fact observe a problem anywhere, but I came across
this while looking into the reasons for what turned out to need the
fix at https://lkml.org/lkml/2012/3/5/336 to vsprintf().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 14 Feb 2012 03:26:32 +0000 (22:26 -0500)]
xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
For the hypervisor to take advantage of the MWAIT support it needs
to extract from the ACPI _CST the register address. But the
hypervisor does not have the support to parse DSDT so it relies on
the initial domain (dom0) to parse the ACPI Power Management information
and push it up to the hypervisor. The pushing of the data is done
by the processor_harveset_xen module which parses the information that
the ACPI parser has graciously exposed in 'struct acpi_processor'.
For the ACPI parser to also expose the Cx states for MWAIT, we need
to expose the MWAIT capability (leaf 1). Furthermore we also need to
expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
function.
The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
operations, but it can't do it since it needs to be backwards compatible.
Instead we choose to use the native CPUID to figure out if the MWAIT
capability exists and use the XEN_SET_PDC query hypercall to figure out
if the hypervisor wants us to expose the MWAIT_LEAF capability or not.
Note: The XEN_SET_PDC query was implemented in c/s 23783:
"ACPI: add _PDC input override mechanism".
With this in place, instead of
C3 ACPI IOPORT 415
we get now
C3:ACPI FFH INTEL MWAIT 0x20
Note: The cpu_idle which would be calling the mwait variants for idling
never gets set b/c we set the default pm_idle to be the hypercall variant.
Acked-by: Jan Beulich <JBeulich@suse.com>
[v2: Fix missing header file include and #ifdef]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 23 Jan 2012 15:53:57 +0000 (10:53 -0500)]
xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
We needed that call in the past to force the kernel to use
default_idle (which called safe_halt, which called xen_safe_halt).
But set_pm_idle_to_default() does now that, so there is no need
to use this boot option operand.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Fri, 24 Feb 2012 11:46:32 +0000 (11:46 +0000)]
xenbus: address compiler warnings
- casting pointers to integer types of different size is being warned on
- an uninitialized variable warning occurred on certain gcc versions
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Alex Shi [Fri, 13 Jan 2012 15:53:35 +0000 (23:53 +0800)]
xen: use this_cpu_xxx replace percpu_xxx funcs
percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them
for further code clean up.
I don't know much of xen code. But, since the code is in x86 architecture,
the percpu_xxx is exactly same as this_cpu_xxx serials functions. So, the
change is safe.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 4 Jan 2012 19:30:58 +0000 (14:30 -0500)]
xen/pciback: Support pci_reset_function, aka FLR or D3 support.
We use the __pci_reset_function_locked to perform the action.
Also on attaching ("bind") and detaching ("unbind") we save and
restore the configuration states. When the device is disconnected
from a guest we use the "pci_reset_function" to also reset the
device before being passed to another guest.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 4 Jan 2012 19:23:56 +0000 (14:23 -0500)]
pci: Introduce __pci_reset_function_locked to be used when holding device_lock.
The use case of this is when a driver wants to call FLR when a device
is attached to it using the SysFS "bind" or "unbind" functionality.
The call chain when a user does "bind" looks as so:
echo "0000:01.07.0" > /sys/bus/pci/drivers/XXXX/bind
and ends up calling:
driver_bind:
device_lock(dev); <=== TAKES LOCK
XXXX_probe:
.. pci_enable_device()
...__pci_reset_function(), which calls
pci_dev_reset(dev, 0):
if (!0) {
device_lock(dev) <==== DEADLOCK
The __pci_reset_function_locked function allows the the drivers
'probe' function to call the "pci_reset_function" while still holding
the driver mutex lock.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tang Liang [Thu, 8 Dec 2011 09:36:39 +0000 (17:36 +0800)]
xen: Utilize the restore_msi_irqs hook.
to make a hypercall to restore the vectors in the MSI/MSI-X
configuration space.
Signed-off-by: Tang Liang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Linus Torvalds [Thu, 12 Jan 2012 02:50:26 +0000 (18:50 -0800)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jbarnes/pci
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (80 commits)
x86/PCI: Expand the x86_msi_ops to have a restore MSIs.
PCI: Increase resource array mask bit size in pcim_iomap_regions()
PCI: DEVICE_COUNT_RESOURCE should be equal to PCI_NUM_RESOURCES
PCI: pci_ids: add device ids for STA2X11 device (aka ConneXT)
PNP: work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB
x86/PCI: amd: factor out MMCONFIG discovery
PCI: Enable ATS at the device state restore
PCI: msi: fix imbalanced refcount of msi irq sysfs objects
PCI: kconfig: English typo in pci/pcie/Kconfig
PCI/PM/Runtime: make PCI traces quieter
PCI: remove pci_create_bus()
xtensa/PCI: convert to pci_scan_root_bus() for correct root bus resources
x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus()
x86/PCI: use pci_scan_bus() instead of pci_scan_bus_parented()
x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan
sparc32, leon/PCI: convert to pci_scan_root_bus() for correct root bus resources
sparc/PCI: convert to pci_create_root_bus()
sh/PCI: convert to pci_scan_root_bus() for correct root bus resources
powerpc/PCI: convert to pci_create_root_bus()
powerpc/PCI: split PHB part out of pcibios_map_io_space()
...
Fix up conflicts in drivers/pci/msi.c and include/linux/pci_regs.h due
to the same patches being applied in other branches.
Ben Hutchings [Tue, 10 Jan 2012 03:04:32 +0000 (03:04 +0000)]
cpu: Register a generic CPU device on architectures that currently do not
frv, h8300, m68k, microblaze, openrisc, score, um and xtensa currently
do not register a CPU device. Add the config option GENERIC_CPU_DEVICES
which causes a generic CPU device to be registered for each present CPU,
and make all these architectures select it.
Richard Weinberger <richard@nod.at> covered UML and suggested using
per_cpu.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Hutchings [Tue, 10 Jan 2012 02:59:49 +0000 (02:59 +0000)]
cpu: Do not return errors from cpu_dev_init() which will be ignored
cpu_dev_init() is only called from driver_init(), which does not check
its return value. Therefore make cpu_dev_init() return void.
We must register the CPU subsystem, so panic if this fails.
If sched_create_sysfs_power_savings_entries() fails, the damage is
contained, so ignore this (as before).
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 11 Jan 2012 06:01:27 +0000 (22:01 -0800)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (54 commits)
crypto: gf128mul - remove leftover "(EXPERIMENTAL)" in Kconfig
crypto: serpent-sse2 - remove unneeded LRW/XTS #ifdefs
crypto: serpent-sse2 - select LRW and XTS
crypto: twofish-x86_64-3way - remove unneeded LRW/XTS #ifdefs
crypto: twofish-x86_64-3way - select LRW and XTS
crypto: xts - remove dependency on EXPERIMENTAL
crypto: lrw - remove dependency on EXPERIMENTAL
crypto: picoxcell - fix boolean and / or confusion
crypto: caam - remove DECO access initialization code
crypto: caam - fix polarity of "propagate error" logic
crypto: caam - more desc.h cleanups
crypto: caam - desc.h - convert spaces to tabs
crypto: talitos - convert talitos_error to struct device
crypto: talitos - remove NO_IRQ references
crypto: talitos - fix bad kfree
crypto: convert drivers/crypto/* to use module_platform_driver()
char: hw_random: convert drivers/char/hw_random/* to use module_platform_driver()
crypto: serpent-sse2 - should select CRYPTO_CRYPTD
crypto: serpent - rename serpent.c to serpent_generic.c
crypto: serpent - cleanup checkpatch errors and warnings
...
Linus Torvalds [Wed, 11 Jan 2012 05:51:23 +0000 (21:51 -0800)]
Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: (32 commits)
ima: fix invalid memory reference
ima: free duplicate measurement memory
security: update security_file_mmap() docs
selinux: Casting (void *) value returned by kmalloc is useless
apparmor: fix module parameter handling
Security: tomoyo: add .gitignore file
tomoyo: add missing rcu_dereference()
apparmor: add missing rcu_dereference()
evm: prevent racing during tfm allocation
evm: key must be set once during initialization
mpi/mpi-mpow: NULL dereference on allocation failure
digsig: build dependency fix
KEYS: Give key types their own lockdep class for key->sem
TPM: fix transmit_cmd error logic
TPM: NSC and TIS drivers X86 dependency fix
TPM: Export wait_for_stat for other vendor specific drivers
TPM: Use vendor specific function for status probe
tpm_tis: add delay after aborting command
tpm_tis: Check return code from getting timeouts/durations
tpm: Introduce function to poll for result of self test
...
Fix up trivial conflict in lib/Makefile due to addition of CONFIG_MPI
and SIGSIG next to CONFIG_DQL addition.
Linus Torvalds [Wed, 11 Jan 2012 05:46:36 +0000 (21:46 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
autofs4: deal with autofs4_write/autofs4_write races
autofs4: catatonic_mode vs. notify_daemon race
autofs4: autofs4_wait() vs. autofs4_catatonic_mode() race
hfsplus: creation of hidden dir on mount can fail
block_dev: Suppress bdev_cache_init() kmemleak warninig
fix shrink_dcache_parent() livelock
coda: switch coda_cnode_make() to sane API as well, clean coda_lookup()
coda: deal correctly with allocation failure from coda_cnode_makectl()
securityfs: fix object creation races
Al Viro [Wed, 11 Jan 2012 03:35:38 +0000 (22:35 -0500)]
autofs4: deal with autofs4_write/autofs4_write races
Just serialize the actual writing of packets into pipe on
a new mutex, independent from everything else in the locking
hierarchy. As soon as something has started feeding a piece
of packet into the pipe to daemon, we *want* everything else
about to try the same to wait until we are done.
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 11 Jan 2012 03:24:48 +0000 (22:24 -0500)]
autofs4: catatonic_mode vs. notify_daemon race
we need to hold ->wq_mutex while we are forming the packet to send,
lest we have autofs4_catatonic_mode() setting wq->name.name to NULL
just as autofs4_notify_daemon() decides to memcpy() from it...
We do have check for catatonic mode immediately after that (under
->wq_mutex, as it ought to be) and packet won't be actually sent,
but it'll be too late for us if we oops on that memcpy() from NULL...
Fix is obvious - just extend the area covered by ->wq_mutex over
that switch and check whether it's catatonic *before* doing anything
else.
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 11 Jan 2012 03:20:12 +0000 (22:20 -0500)]
autofs4: autofs4_wait() vs. autofs4_catatonic_mode() race
We need to recheck ->catatonic after autofs4_wait() got ->wq_mutex
for good, or we might end up with wq inserted into queue after
autofs4_catatonic_mode() had done its thing. It will stick there
forever, since there won't be anything to clear its ->name.name.
A bit of a complication: validate_request() drops and regains ->wq_mutex.
It actually ends up the most convenient place to stick the check into...
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Wed, 11 Jan 2012 02:04:27 +0000 (18:04 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/mst/vhost
lib: use generic pci_iomap on all architectures
Many architectures don't want to pull in iomap.c,
so they ended up duplicating pci_iomap from that file.
That function isn't trivial, and we are going to modify it
https://lkml.org/lkml/2011/11/14/183
so the duplication hurts.
This reduces the scope of the problem significantly,
by moving pci_iomap to a separate file and
referencing that from all architectures.
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
alpha: drop pci_iomap/pci_iounmap from pci-noop.c
mn10300: switch to GENERIC_PCI_IOMAP
mn10300: add missing __iomap markers
frv: switch to GENERIC_PCI_IOMAP
tile: switch to GENERIC_PCI_IOMAP
tile: don't panic on iomap
sparc: switch to GENERIC_PCI_IOMAP
sh: switch to GENERIC_PCI_IOMAP
powerpc: switch to GENERIC_PCI_IOMAP
parisc: switch to GENERIC_PCI_IOMAP
mips: switch to GENERIC_PCI_IOMAP
microblaze: switch to GENERIC_PCI_IOMAP
arm: switch to GENERIC_PCI_IOMAP
alpha: switch to GENERIC_PCI_IOMAP
lib: add GENERIC_PCI_IOMAP
lib: move GENERIC_IOMAP to lib/Kconfig
Fix up trivial conflicts due to changes nearby in arch/{m68k,score}/Kconfig
Linus Torvalds [Wed, 11 Jan 2012 01:39:40 +0000 (17:39 -0800)]
Merge tag 'for-linux-3.3-merge-window' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
* tag 'for-linux-3.3-merge-window' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: (29 commits)
C6X: replace tick_nohz_stop/restart_sched_tick calls
C6X: add register_cpu call
C6X: deal with memblock API changes
C6X: fix timer64 initialization
C6X: fix layout of EMIFA registers
C6X: MAINTAINERS
C6X: DSCR - Device State Configuration Registers
C6X: EMIF - External Memory Interface
C6X: general SoC support
C6X: library code
C6X: headers
C6X: ptrace support
C6X: loadable module support
C6X: cache control
C6X: clocks
C6X: build infrastructure
C6X: syscalls
C6X: interrupt handling
C6X: time management
C6X: signal management
...
Linus Torvalds [Wed, 11 Jan 2012 01:37:49 +0000 (17:37 -0800)]
Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze
* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Wire-up new system calls
microblaze: Remove NO_IRQ from architecture
input: xilinx_ps2: Don't use NO_IRQ
block: xsysace: Don't use NO_IRQ
microblaze: Trivial asm fix
microblaze: Fix debug message in module
microblaze: Remove eprintk macro
microblaze: Send CR before LF for early console
microblaze: Change NO_IRQ to 0
microblaze: Use irq_of_parse_and_map for timer
microblaze: intc: Change variable name
microblaze: Use of_find_compatible_node for timer and intc
microblaze: Add __cmpdi2
microblaze: Synchronize __pa __va macros
Linus Torvalds [Wed, 11 Jan 2012 01:37:20 +0000 (17:37 -0800)]
Merge branch 'unicore32' of git://github.com/gxt/linux
* 'unicore32' of git://github.com/gxt/linux:
rtc-puv3: solve section mismatch in rtc-puv3.c
rtc-puv3: using module_platform_driver()
i2c-puv3: using module_platform_driver()
rtc-puv3: irq: remove IRQF_DISABLED
unicore32: Remove IRQF_DISABLED
unicore32: Use set_current_blocked()
unicore32: add ioremap_nocache definition
unicore32: delete specified xlate_dev_mem_ptr
of: add include asm/setup.h in drivers/of/fdt.c
unicore32: standardize /proc/iomem "Kernel code" name
Linus Torvalds [Wed, 11 Jan 2012 01:36:43 +0000 (17:36 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/lliubbo/blackfin
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin:
blackfin: bf561: add adv7183 capture support
blackfin: bf537: add capture support
blackfin: bf548: add capture support
blackfin: time-ts: rm unused func broadcast_timer_setup()
blackfin: i2c-lcd: change default clock rate
blackfin: mac: dsa: add vlan mask in board file
blackfin: bf537: change num_chipselect for spi-sport
blackfin: serial: bfin-uart: remove unused field
bf54x: get mem size: missing break in switch
blackfin: smp: fix msg queue overflow issue
blackfin: config: update macro SPI_BFIN in board file
blackfin: config: update def config for all boards
blackfin: smp: cleanup smp code
blackfin: smp: add suspend and wakeup irq flags
blackfin: bf533-stamp: add missed patches for new asoc driver
blackfin: bf533-stamp: fix ad1836 name
Linus Torvalds [Wed, 11 Jan 2012 00:59:59 +0000 (16:59 -0800)]
Merge branch 'writeback-for-linus' of git://git./linux/kernel/git/wfg/linux
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
writeback: balanced_rate cannot exceed write bandwidth
writeback: do strict bdi dirty_exceeded
writeback: avoid tiny dirty poll intervals
writeback: max, min and target dirty pause time
writeback: dirty ratelimit - think time compensation
btrfs: fix dirtied pages accounting on sub-page writes
writeback: fix dirtied pages accounting on redirty
writeback: fix dirtied pages accounting on sub-page writes
writeback: charge leaked page dirties to active tasks
writeback: Include all dirty inodes in background writeback
Linus Torvalds [Wed, 11 Jan 2012 00:42:48 +0000 (16:42 -0800)]
Merge branch 'akpm' (aka "Andrew's patch-bomb")
Andrew elucidates:
- First installmeant of MM. We have a HUGE number of MM patches this
time. It's crazy.
- MAINTAINERS updates
- backlight updates
- leds
- checkpatch updates
- misc ELF stuff
- rtc updates
- reiserfs
- procfs
- some misc other bits
* akpm: (124 commits)
user namespace: make signal.c respect user namespaces
workqueue: make alloc_workqueue() take printf fmt and args for name
procfs: add hidepid= and gid= mount options
procfs: parse mount options
procfs: introduce the /proc/<pid>/map_files/ directory
procfs: make proc_get_link to use dentry instead of inode
signal: add block_sigmask() for adding sigmask to current->blocked
sparc: make SA_NOMASK a synonym of SA_NODEFER
reiserfs: don't lock root inode searching
reiserfs: don't lock journal_init()
reiserfs: delay reiserfs lock until journal initialization
reiserfs: delete comments referring to the BKL
drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
drivers/rtc/: remove redundant spi driver bus initialization
drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
rtc: convert drivers/rtc/* to use module_platform_driver()
drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
...
Serge E. Hallyn [Tue, 10 Jan 2012 23:11:37 +0000 (15:11 -0800)]
user namespace: make signal.c respect user namespaces
ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
user namespace. (new, thanks Oleg)
__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)
do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace
ptrace_signal maps parent's uid into current's user namespace before
including in signal to current. IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.
Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
simplify callers and help make clear what we are translating
(which uid into which namespace). Passing the target task would
make callers even easier to read, but we pass in user_ns because
current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
ancestor or unrelated user namespace. Pass that on to __send_signal,
which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch. On top of that, handle all
SI_FROMKERNEL cases at callers, because we can't assume sender is
current in those cases.
Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Serge Hallyn <serge.hallyn@canonical.com>
Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
Eric Biederman pointed out that passing info is a bug and could lead to a
NULL pointer deref to boot.
A collection of signal, securebits, filecaps, cap_bounds, and a few other
ltp tests passed with this kernel.
Changelog:
Nov 18: previous patch missed a leading '&'
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Dan Carpenter <dan.carpenter@oracle.com>
Subject: ipc/mqueue: lock() => unlock() typo
There was a double lock typo introduced in
b085f4bd6b21 "user namespace:
make signal.c respect user namespaces"
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Tue, 10 Jan 2012 23:11:35 +0000 (15:11 -0800)]
workqueue: make alloc_workqueue() take printf fmt and args for name
alloc_workqueue() currently expects the passed in @name pointer to remain
accessible. This is inconvenient and a bit silly given that the whole wq
is being dynamically allocated. This patch updates alloc_workqueue() and
friends to take printf format string instead of opaque string and matching
varargs at the end. The name is allocated together with the wq and
formatted.
alloc_ordered_workqueue() is converted to a macro to unify varargs
handling with alloc_workqueue(), and, while at it, add comment to
alloc_workqueue().
None of the current in-kernel users pass in string with '%' as constant
name and this change shouldn't cause any problem.
[akpm@linux-foundation.org: use __printf]
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Tue, 10 Jan 2012 23:11:31 +0000 (15:11 -0800)]
procfs: add hidepid= and gid= mount options
Add support for mount options to restrict access to /proc/PID/
directories. The default backward-compatible "relaxed" behaviour is left
untouched.
The first mount option is called "hidepid" and its value defines how much
info about processes we want to be available for non-owners:
hidepid=0 (default) means the old behavior - anybody may read all
world-readable /proc/PID/* files.
hidepid=1 means users may not access any /proc/<pid>/ directories, but
their own. Sensitive files like cmdline, sched*, status are now protected
against other users. As permission checking done in proc_pid_permission()
and files' permissions are left untouched, programs expecting specific
files' modes are not confused.
hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
users. It doesn't mean that it hides whether a process exists (it can be
learned by other means, e.g. by kill -0 $PID), but it hides process' euid
and egid. It compicates intruder's task of gathering info about running
processes, whether some daemon runs with elevated privileges, whether
another user runs some sensitive program, whether other users run any
program at all, etc.
gid=XXX defines a group that will be able to gather all processes' info
(as in hidepid=0 mode). This group should be used instead of putting
nonroot user in sudoers file or something. However, untrusted users (like
daemons, etc.) which are not supposed to monitor the tasks in the whole
system should not be added to the group.
hidepid=1 or higher is designed to restrict access to procfs files, which
might reveal some sensitive private information like precise keystrokes
timings:
http://www.openwall.com/lists/oss-security/2011/11/05/3
hidepid=1/2 doesn't break monitoring userspace tools. ps, top, pgrep, and
conky gracefully handle EPERM/ENOENT and behave as if the current user is
the only user running processes. pstree shows the process subtree which
contains "pstree" process.
Note: the patch doesn't deal with setuid/setgid issues of keeping
preopened descriptors of procfs files (like
https://lkml.org/lkml/2011/2/7/368). We rely on that the leaked
information like the scheduling counters of setuid apps doesn't threaten
anybody's privacy - only the user started the setuid program may read the
counters.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Theodore Tso <tytso@MIT.EDU>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Tue, 10 Jan 2012 23:11:27 +0000 (15:11 -0800)]
procfs: parse mount options
Add support for procfs mount options. Actual mount options are coming in
the next patches.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Theodore Tso <tytso@MIT.EDU>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Tue, 10 Jan 2012 23:11:23 +0000 (15:11 -0800)]
procfs: introduce the /proc/<pid>/map_files/ directory
This one behaves similarly to the /proc/<pid>/fd/ one - it contains
symlinks one for each mapping with file, the name of a symlink is
"vma->vm_start-vma->vm_end", the target is the file. Opening a symlink
results in a file that point exactly to the same inode as them vma's one.
For example the ls -l of some arbitrary /proc/<pid>/map_files/
| lr-x------ 1 root root 64 Aug 26 06:40
7f8f80403000-
7f8f80404000 -> /lib64/libc-2.5.so
| lr-x------ 1 root root 64 Aug 26 06:40
7f8f8061e000-
7f8f80620000 -> /lib64/libselinux.so.1
| lr-x------ 1 root root 64 Aug 26 06:40
7f8f80826000-
7f8f80827000 -> /lib64/libacl.so.1.1.0
| lr-x------ 1 root root 64 Aug 26 06:40
7f8f80a2f000-
7f8f80a30000 -> /lib64/librt-2.5.so
| lr-x------ 1 root root 64 Aug 26 06:40
7f8f80a30000-
7f8f80a4c000 -> /lib64/ld-2.5.so
This *helps* checkpointing process in three ways:
1. When dumping a task mappings we do know exact file that is mapped
by particular region. We do this by opening
/proc/$pid/map_files/$address symlink the way we do with file
descriptors.
2. This also helps in determining which anonymous shared mappings are
shared with each other by comparing the inodes of them.
3. When restoring a set of processes in case two of them has a mapping
shared, we map the memory by the 1st one and then open its
/proc/$pid/map_files/$address file and map it by the 2nd task.
Using /proc/$pid/maps for this is quite inconvenient since it brings
repeatable re-reading and reparsing for this text file which slows down
restore procedure significantly. Also as being pointed in (3) it is a way
easier to use top level shared mapping in children as
/proc/$pid/map_files/$address when needed.
[akpm@linux-foundation.org: coding-style fixes]
[gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Vasiliy Kulikov <segoon@openwall.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Tue, 10 Jan 2012 23:11:20 +0000 (15:11 -0800)]
procfs: make proc_get_link to use dentry instead of inode
Prepare the ground for the next "map_files" patch which needs a name of a
link file to analyse.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Fleming [Tue, 10 Jan 2012 23:11:17 +0000 (15:11 -0800)]
signal: add block_sigmask() for adding sigmask to current->blocked
Abstract the code sequence for adding a signal handler's sa_mask to
current->blocked because the sequence is identical for all architectures.
Furthermore, in the past some architectures actually got this code wrong,
so introduce a wrapper that all architectures can use.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Fleming [Tue, 10 Jan 2012 23:11:14 +0000 (15:11 -0800)]
sparc: make SA_NOMASK a synonym of SA_NODEFER
Unlike other architectures, sparc currently has no SA_NODEFER definition
but only the older SA_NOMASK. Since SA_NOMASK is the historical name for
SA_NODEFER, add SA_NODEFER and copy what other architectures do by making
SA_NOMASK a synonym for SA_NODEFER.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Tue, 10 Jan 2012 23:11:11 +0000 (15:11 -0800)]
reiserfs: don't lock root inode searching
Nothing requires that we lock the filesystem until the root inode is
provided.
Also iget5_locked() triggers a warning because we are holding the
filesystem lock while allocating the inode, which result in a lockdep
suspicion that we have a lock inversion against the reclaim path:
[ 1986.896979] =================================
[ 1986.896990] [ INFO: inconsistent lock state ]
[ 1986.896997] 3.1.1-main #8
[ 1986.897001] ---------------------------------
[ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1986.897023] (&REISERFS_SB(s)->lock){+.+.?.}, at: [<
c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
[ 1986.897050] [<
c014a5b9>] mark_held_locks+0xae/0xd0
[ 1986.897060] [<
c014aab3>] lockdep_trace_alloc+0x7d/0x91
[ 1986.897068] [<
c0190ee0>] kmem_cache_alloc+0x1a/0x93
[ 1986.897078] [<
c01e7728>] reiserfs_alloc_inode+0x13/0x3d
[ 1986.897088] [<
c01a5b06>] alloc_inode+0x14/0x5f
[ 1986.897097] [<
c01a5cb9>] iget5_locked+0x62/0x13a
[ 1986.897106] [<
c01e99e0>] reiserfs_fill_super+0x410/0x8b9
[ 1986.897114] [<
c01953da>] mount_bdev+0x10b/0x159
[ 1986.897123] [<
c01e764d>] get_super_block+0x10/0x12
[ 1986.897131] [<
c0195b38>] mount_fs+0x59/0x12d
[ 1986.897138] [<
c01a80d1>] vfs_kern_mount+0x45/0x7a
[ 1986.897147] [<
c01a83e3>] do_kern_mount+0x2f/0xb0
[ 1986.897155] [<
c01a987a>] do_mount+0x5c2/0x612
[ 1986.897163] [<
c01a9a72>] sys_mount+0x61/0x8f
[ 1986.897170] [<
c044060c>] sysenter_do_call+0x12/0x32
[ 1986.897181] irq event stamp:
7509691
[ 1986.897186] hardirqs last enabled at (
7509691): [<
c0190f34>] kmem_cache_alloc+0x6e/0x93
[ 1986.897197] hardirqs last disabled at (
7509690): [<
c0190eea>] kmem_cache_alloc+0x24/0x93
[ 1986.897209] softirqs last enabled at (
7508896): [<
c01294bd>] __do_softirq+0xee/0xfd
[ 1986.897222] softirqs last disabled at (
7508859): [<
c01030ed>] do_softirq+0x50/0x9d
[ 1986.897234]
[ 1986.897235] other info that might help us debug this:
[ 1986.897242] Possible unsafe locking scenario:
[ 1986.897244]
[ 1986.897250] CPU0
[ 1986.897254] ----
[ 1986.897257] lock(&REISERFS_SB(s)->lock);
[ 1986.897265] <Interrupt>
[ 1986.897269] lock(&REISERFS_SB(s)->lock);
[ 1986.897276]
[ 1986.897277] *** DEADLOCK ***
[ 1986.897278]
[ 1986.897286] no locks held by kswapd0/16.
[ 1986.897291]
[ 1986.897292] stack backtrace:
[ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
[ 1986.897306] Call Trace:
[ 1986.897314] [<
c0439e76>] ? printk+0xf/0x11
[ 1986.897324] [<
c01482d1>] print_usage_bug+0x20e/0x21a
[ 1986.897332] [<
c01479b8>] ? print_irq_inversion_bug+0x172/0x172
[ 1986.897341] [<
c014855c>] mark_lock+0x27f/0x483
[ 1986.897349] [<
c0148d88>] __lock_acquire+0x628/0x1472
[ 1986.897358] [<
c0149fae>] lock_acquire+0x47/0x5e
[ 1986.897366] [<
c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897384] [<
c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897397] [<
c043b5ef>] mutex_lock_nested+0x35/0x26f
[ 1986.897409] [<
c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897421] [<
c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897433] [<
c01e2edd>] map_block_for_writepage+0xc9/0x590
[ 1986.897448] [<
c01b1706>] ? create_empty_buffers+0x33/0x8f
[ 1986.897461] [<
c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897472] [<
c043ef7f>] ? sub_preempt_count+0x81/0x8e
[ 1986.897485] [<
c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897496] [<
c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897508] [<
c01e355d>] reiserfs_writepage+0x1b9/0x3e7
[ 1986.897521] [<
c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde
[ 1986.897533] [<
c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138
[ 1986.897546] [<
c014a71e>] ? trace_hardirqs_on+0xb/0xd
[ 1986.897559] [<
c0177b38>] shrink_page_list+0x34f/0x5e2
[ 1986.897572] [<
c01780a7>] shrink_inactive_list+0x172/0x22c
[ 1986.897585] [<
c0178464>] shrink_zone+0x303/0x3b1
[ 1986.897597] [<
c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897611] [<
c01788c9>] kswapd+0x3b7/0x5f2
The deadlock shouldn't happen since we are doing that allocation in the
mount path, the filesystem is not available for any reclaim. Still the
warning is annoying.
To solve this, acquire the lock later only where we need it, right before
calling reiserfs_read_locked_inode() that wants to lock to walk the tree.
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Tue, 10 Jan 2012 23:11:09 +0000 (15:11 -0800)]
reiserfs: don't lock journal_init()
journal_init() doesn't need the lock since no operation on the filesystem
is involved there. journal_read() and get_list_bitmap() have yet to be
reviewed carefully though before removing the lock there. Just keep the
it around these two calls for safety.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Tue, 10 Jan 2012 23:11:07 +0000 (15:11 -0800)]
reiserfs: delay reiserfs lock until journal initialization
In the mount path, transactions that are made before journal
initialization don't involve the filesystem. We can delay the reiserfs
lock until we play with the journal.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 10 Jan 2012 23:11:05 +0000 (15:11 -0800)]
reiserfs: delete comments referring to the BKL
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Hutchings [Tue, 10 Jan 2012 23:11:02 +0000 (15:11 -0800)]
drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
Commit
f44f7f96a20a ("RTC: Initialize kernel state from RTC") introduced a
potential infinite loop. If an alarm time contains a wildcard month and
an invalid day (> 31), or a wildcard year and an invalid month (>= 12),
the loop searching for the next matching date will never terminate. Treat
the invalid values as wildcards.
Fixes <http://bugs.debian.org/646429>, <http://bugs.debian.org/653331>
Reported-by: leo weppelman <leoweppelman@googlemail.com>
Reported-by: "P. van Gaans" <mailme667@yahoo.co.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benoit Cousson [Tue, 10 Jan 2012 23:10:59 +0000 (15:10 -0800)]
drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
Add the DT support for the TI rtc-twl present in the twl4030 and twl6030
devices.
Signed-off-by: Benoit Cousson <b-cousson@ti.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lars-Peter Clausen [Tue, 10 Jan 2012 23:10:58 +0000 (15:10 -0800)]
drivers/rtc/: remove redundant spi driver bus initialization
In ancient times it was necessary to manually initialize the bus field of
an spi_driver to spi_bus_type. These days this is done in
spi_driver_register(), so we can drop the manual assignment.
The patch was generated using the following coccinelle semantic patch:
// <smpl>
@@
identifier _driver;
@@
struct spi_driver _driver = {
.driver = {
- .bus = &spi_bus_type,
},
};
// </smpl>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 10 Jan 2012 23:10:55 +0000 (15:10 -0800)]
drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 10 Jan 2012 23:10:52 +0000 (15:10 -0800)]
drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 10 Jan 2012 23:10:48 +0000 (15:10 -0800)]
rtc: convert drivers/rtc/* to use module_platform_driver()
This patch converts the drivers in drivers/rtc/* to use the
module_platform_driver() macro which makes the code smaller and a bit
simpler.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Ben Dooks <ben@simtec.co.uk>
Cc: John Stultz <john.stultz@linaro.org>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Tue, 10 Jan 2012 23:10:44 +0000 (15:10 -0800)]
drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
Marginally less code and eliminate the possibility of memory leaks.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Tue, 10 Jan 2012 23:10:43 +0000 (15:10 -0800)]
drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
Due to changes in the RTC core the period interrupt is now unused so
delete the code managing it.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Godfrey [Tue, 10 Jan 2012 23:10:42 +0000 (15:10 -0800)]
rtc/ab8500: add calibration attribute to AB8500 RTC
The rtc_calibration attribute allows user-space to get and set the
AB8500's RtcCalibration register. The AB8500 will then use the value in
this register to compensate for RTC drift every 60 seconds.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mark Godfrey <mark.godfrey@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Walleij [Tue, 10 Jan 2012 23:10:41 +0000 (15:10 -0800)]
drivers/rtc/rtc-ab8500.c: change msleep() to usleep_range()
The resolution of msleep is related to HZ, so with HZ set to 100 any
msleep of less than 10ms will become ~10ms. This is not what we want.
Use the hrtimer-based usleep_range() and allow for some slack in the
non-critical path so we have more control of what is happening here.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Jonas Aaberg <jonas.aberg@stericsson.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Lynn [Tue, 10 Jan 2012 23:10:38 +0000 (15:10 -0800)]
rtc/ab8500: set can_wake flag
Set can_wake flag so wakealarm property is visible in sysfs.
Signed-off-by: Andrew Lynn <andrew.lynn@stericsson.com>
Reviewed-by: Jonas ABERG <jonas.aberg@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert Marklund [Tue, 10 Jan 2012 23:10:35 +0000 (15:10 -0800)]
rtc/ab8500: don't disable IRQ:s when suspending
We want this driver to be able to wake up the system.
Signed-off-by: Robert Marklund <robert.marklund@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yauhen Kharuzhy [Tue, 10 Jan 2012 23:10:34 +0000 (15:10 -0800)]
drivers/rtc/rtc-mxc.c: make alarm work
Fix alarm IRQ handling, make the alarm one-shot. Cleanup black magick
with a validation of already validated time data.
Add ability to wake the system with alarm.
[akpm@linux-foundation.org: fix CONFIG_PM=n build]
Signed-off-by: Yauhen Kharuzhy <jekhor@gmail.com>
Cc: Daniel Mack <daniel@caiaq.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yauhen Kharuzhy [Tue, 10 Jan 2012 23:10:32 +0000 (15:10 -0800)]
drivers/rtc/rtc-mxc.c: fix setting time for MX1 SoC
There is no way to track year in the i.MX1 RTC: Days Counter register is
9-bit wide only. Attempt to save date after 1970-01-01 plus 512 days
causes endless loop in mxc_rtc_set_mmss(). Fix this by resetting year to
1970.
[akpm@linux-foundation.org: use conventional comment layout]
Signed-off-by: Yauhen Kharuzhy <jekhor@gmail.com>
Cc: Daniel Mack <daniel@caiaq.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ondrej Zary [Tue, 10 Jan 2012 23:10:26 +0000 (15:10 -0800)]
drivers/rtc/rtc-cmos.c: fix broken NVRAM bank 2 writing
Fix writing to NVRAM bank 2 in rtc-cmos driver. It never worked since its
introduction in 2.6.28 because of a typo.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Daney [Tue, 10 Jan 2012 23:10:22 +0000 (15:10 -0800)]
MIPS: randomize PIE load address
... by selecting ARCH_BINFMT_ELF_RANDOMIZE_PIE
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Daney [Tue, 10 Jan 2012 23:10:21 +0000 (15:10 -0800)]
fs: binfmt_elf: create Kconfig variable for PIE randomization
Randomization of PIE load address is hard coded in binfmt_elf.c for X86
and ARM. Create a new Kconfig variable
(CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE) for this and use it instead. Thus
architecture specific policy is pushed out of the generic binfmt_elf.c and
into the architecture Kconfig files.
X86 and ARM Kconfigs are modified to select the new variable so there is
no change in behavior. A follow on patch will select it for MIPS too.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joakim Tjernlund [Tue, 10 Jan 2012 23:10:18 +0000 (15:10 -0800)]
crc32: optimize inner loop
Taking a pointer reference to each row in the crc table matrix, one can
reduce the inner loop with a few insn's
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Frank Zago <fzago@systemfabricworks.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:15 +0000 (15:10 -0800)]
checkpatch: catch all occurences of type and cast spacing errors per line
Fix up type and cast spacing checks such that all occurences on a line are
examined and reported. For example the line below has a valid cast and a
bad type, but currently we check the cast first which is good and stop:
u16* bar = (u16 *)baz;
We will also only report one of the errors in this example:
u16* bar = (u16*)bad;
Move to iterating across all casts and all types, reporting any failure.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:13 +0000 (15:10 -0800)]
checkpatch: typeof may have more complex arguments
typeof may have various more complex forms as its arguement, not just an
identifier. For now allow us to leak to the first close perenthesis ')'.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:11 +0000 (15:10 -0800)]
checkpatch: ensure cast type is unique in the context parser
Ensure the cast type is unique in the context parser, we do not want them
to detect as a comma ','.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:10 +0000 (15:10 -0800)]
checkpatch: fix complex macros handling of square brackets
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:08 +0000 (15:10 -0800)]
checkpatch: fix 'return is not a function' square bracket handling
We are incorrectly matching square brackets '[' and ']' leading to false
positives on more complex functions as below:
return (dt3155_fbuffer[m]->ready_head -
dt3155_fbuffer[m]->ready_len +
dt3155_fbuffer[m]->nbuffers)%
(dt3155_fbuffer[m]->nbuffers);
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:06 +0000 (15:10 -0800)]
checkpatch: complex macro should allow the empty do while loop
It is common to stub out a function as below, this is triggering a complex
macro format incorrectly. Sort this out:
#define cma_early_regions_reserve(reserve) do { } while (0)
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:04 +0000 (15:10 -0800)]
checkpatch: fix EXPORT_SYMBOL handling following a function
The following fragment defeats the DEVICE_ATTR style handing, check for
and ignore the close brace '}' in this context:
int foo()
{
}
DEVICE_ATTR(link_power_management_policy, S_IRUGO | S_IWUSR,
ata_scsi_lpm_show, ata_scsi_lpm_put);
EXPORT_SYMBOL_GPL(dev_attr_link_power_management_policy);
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:03 +0000 (15:10 -0800)]
checkpatch: only apply kconfig help checks for options which prompt
The intent of this check is to catch the options which the user will see
and ensure they are properly described. It is also common for internal
only options to have a brief description. Allow this form.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:01 +0000 (15:10 -0800)]
checkpatch: optimise statement scanner when mid-statement
In the middle of a long definition or similar, there is no possibility of
finding a smaller sub-statement. Optimise this case by skipping statement
aquirey where there are no starts of statement (open brace '{' or
semi-colon ';'). We are likely to scan slightly more than needed still
but this is safest.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:10:00 +0000 (15:10 -0800)]
checkpatch: ## is not a valid modifier
Inserting a # into the modifiers list will incorrectly add the null string
to the modifiers list, leading to an infinite loop. As neither of these
is a valid modifier form simply ignore them.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 10 Jan 2012 23:09:58 +0000 (15:09 -0800)]
checkpatch: improve memset and min/max with cast checking
Improve the checking of arguments to memset and min/max tests.
Move the checking of min/max to statement blocks instead of single line.
Change $Constant to allow any case type 0x initiator and trailing ul
specifier. Add $FuncArg type as any function argument with or without a
cast. Print the whole statement when showing memset or min/max messages.
Improve the memset with 0 as 3rd argument error message.
There are still weaknesses in the $FuncArg and $Constant code as arbitrary
parentheses and negative signs are not generically supported.
[akpm@linux-foundation.org: fix per Andy]
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:09:57 +0000 (15:09 -0800)]
checkpatch: check for common memset parameter issues against statments
Move the memset checks over to work against the statement. Also add
checks for 0 and 1 used as lengths. Generally these indicate badly
ordered parameters.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 10 Jan 2012 23:09:54 +0000 (15:09 -0800)]
checkpatch: correctly track the end of preprocessor commands in context
When looking for a statement we currently run on through preprocessor
commands. This means that a header file with just definitions is parsed
over and over again combining all of the lines from the current line to
the end of file leading to severe performance issues.
Fix up context accumulation to track preprocessor commands and stop when
reaching the end of them. At the same time vastly simplify the #define
handling.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 10 Jan 2012 23:09:52 +0000 (15:09 -0800)]
checkpatch: prefer __printf over __attribute__((format(printf,...)))
Add a warn for not using __printf.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 10 Jan 2012 23:09:50 +0000 (15:09 -0800)]
checkpatch: update signature "might be better as" warning
email header lines can look like signature tags. It's valid to have
multiple email recipients on a single line but not valid to have multiple
signatures on a single line.
Validate signatures only when not in the email headers.
Clear the $in_commit_log flag when the patch filename appears.
Add '-' to the valid chars in a message header for headers
like "Message-Id:" and "In-Reply-To:".
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve Hodgson [Tue, 10 Jan 2012 23:09:47 +0000 (15:09 -0800)]
btree: export btree_get_prev() so modules can use btree_for_each
The btree_for_each API is implemented with macros that internally call
btree_get_prev(), so if btree_get_prev() isn't exported then modules fail
to link if they try to use one of the btree_for_each macros. Since the
rest of the btree API is exported, we should keep things orthogonal and
make this work too.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Steve Hodgson <steve@purestorage.com>
Acked-by: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Tue, 10 Jan 2012 23:09:46 +0000 (15:09 -0800)]
leds: convert wm8350 driver to devm_kzalloc()
Saves a small amount of code and systematically eliminates leaks.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Tue, 10 Jan 2012 23:09:45 +0000 (15:09 -0800)]
leds: convert wm831x status driver to devm_kzalloc()
Saves a small amount of code and systematically eliminates leaks.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>