Arnaldo Carvalho de Melo [Sun, 15 May 2011 12:39:00 +0000 (09:39 -0300)]
perf evlist: Fix per thread mmap setup
The PERF_EVENT_IOC_SET_OUTPUT ioctl was returning -EINVAL when using
--pid when monitoring multithreaded apps, as we can only share a ring
buffer for events on the same thread if not doing per cpu.
Fix it by using per thread ring buffers.
Tested with:
[root@felicio ~]# tuna -t 26131 -CP | nl
1 thread ctxt_switches
2 pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
3 26131 OTHER 0 0,1
10814276 2397830 chromium-browse
4 642 OTHER 0 0,1 14688 0 chromium-browse
5 26148 OTHER 0 0,1 713602 115479 chromium-browse
6 26149 OTHER 0 0,1 801958 2262 chromium-browse
7 26150 OTHER 0 0,1
1271128 248 chromium-browse
8 26151 OTHER 0 0,1 3 0 chromium-browse
9 27049 OTHER 0 0,1 36796 9 chromium-browse
10 618 OTHER 0 0,1 14711 0 chromium-browse
11 661 OTHER 0 0,1 14593 0 chromium-browse
12 29048 OTHER 0 0,1 28125 0 chromium-browse
13 26143 OTHER 0 0,1
2202789 781 chromium-browse
[root@felicio ~]#
So 11 threads under pid 26131, then:
[root@felicio ~]# perf record -F 50000 --pid 26131
[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
1
7fa4a2538000-
7fa4a25b9000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
2
7fa4a25b9000-
7fa4a263a000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
3
7fa4a263a000-
7fa4a26bb000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
4
7fa4a26bb000-
7fa4a273c000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
5
7fa4a273c000-
7fa4a27bd000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
6
7fa4a27bd000-
7fa4a283e000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
7
7fa4a283e000-
7fa4a28bf000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
8
7fa4a28bf000-
7fa4a2940000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
9
7fa4a2940000-
7fa4a29c1000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
10
7fa4a29c1000-
7fa4a2a42000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
11
7fa4a2a42000-
7fa4a2ac3000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
[root@felicio ~]#
11 mmaps, one per thread since we didn't specify any CPU list, so we need one
mmap per thread and:
[root@felicio ~]# perf record -F 50000 --pid 26131
^M
^C[ perf record: Woken up 79 times to write data ]
[ perf record: Captured and wrote 20.614 MB perf.data (~900639 samples) ]
[root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
1 371310 26131
2 96516 26148
3 95694 26149
4 95203 26150
5 7291 26143
6 87 27049
7 76 661
8 60 29048
9 47 618
10 43 642
[root@felicio ~]#
Ok, one of the threads, 26151 was quiescent, so no samples there, but all the
others are there.
Then, if I specify one CPU:
[root@felicio ~]# perf record -F 50000 --pid 26131 --cpu 1
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.680 MB perf.data (~29730 samples) ]
[root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
1 8444 26131
2 2584 26149
3 2518 26148
4 2324 26150
5 123 26143
6 9 661
7 9 29048
[root@felicio ~]#
This machine has two cores, so fewer threads appeared on the radar, and:
[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
1
7f484b922000-
7f484b9a3000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
[root@felicio ~]#
Just one mmap, as now we can use just one per-cpu buffer instead of the
per-thread needed in the previous case.
For global profiling:
[root@felicio ~]# perf record -F 50000 -a
^C[ perf record: Woken up 26 times to write data ]
[ perf record: Captured and wrote 7.128 MB perf.data (~311412 samples) ]
[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
1
7fb49b435000-
7fb49b4b6000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
2
7fb49b4b6000-
7fb49b537000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
[root@felicio ~]#
It uses per-cpu buffers.
For just one thread:
[root@felicio ~]# perf record -F 50000 --tid 26148
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.330 MB perf.data (~14426 samples) ]
[root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
1 9969 26148
[root@felicio ~]#
[root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
1
7f286a51b000-
7f286a59c000 rwxs
00000000 00:09 4064 anon_inode:[perf_event]
[root@felicio ~]#
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/20110426204401.GB1746@ghostprotocols.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 25 Apr 2011 19:25:20 +0000 (16:25 -0300)]
perf tools: Honour the cpu list parameter when also monitoring a thread list
The perf_evlist__create_maps was discarding the --cpu parameter when a
--pid or --tid was specified, fix that.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/20110426204401.GB1746@ghostprotocols.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Peter Zijlstra [Fri, 22 Apr 2011 11:39:56 +0000 (13:39 +0200)]
perf, x86: Update/fix Intel Nehalem cache events
Change the Nehalem cache events to use retired memory instruction counters
(similar to Westmere), this greatly improves the provided stats.
Using:
main ()
{
int i;
for (i = 0; i <
1000000000; i++) {
asm("mov (%%rsp), %%rbx;"
"mov %%rbx, (%%rsp);" : : : "rbx");
}
}
We find:
$ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
Performance counter stats for './loop_1b_loads+stores' (10 runs):
4,000,081,056 instructions:u # 0.000 IPC ( +- 0.000% )
4,999,502,846 l1-dcache-loads:u ( +- 0.008% )
1,000,034,832 l1-dcache-stores:u ( +- 0.000% )
1.
565184942 seconds time elapsed ( +- 0.005% )
The 5b is surprising - we'd expect 1b:
$ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
Performance counter stats for './loop_1b_loads+stores' (10 runs):
4,000,081,054 instructions:u # 0.000 IPC ( +- 0.000% )
1,000,021,961 r10b:u ( +- 0.000% )
1,000,030,951 l1-dcache-stores:u ( +- 0.000% )
1.
565055422 seconds time elapsed ( +- 0.003% )
Which this patch thus fixes.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrill Gorcunov [Thu, 21 Apr 2011 15:03:21 +0000 (11:03 -0400)]
perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
It's not enough to simply disable event on overflow the
cpuc->active_mask should be cleared as well otherwise counter
may stall in "active" even in real being already disabled (which
potentially may lead to the situation that user may not use this
counter further).
Don pointed out that:
" I also noticed this patch fixed some unknown NMIs
on a P4 when I stressed the box".
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 22 Apr 2011 06:44:38 +0000 (08:44 +0200)]
x86, perf event: Turn off unstructured raw event access to offcore registers
Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:
|
| The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
| user space bits were not. This made it impossible to set the extra mask
| and actually do the OFFCORE profiling
|
Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:
|
| Some raw events -- like the Intel OFFCORE events -- support additional
| parameters. These can be appended after a ':'.
|
| For example on a multi socket Intel Nehalem:
|
| perf stat -e r1b7:20ff -a sleep 1
|
| Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
| that measures any access to DRAM on another socket.
|
But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.
The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.
We already have such generalization in place for CPU cache events,
and it's all very extensible.
"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.
We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.
That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:
perf record -e dram ./myapp
perf record -e dram-remote ./myapp
... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.
These generalized events would work on all CPUs and architectures that
have comparable PMU features.
( Note, these are just examples: actual implementation could have more
sophistication and more parameter - as long as they center around
similarly simple usecases. )
Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:
e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere
But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.
( Note: after generalization has been implemented raw offcore events can be
supported as well: there can always be an odd event that is marginally
useful but not useful enough to generalize. DRAM profiling is definitely
*not* such a category so generalization must be done first. )
Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.
As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.
Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.
Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Thu, 21 Apr 2011 23:48:35 +0000 (16:48 -0700)]
perf: Support Xeon E7's via the Westmere PMU driver
There's a new model number public, 47, for Xeon E7 (aka Westmere EX).
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Thu, 21 Apr 2011 17:50:56 +0000 (10:50 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
block: don't propagate unlisted DISK_EVENTs to userland
elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
Tejun Heo [Thu, 21 Apr 2011 17:43:59 +0000 (19:43 +0200)]
ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
check_events() implementations in both ide-gd and ide-cd are
inadequate for in-kernel event polling. Both generate media change
events continuously when certain conditions are met causing infinite
event loop between the driver and userland event handler.
As disk event now supports suppression of unlisted events, simply
de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
problem. Internal handling around media revalidation will behave the
same while userland will fall back to userland event polling after
detecting the device doesn't support disk events.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tejun Heo [Thu, 21 Apr 2011 17:43:58 +0000 (19:43 +0200)]
block: don't propagate unlisted DISK_EVENTs to userland
DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
internal event for revalidation of removeable devices. Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances. For example, ide-cd
generates media changed continuously if there's no media in the drive,
which can lead to infinite loop of events jumping back and forth
between the driver and userland event handler.
This patch updates disk event infrastructure such that it never
propagates events not listed in disk->events to userland. Those
events are processed the same for internal purposes but uevent
generation is suppressed.
This also ensures that userland only gets events which are advertised
in the @events sysfs node lowering risk of confusion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Thu, 21 Apr 2011 17:28:35 +0000 (19:28 +0200)]
elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops on switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while it
occured.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Linus Torvalds [Thu, 21 Apr 2011 17:01:26 +0000 (10:01 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix duplicate message output
Linus Torvalds [Thu, 21 Apr 2011 17:01:03 +0000 (10:01 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
Revert "x86, NUMA: Fix fakenuma boot failure"
Randy Dunlap [Thu, 21 Apr 2011 16:07:26 +0000 (09:07 -0700)]
raid5: fix build error, sector_t usage
Change <sectors> from unsigned long long to sector_t.
This matches its source field.
ERROR: "__udivdi3" [drivers/md/raid456.ko] undefined!
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 21 Apr 2011 16:58:42 +0000 (09:58 -0700)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
virtio: console: Enable call to hvc_remove() on console port remove
virtio_pci: Prevent double-free of pci regions after device hot-unplug
virtio: Decrement avail idx on buffer detach
Linus Torvalds [Thu, 21 Apr 2011 16:57:56 +0000 (09:57 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
agp: fix arbitrary kernel memory writes
agp: fix OOM and buffer overflow
drm/radeon/kms: fix IH writeback on r6xx+ on big endian machines
Linus Torvalds [Thu, 21 Apr 2011 16:57:13 +0000 (09:57 -0700)]
Merge branch 'drm-intel-fixes' of git://git./linux/kernel/git/keithp/linux-2.6
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
drm/i915: Initialise g4x watermarks for disabled pipes
drm/i915: Sanitize the output registers after resume
drm/i915/tv: Fix modeset flickering introduced in
7f58aabc3
drm/i915/tv: Only poll for TV connections
drm/i915/tv: Remember the detected TV type
Linus Torvalds [Thu, 21 Apr 2011 16:56:35 +0000 (09:56 -0700)]
Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
intel_iommu: disable all VT-d PMRs when TXT launched
intel-iommu: Fix get_domain_for_dev() error path
intel-iommu: Unlink domain from iommu
intel-iommu: Fix use after release during device attach
Jan Kara [Wed, 20 Apr 2011 18:30:40 +0000 (20:30 +0200)]
vfs: Pass setxattr(2) flags properly
For some reason generic_setxattr() did not pass flags (XATTR_CREATE,
XATTR_REPLACE) to the filesystem specific helper. This caused that
setxattr(2) syscall just ignored these flags.
Fix the bug by passing flags correctly.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amit Shah [Mon, 14 Mar 2011 12:15:48 +0000 (17:45 +0530)]
virtio: console: Enable call to hvc_remove() on console port remove
This call was disabled as hot-unplugging one virtconsole port led to
another virtconsole port freezing.
Upon testing it again, this now works, so enable it.
In addition, a bug was found in qemu wherein removing a port of one type
caused the guest output from another port to stop working. I doubt it
was just this bug that caused it (since disabling the hvc_remove() call
did allow other ports to continue working), but since it's all solved
now, we're fine with hot-unplugging of virtconsole ports.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Amit Shah [Mon, 14 Mar 2011 12:15:02 +0000 (17:45 +0530)]
virtio_pci: Prevent double-free of pci regions after device hot-unplug
In the case where a virtio-console port is in use (opened by a program)
and a virtio-console device is removed, the port is kept around but all
the virtio-related state is assumed to be gone.
When the port is finally released (close() called), we call
device_destroy() on the port's device. This results in the parent
device's structures to be freed as well. This includes the PCI regions
for the virtio-console PCI device.
Once this is done, however, virtio_pci_release_dev() kicks in, as the
last ref to the virtio device is now gone, and attempts to do
pci_iounmap(pci_dev, vp_dev->ioaddr);
pci_release_regions(pci_dev);
pci_disable_device(pci_dev);
which results in a double-free warning.
Move the code that releases regions, etc., to the virtio_pci_remove()
function, and all that's now left in release_dev is the final freeing of
the vp_dev.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Amit Shah [Wed, 16 Mar 2011 13:42:10 +0000 (19:12 +0530)]
virtio: Decrement avail idx on buffer detach
When detaching a buffer from a vq, the avail.idx value should be
decremented as well.
This was noticed by hot-unplugging a virtio console port and then
plugging in a new one on the same number (re-using the vqs which were
just 'disowned'). qemu reported
'Guest moved used index from 0 to 256'
when any IO was attempted on the new port.
CC: stable@kernel.org
Reported-by: juzhang <juzhang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Joseph Cihula [Mon, 21 Mar 2011 18:04:24 +0000 (11:04 -0700)]
intel_iommu: disable all VT-d PMRs when TXT launched
Intel VT-d Protected Memory Regions (PMRs) are supposed to be disabled,
on each VT-d engine, after DMA remapping is enabled on the engines.
This is because the behavior of having both enabled is not deterministic
and because, if TXT has been used to launch the kernel, the PMRs may be
programmed to cover memory regions that will be used for DMA.
Under some circumstances (certain quirks detected, lack of multiple
devices, etc.), the current code does not set up DMA remapping on some
VT-d engines. In such cases it also skips disabling the PMRs. This
causes failures when the kernel is launched with TXT (most often this
occurs on the graphics engine and results in colored vertical bars on
the display).
This patch detects when the kernel has been launched with TXT and then
disables the PMRs on all VT-d engines. In some cases where the reason
that remapping is not being enabled is due to possible ACPI DMAR table
errors, the VT-d engine addresses may not be correct and thus not able
to be safely programmed even to disable PMRs. Because part of the TXT
launch process is the verification of these addresses, it will always be
safe to disable PMRs if the TXT launch has succeeded and hence only
doing this in such cases.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
David Rientjes [Thu, 21 Apr 2011 02:19:13 +0000 (19:19 -0700)]
x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y
when NUMA emulation is enabled is currently broken because it does
not iterate through every emulated node and bind cpus that have
affinity to it.
NUMA emulation should bind each cpu to every local node to
accurately represent the true NUMA topology of the underlying
machine.
debug_cpumask_set_cpu() needs to be fixed at the same time so
that the debugging information that it emits shows the new
cpumask of the node being assigned when the cpu is being added
or removed.
It can now take responsibility of setting or clearing the cpu
itself to remove the need for duplicate code.
Also change its last parameter, "enable", to have the correct bool
type since it can only be true or false.
-v2: Fix the return statements, by Kosaki Motohiro
Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Rientjes [Thu, 21 Apr 2011 02:19:10 +0000 (19:19 -0700)]
Revert "x86, NUMA: Fix fakenuma boot failure"
Andreas Herrmann reported that
7d6b46707f24 ("x86, NUMA: Fix fakenuma
boot failure") causes certain physical NUMA topologies (for example
AMD Magny-Cours) to move sibling cpus to a single node when in reality
they are in separate domains.
This may result in some nodes being completely void of cpus, which
doesn't accurately represent the correct topology. The system will
boot, but will have suboptimal NUMA performance.
This commit was intended as a fix for NUMA emulation, but should
not cause a regression for real NUMA machines as a side effect.
( There will be a separate fix for the numa-debug code, which
will not affect physical topologies. )
Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Vasiliy Kulikov [Thu, 14 Apr 2011 16:55:16 +0000 (20:55 +0400)]
agp: fix arbitrary kernel memory writes
pg_start is copied from userspace on AGPIOC_BIND and AGPIOC_UNBIND ioctl
cmds of agp_ioctl() and passed to agpioc_bind_wrap(). As said in the
comment, (pg_start + mem->page_count) may wrap in case of AGPIOC_BIND,
and it is not checked at all in case of AGPIOC_UNBIND. As a result, user
with sufficient privileges (usually "video" group) may generate either
local DoS or privilege escalation.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vasiliy Kulikov [Thu, 14 Apr 2011 16:55:19 +0000 (20:55 +0400)]
agp: fix OOM and buffer overflow
page_count is copied from userspace. agp_allocate_memory() tries to
check whether this number is too big, but doesn't take into account the
wrap case. Also agp_create_user_memory() doesn't check whether
alloc_size is calculated from num_agp_pages variable without overflow.
This may lead to allocation of too small buffer with following buffer
overflow.
Another problem in agp code is not addressed in the patch - kernel memory
exhaustion (AGPIOC_RESERVE and AGPIOC_ALLOCATE ioctls). It is not checked
whether requested pid is a pid of the caller (no check in agpioc_reserve_wrap()).
Each allocation is limited to 16KB, though, there is no per-process limit.
This might lead to OOM situation, which is not even solved in case of the
caller death by OOM killer - the memory is allocated for another (faked) process.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Thu, 21 Apr 2011 01:18:19 +0000 (18:18 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
hwmon: (max34440) Add driver documentation
hwmon: (max16064) Add driver documentation
hwmon: (max8688) Add driver documentation
hwmon: (pmbus) Documentation updates
hwmon: (smm665) Fix spelling error in driver documentation
hwmon: (pmbus) Removed unused variable from struct pmbus_data
hwmon: Add submitting-patches checklist to documentation
Linus Torvalds [Thu, 21 Apr 2011 00:41:06 +0000 (17:41 -0700)]
Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
Open with O_CREAT flag set fails to open existing files on non writable directories
nfsd4: Fix filp leak
nfsd4: fix struct file leak on delegation
Linus Torvalds [Thu, 21 Apr 2011 00:40:45 +0000 (17:40 -0700)]
Merge branch 'fixes' of /home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 6881/1: cputype.h uses __attribute_const__ which requires including kernel.h
ARM: Add new syscalls
Linus Torvalds [Thu, 21 Apr 2011 00:40:25 +0000 (17:40 -0700)]
Merge branch 'stable/bug-fixes-rc4' of git://git./linux/kernel/git/konrad/xen
* 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32
xen: do not create the extra e820 region at an addr lower than 4G
Linus Torvalds [Thu, 21 Apr 2011 00:40:02 +0000 (17:40 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: Update documentation for sync_min and sync_max entries
md: Cleanup after raid45->raid0 takeover
md: Fix dev_sectors on takeover from raid0 to raid4/5
md/raid5: remove setting of ->queue_lock
Linus Torvalds [Wed, 20 Apr 2011 16:48:52 +0000 (09:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Remove the extra check in queue_requests_store
block, blk-sysfs: Fix an err return path in blk_register_queue()
block: remove stale kerneldoc member from __blk_run_queue()
block: get rid of QUEUE_FLAG_REENTER
cfq-iosched: read_lock() does not always imply rcu_read_lock()
block: kill blk_flush_plug_list() export
Dave Chinner [Tue, 12 Apr 2011 13:34:20 +0000 (13:34 +0000)]
xfs: fix duplicate message output
Commit
957935dc ("xfs: fix xfs_debug warnings" broke the logic in
__xfs_printk(). Instead of only printing one of two possible output
strings based on whether the fs has a name or not, it outputs both.
Fix it to only output one message again.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Sachin Prabhu [Wed, 20 Apr 2011 12:09:35 +0000 (13:09 +0100)]
Open with O_CREAT flag set fails to open existing files on non writable directories
An open on a NFS4 share using the O_CREAT flag on an existing file for
which we have permissions to open but contained in a directory with no
write permissions will fail with EACCES.
A tcpdump shows that the client had set the open mode to UNCHECKED which
indicates that the file should be created if it doesn't exist and
encountering an existing flag is not an error. Since in this case the
file exists and can be opened by the user, the NFS server is wrong in
attempting to check create permissions on the parent directory.
The patch adds a conditional statement to check for create permissions
only if the file doesn't exist.
Signed-off-by: Sachin S. Prabhu <sprabhu@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Stefano Stabellini [Tue, 19 Apr 2011 13:47:31 +0000 (14:47 +0100)]
xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32
The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on
x86_64, in fact early_ioremap is not used at all to setup the initial
pagetable on x86_32.
Moreover on x86_32 the two checks are wrong because the range
pgt_buf_start..pgt_buf_end initially should be mapped RW because
the pages in the range are not pagetable pages yet and haven't been
cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is
part of the initial mapping, xen_alloc_pte is capable of turning
the ptes RO when they become pagetable pages.
Fix the issue and improve the readability of the code providing two
different implementation of mask_rw_pte for x86_32 and x86_64.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Tue, 12 Apr 2011 11:19:52 +0000 (12:19 +0100)]
xen: do not create the extra e820 region at an addr lower than 4G
Do not add the extra e820 region at a physical address lower than 4G
because it breaks e820_end_of_low_ram_pfn().
It is OK for us to move the xen_extra_mem_start up and down because this
is the index of the memory that can be ballooned in/out - it is memory
not available to the kernel during bootup.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CoolCold [Wed, 20 Apr 2011 05:40:01 +0000 (15:40 +1000)]
md: Update documentation for sync_min and sync_max entries
linux/Documentation/md.txt is missing description for sync_min and
sync_max entries.
This patch adds description for sync_min and sync_max entries.
Signed-off-by: Roman Ovchinnikov <coolthecold@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Krzysztof Wojcik [Wed, 20 Apr 2011 05:39:53 +0000 (15:39 +1000)]
md: Cleanup after raid45->raid0 takeover
Problem:
After raid4->raid0 takeover operation, another takeover operation
(e.g raid0->raid10) results "kernel oops".
Root cause:
Variables 'degraded' in mddev structure is not cleared
on raid45->raid0 takeover.
This patch reset this variable.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 20 Apr 2011 05:38:18 +0000 (15:38 +1000)]
md: Fix dev_sectors on takeover from raid0 to raid4/5
A raid0 array doesn't set 'dev_sectors' as each device might
contribute a different number of sectors.
So when converting to a RAID4 or RAID5 we need to set dev_sectors
as they need the number.
We have already verified that in fact all devices do contribute
the same number of sectors, so use that number.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 20 Apr 2011 05:38:07 +0000 (15:38 +1000)]
md/raid5: remove setting of ->queue_lock
We previously needed to set ->queue_lock to match the raid5
device_lock so we could safely use queue_flag_* operations (e.g. for
plugging). which test the ->queue_lock is in fact locked.
However that need has completely gone away and is unlikely to come
back to remove this now-pointless setting.
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Wed, 20 Apr 2011 01:32:57 +0000 (18:32 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: pll tweaks for r7xx
drm/nouveau: fix allocation of notifier object
drm/nouveau: fix notifier memory corruption bug
drm/nouveau: fix pinning of notifier block
drm/nouveau: populate ttm_alloced with false, when it's not
drm/nouveau: fix nv30 pcie boards
drm/nouveau: split ramin_lock into two locks, one hardirq safe
drm/radeon/kms: adjust evergreen display watermark setup
drm/radeon/kms: add connectors even if i2c fails
drm/radeon/kms: fix bad shift in atom iio table parser
Cédric Cano [Tue, 19 Apr 2011 15:07:13 +0000 (11:07 -0400)]
drm/radeon/kms: fix IH writeback on r6xx+ on big endian machines
agd5f: fix commit message.
Signed-off-by: Cedric Cano <ccano@interfaceconcept.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Tue, 19 Apr 2011 19:24:59 +0000 (15:24 -0400)]
drm/radeon/kms: pll tweaks for r7xx
Prefer min m to max p only on pre-r7xx asics.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=36197
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 19 Apr 2011 23:21:34 +0000 (09:21 +1000)]
Merge remote branch 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next into drm-fixes
* 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next:
drm/nouveau: fix allocation of notifier object
drm/nouveau: fix notifier memory corruption bug
drm/nouveau: fix pinning of notifier block
drm/nouveau: populate ttm_alloced with false, when it's not
drm/nouveau: fix nv30 pcie boards
drm/nouveau: split ramin_lock into two locks, one hardirq safe
Marcin Slusarz [Tue, 19 Apr 2011 21:52:42 +0000 (23:52 +0200)]
drm/nouveau: fix allocation of notifier object
Commit
73412c3854c877e5f37ad944ee8977addde4d35a ("drm/nouveau: allocate
kernel's notifier object at end of block") intended to align end of
notifier block to page boundary, but start of block was miscalculated
to be off by -16 bytes. Fix it.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Marcin Slusarz [Tue, 19 Apr 2011 21:50:48 +0000 (23:50 +0200)]
drm/nouveau: fix notifier memory corruption bug
nouveau_bo_wr32 expects offset to be in words, but we pass value in bytes,
so after commit
73412c3854c877e5f37ad944ee8977addde4d35a ("drm/nouveau: allocate
kernel's notifier object at end of block") we started to overwrite some memory
after notifier buffer object (previously m2mf_ntfy was always 0, so it didn't
matter it was a value in bytes).
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reported-by: Nigel Cunningham <lkml@nigelcunningham.com.au>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Pekka Paalanen <pq@iki.fi>
Cc: stable@kernel.org [2.6.38]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sun, 17 Apr 2011 23:12:25 +0000 (09:12 +1000)]
drm/nouveau: fix pinning of notifier block
Problem introduced with commit
6ba9a68317781537d6184d3fdb2d0f20c97da3a4
Reported-by: Bob Gleitsmann <rjgleits@bellsouth.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Mon, 11 Apr 2011 06:37:44 +0000 (16:37 +1000)]
drm/nouveau: populate ttm_alloced with false, when it's not
Caught with kmemcheck on unrelated business.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Fri, 8 Apr 2011 00:07:34 +0000 (10:07 +1000)]
drm/nouveau: fix nv30 pcie boards
Wasn't aware they even existed, apparently they do! They're actually
AGP chips with a bridge as far as I can tell, which puts them in the
same boat as nv40/nv45.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Wed, 6 Apr 2011 03:28:35 +0000 (13:28 +1000)]
drm/nouveau: split ramin_lock into two locks, one hardirq safe
Fixes a possible lock ordering reversal between context_switch_lock
and ramin_lock.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Linus Torvalds [Tue, 19 Apr 2011 22:16:41 +0000 (15:16 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits)
netfilter: ipset: Fix the order of listing of sets
ip6_pol_route panic: Do not allow VLAN on loopback
bnx2x: Fix port identification problem
r8169: add Realtek as maintainer.
ip: ip_options_compile() resilient to NULL skb route
bna: fix memory leak during RX path cleanup
bna: fix for clean fw re-initialization
usbnet: Fix up 'FLAG_POINTTOPOINT' and 'FLAG_MULTI_PACKET' overlaps.
iwlegacy: fix tx_power initialization
Revert "tcp: disallow bind() to reuse addr/port"
qlcnic: limit skb frags for non tso packet
net: can: mscan: fix build breakage in mpc5xxx_can
netfilter: ipset: set match and SET target fixes
netfilter: ipset: bitmap:ip,mac type requires "src" for MAC
sctp: fix oops while removed transport still using as retran path
sctp: fix oops when updating retransmit path with DEBUG on
net: Disable NETIF_F_TSO_ECN when TSO is disabled
net: Disable all TSO features when SG is disabled
sfc: Use rmb() to ensure reads occur in order
ieee802154: Remove hacked CFLAGS in net/ieee802154/Makefile
...
OGAWA Hirofumi [Mon, 18 Apr 2011 15:48:55 +0000 (11:48 -0400)]
nfsd4: Fix filp leak
23fcf2ec93fb8573a653408316af599939ff9a8e (nfsd4: fix oops on lock failure)
The above patch breaks free path for stp->st_file. If stp was inserted
into sop->so_stateids, we have to free stp->st_file refcount. Because
stp->st_file refcount itself is taken whether or not any refcounts are
taken on the stp->st_file->fi_fds[].
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus Torvalds [Tue, 19 Apr 2011 19:46:32 +0000 (12:46 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: pci-label: Fix build failure when CONFIG_NLS is set to 'm' by allmodconfig
David S. Miller [Tue, 19 Apr 2011 18:28:35 +0000 (11:28 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/kaber/nf-2.6
Linus Torvalds [Tue, 19 Apr 2011 17:58:13 +0000 (10:58 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, gart: Make sure GART does not map physmem above 1TB
x86, gart: Set DISTLBWALKPRB bit always
x86, gart: Convert spaces to tabs in enable_gart_translation
Linus Torvalds [Tue, 19 Apr 2011 17:56:46 +0000 (10:56 -0700)]
Merge branch 'timer-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
RTC: rtc-omap: Fix a leak of the IRQ during init failure
posix clocks: Replace mutex with reader/writer semaphore
Linus Torvalds [Tue, 19 Apr 2011 17:56:02 +0000 (10:56 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf, x86: Fix AMD family 15h FPU event constraints
perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
perf evsel: Fix use of inherit
perf hists browser: Fix seg fault when annotate null symbol
Linus Torvalds [Tue, 19 Apr 2011 17:54:44 +0000 (10:54 -0700)]
Revert "[media] V4L: videobuf, don't use dma addr as physical"
This reverts commit
35d9f510b67b10338161aba6229d4f55b4000f5b.
Quoth Jiri Slaby:
"It fixes mmap when IOMMU is used on x86 only, but breaks architectures
like ARM or PPC where virt_to_phys(dma_alloc_coherent) doesn't work.
We need there dma_mmap_coherent or similar (the trickery what
snd_pcm_default_mmap does but in some saner way). But this cannot be
done at this phase."
Requested-by: Jiri Slaby <jslaby@suse.cz>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 19 Apr 2011 17:52:51 +0000 (10:52 -0700)]
Merge git://git./linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: filesystem hang caused by incorrect lock order
GFS2: Don't try to deallocate unlinked inodes when mounted ro
GFS2: directly write blocks past i_size
GFS2: write_end error path fails to unlock transaction lock
Guenter Roeck [Mon, 18 Apr 2011 16:55:59 +0000 (09:55 -0700)]
hwmon: (max34440) Add driver documentation
MAX34440 and MAX34441 have their own driver, thus there should be explicit
documentation instead of mentioning the chips in the generic PMBus driver
documentation.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Reviewed-by: Tom Grennan <tom.grennan@ericsson.com>
Guenter Roeck [Mon, 18 Apr 2011 16:53:54 +0000 (09:53 -0700)]
hwmon: (max16064) Add driver documentation
MAX16064 has its own driver, thus should have its own documentation instead of
being mentioned in the generic PMBus driver documentation.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Reviewed-by: Tom Grennan <tom.grennan@ericsson.com>
Guenter Roeck [Mon, 18 Apr 2011 16:51:04 +0000 (09:51 -0700)]
hwmon: (max8688) Add driver documentation
MAX8688 has its own driver, thus should have its own documentation instead of
being mentioned in the generic PMBus driver documentation.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Reviewed-by: Tom Grennan <tom.grennan@ericsson.com>
Guenter Roeck [Mon, 18 Apr 2011 16:48:58 +0000 (09:48 -0700)]
hwmon: (pmbus) Documentation updates
Fix spelling, correct label name error, and add missing attribute to PMBus
driver documentation.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Reviewed-by: Tom Grennan <tom.grennan@ericsson.com>
Guenter Roeck [Mon, 18 Apr 2011 16:43:22 +0000 (09:43 -0700)]
hwmon: (smm665) Fix spelling error in driver documentation
tempererature may sound interesting, but temperature is still preferred.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Tue, 15 Mar 2011 00:54:25 +0000 (17:54 -0700)]
hwmon: (pmbus) Removed unused variable from struct pmbus_data
struct pmbus_data included an unused variable named status_bits.
Remove it.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Reviewed-by: Tom Grennan <tom.grennan@ericsson.com>
Guenter Roeck [Sat, 2 Apr 2011 15:26:34 +0000 (08:26 -0700)]
hwmon: Add submitting-patches checklist to documentation
When writing hardware monitoring drivers, there are some common pitfalls which
keep coming up in code reviews. This patch provides a document describing all
those pitfalls and how to avoid them.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Jozsef Kadlecsik [Tue, 19 Apr 2011 13:59:15 +0000 (15:59 +0200)]
netfilter: ipset: Fix the order of listing of sets
A restoreable saving of sets requires that list:set type of sets
come last and the code part which should have taken into account
the ordering was broken. The patch fixes the listing order.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Tao Ma [Tue, 19 Apr 2011 11:50:40 +0000 (13:50 +0200)]
block: Remove the extra check in queue_requests_store
In queue_requests_store, the code looks like
if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
blk_set_queue_full(q, BLK_RW_SYNC);
} else if (rl->count[BLK_RW_SYNC]+1 <= q->nr_requests) {
blk_clear_queue_full(q, BLK_RW_SYNC);
wake_up(&rl->wait[BLK_RW_SYNC]);
}
If we don't satify the situation of "if", we can get that
rl->count[BLK_RW_SYNC} < q->nr_quests. It is the same as
rl->count[BLK_RW_SYNC]+1 <= q->nr_requests.
All the "else" should satisfy the "else if" check so it isn't
needed actually.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Liu Yuan [Tue, 19 Apr 2011 11:47:58 +0000 (13:47 +0200)]
block, blk-sysfs: Fix an err return path in blk_register_queue()
We do not call blk_trace_remove_sysfs() in err return path
if kobject_add() fails. This path fixes it.
Cc: stable@kernel.org
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Tue, 19 Apr 2011 11:34:14 +0000 (13:34 +0200)]
block: remove stale kerneldoc member from __blk_run_queue()
We don't pass in a 'force_kblockd' anymore, get rid of the
stsale comment.
Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Tue, 19 Apr 2011 11:32:46 +0000 (13:32 +0200)]
block: get rid of QUEUE_FLAG_REENTER
We are currently using this flag to check whether it's safe
to call into ->request_fn(). If it is set, we punt to kblockd.
But we get a lot of false positives and excessive punts to
kblockd, which hurts performance.
The only real abuser of this infrastructure is SCSI. So export
the async queue run and convert SCSI over to use that. There's
room for improvement in that SCSI need not always use the async
call, but this fixes our performance issue and they can fix that
up in due time.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Robert Richter [Sat, 16 Apr 2011 00:27:54 +0000 (02:27 +0200)]
perf, x86: Fix AMD family 15h FPU event constraints
Depending on the unit mask settings some FPU events may be scheduled
only on cpu counter #3. This patch fixes this.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@googlemail.com>
Link: http://lkml.kernel.org/r/1302913676-14352-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andre Przywara [Sat, 16 Apr 2011 00:27:53 +0000 (02:27 +0200)]
perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
With AMD cpu family 15h a unit mask was introduced for the Data Cache
Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0
(first data cache miss or streaming store to a 64 B cache line) of
this mask to proper count data cache misses.
Now we set this bit for all families and models. In case a PMU does
not implement a unit mask for event 0x041 the bit is ignored.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1302913676-14352-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jens Axboe [Tue, 19 Apr 2011 07:10:35 +0000 (09:10 +0200)]
cfq-iosched: read_lock() does not always imply rcu_read_lock()
For some configurations of CONFIG_PREEMPT that is not true. So
get rid of __call_for_each_cic() and always uses the explicitly
rcu_read_lock() protected call_for_each_cic() instead.
This fixes a potential bug related to IO scheduler removal or
online switching.
Thanks to Paul McKenney for clarifying this.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Linus Torvalds [Tue, 19 Apr 2011 04:26:00 +0000 (21:26 -0700)]
Linux 2.6.39-rc4
Linus Torvalds [Mon, 18 Apr 2011 22:44:29 +0000 (15:44 -0700)]
Merge branch 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm
* 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
msm: timer: fix missing return value
msm: Remove extraneous ffa device check
Linus Torvalds [Mon, 18 Apr 2011 20:29:03 +0000 (13:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xen-kbdfront - fix mouse getting stuck after save/restore
Input: estimate number of events per packet
Input: evdev - indicate buffer overrun with SYN_DROPPED
Input: document event types and codes and their intended use
Input: add KEY_IMAGES specifically for AL Image Browser
Input: twl4030_keypad - fix potential NULL dereference in twl4030_kp_probe()
Input: h3600_ts - fix error handling at connect
Input: twl4030_keypad - avoid potential NULL-pointer dereference
Linus Torvalds [Mon, 18 Apr 2011 20:21:18 +0000 (13:21 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: add blk_run_queue_async
block: blk_delay_queue() should use kblockd workqueue
md: fix up raid1/raid10 unplugging.
md: incorporate new plugging into raid5.
md: provide generic support for handling unplug callbacks.
md - remove old plugging code.
md/dm - remove remains of plug_fn callback.
md: use new plugging interface for RAID IO.
block: drop queue lock before calling __blk_run_queue() for kblockd punt
Revert "block: add callback function for unplug notification"
block: Enhance new plugging support to support general callbacks
Jens Axboe [Mon, 18 Apr 2011 20:06:57 +0000 (22:06 +0200)]
block: kill blk_flush_plug_list() export
With all drivers and file systems converted, we only have
in-core use of this function. So remove the export.
Reporteed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Linus Torvalds [Mon, 18 Apr 2011 19:24:24 +0000 (12:24 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/powermac: Build fix with SMP and CPU hotplug
powerpc/perf_event: Skip updating kernel counters if register value shrinks
powerpc: Don't write protect kernel text with CONFIG_DYNAMIC_FTRACE enabled
powerpc: Fix oops if scan_dispatch_log is called too early
powerpc/pseries: Use a kmem cache for DTL buffers
powerpc/kexec: Fix regression causing compile failure on UP
powerpc/85xx: disable Suspend support if SMP enabled
powerpc/e500mc: Remove CPU_FTR_MAYBE_CAN_NAP/CPU_FTR_MAYBE_CAN_DOZE
powerpc/book3e: Fix CPU feature handling on 64-bit e5500
powerpc: Check device status before adding serial device
powerpc/85xx: Don't add disabled PCIe devices
Linus Torvalds [Mon, 18 Apr 2011 19:24:05 +0000 (12:24 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
Btrfs: fix free space cache leak
Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
Btrfs end_bio_extent_readpage should look for locked bits
Btrfs: don't force chunk allocation in find_free_extent
Btrfs: Check validity before setting an acl
Btrfs: Fix incorrect inode nlink in btrfs_link()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
Btrfs: make uncache_state unconditional
btrfs: using cached extent_state in set/unlock combinations
Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
Btrfs: fix subvolume mount by name problem when default mount subvolume is set
fix user annotation in ioctl.c
Btrfs: check for duplicate iov_base's when doing dio reads
btrfs: properly handle overlapping areas in memmove_extent_buffer
Btrfs: fix memory leaks in btrfs_new_inode()
Btrfs: check for duplicate iov_base's when doing dio reads
Btrfs: reuse the extent_map we found when calling btrfs_get_extent
Btrfs: do not use async submit for small DIO io's
Btrfs: don't split dio bios if we don't have to
...
Linus Torvalds [Mon, 18 Apr 2011 17:36:54 +0000 (10:36 -0700)]
proc: do proper range check on readdir offset
Rather than pass in some random truncated offset to the pid-related
functions, check that the offset is in range up-front.
This is just cleanup, the previous commit fixed the real problem.
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 18 Apr 2011 17:35:30 +0000 (10:35 -0700)]
next_pidmap: fix overflow condition
next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.
Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.
So clamp 'last' to PID_MAX_LIMIT. The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).
[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]
Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Analyzed-by: Robert Święcki <robert@swiecki.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 15 Apr 2011 22:08:26 +0000 (18:08 -0400)]
nfsd4: fix struct file leak on delegation
Introduced by
acfdf5c383b38f7f4dddae41b97c97f1ae058f49.
Cc: stable@kernel.org
Reported-by: Gerhard Heift <ml-nfs-linux-20110412-ef47@gheift.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Igor Mammedov [Mon, 18 Apr 2011 17:17:17 +0000 (10:17 -0700)]
Input: xen-kbdfront - fix mouse getting stuck after save/restore
Mouse gets "stuck" after restore of PV guest but buttons are in working
condition.
If driver has been configured for ABS coordinates at start it will get
XENKBD_TYPE_POS events and then suddenly after restore it'll start getting
XENKBD_TYPE_MOTION events, that will be dropped later and they won't get
into user-space.
Regression was introduced by hunk 5 and 6 of
5ea5254aa0ad269cfbd2875c973ef25ab5b5e9db
("Input: xen-kbdfront - advertise either absolute or relative
coordinates").
Driver on restore should ask xen for request-abs-pointer again if it is
available. So restore parts that did it before
5ea5254.
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[v1: Expanded the commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Jeff Brown [Mon, 18 Apr 2011 17:08:02 +0000 (10:08 -0700)]
Input: estimate number of events per packet
Calculate a default based on the number of ABS axes, REL axes,
and MT slots for the device during input device registration.
Signed-off-by: Jeff Brown <jeffbrown@android.com>
Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Joerg Roedel [Mon, 18 Apr 2011 13:45:46 +0000 (15:45 +0200)]
x86, gart: Make sure GART does not map physmem above 1TB
The GART can only map physical memory below 1TB. Make sure
the gart driver in the kernel does not try to map memory
above 1TB.
Cc: <stable@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-5-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Joerg Roedel [Mon, 18 Apr 2011 13:45:45 +0000 (15:45 +0200)]
x86, gart: Set DISTLBWALKPRB bit always
The DISTLBWALKPRB bit must be set for the GART because the
gatt table is mapped UC. But the current code does not set
the bit at boot when the BIOS setup the aperture correctly.
Fix that by setting this bit when enabling the GART instead
of the other places.
Cc: <stable@kernel.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-4-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Joerg Roedel [Mon, 18 Apr 2011 13:45:44 +0000 (15:45 +0200)]
x86, gart: Convert spaces to tabs in enable_gart_translation
Probably by copy&paste this function was indented by spaces.
Convert this to tabs.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-3-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Bob Peterson [Thu, 17 Mar 2011 20:19:58 +0000 (16:19 -0400)]
GFS2: filesystem hang caused by incorrect lock order
This patch fixes a deadlock in GFS2 where two processes are trying
to reclaim an unlinked dinode:
One holds the inode glock and calls gfs2_lookup_by_inum trying to look
up the inode, which it can't, due to I_FREEING. The other has set
I_FREEING from vfs and is at the beginning of gfs2_delete_inode
waiting for the glock, which is held by the first. The solution is to
add a new non_block parameter to the gfs2_iget function that causes it
to return -ENOENT if the inode is being freed.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Steven Whitehouse [Wed, 30 Mar 2011 13:17:51 +0000 (14:17 +0100)]
GFS2: Don't try to deallocate unlinked inodes when mounted ro
This adds a couple of missing tests to avoid read-only nodes
from attempting to deallocate unlinked inodes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Michel Andre de la Porte <madelaporte@ubi.com>
Benjamin Marzinski [Fri, 18 Mar 2011 02:54:46 +0000 (21:54 -0500)]
GFS2: directly write blocks past i_size
GFS2 was relying on the writepage code to write out the zeroed data for
fallocate. However, with FALLOC_FL_KEEP_SIZE set, this may be past i_size.
If it is, it will be ignored. To work around this, gfs2 now calls
write_dirty_buffer directly on the buffer_heads when FALLOC_FL_KEEP_SIZE
is set, and it's writing past i_size.
This version is just a cleanup of my last version
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 16 Mar 2011 20:32:39 +0000 (16:32 -0400)]
GFS2: write_end error path fails to unlock transaction lock
I did an audit of gfs2's transaction glock for bugzilla bug
658619 and ran across this:
In function gfs2_write_end, in the unlikely event that
gfs2_meta_inode_buffer returns an error, the code may forget
to unlock the transaction lock because the "failed" label
appears after the call to function gfs2_trans_end.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Chris Mason [Mon, 18 Apr 2011 12:55:34 +0000 (08:55 -0400)]
Btrfs: fix free space cache leak
The free space caching code was recently reworked to
cache all the pages it needed instead of using find_get_page everywhere.
One loop was missed though, so it ended up leaking pages. This fixes
it to use our page array instead of find_get_page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Christoph Hellwig [Mon, 18 Apr 2011 09:41:33 +0000 (11:41 +0200)]
block: add blk_run_queue_async
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly. I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Mon, 18 Apr 2011 09:36:39 +0000 (11:36 +0200)]
block: blk_delay_queue() should use kblockd workqueue
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Axel Lin [Sun, 17 Apr 2011 02:02:58 +0000 (10:02 +0800)]
RTC: rtc-omap: Fix a leak of the IRQ during init failure
In omap_rtc_probe error path, free_irq() was using NULL rather than the
driver data as the data pointer so free_irq() wouldn't have matched.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: "George G. Davis" <gdavis@mvista.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: rtc-linux@googlegroups.com
Link: http://lkml.kernel.org/r/%3C1303005778.2889.2.camel%40phoenix%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Richard Cochran [Wed, 30 Mar 2011 13:24:21 +0000 (15:24 +0200)]
posix clocks: Replace mutex with reader/writer semaphore
A dynamic posix clock is protected from asynchronous removal by a mutex.
However, using a mutex has the unwanted effect that a long running clock
operation in one process will unnecessarily block other processes.
For example, one process might call read() to get an external time stamp
coming in at one pulse per second. A second process calling clock_gettime
would have to wait for almost a whole second.
This patch fixes the issue by using a reader/writer semaphore instead of
a mutex.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/%3C20110330132421.GA31771%40riccoc20.at.omicron.at%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
NeilBrown [Mon, 18 Apr 2011 08:25:43 +0000 (18:25 +1000)]
md: fix up raid1/raid10 unplugging.
We just need to make sure that an unplug event wakes up the md
thread, which is exactly what mddev_check_plugged does.
Also remove some plug-related code that is no longer needed.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 18 Apr 2011 08:25:43 +0000 (18:25 +1000)]
md: incorporate new plugging into raid5.
In raid5 plugging is used for 2 things:
1/ collecting writes that require a bitmap update
2/ collecting writes in the hope that we can create full
stripes - or at least more-full.
We now release these different sets of stripes when plug_cnt
is zero.
Also in make_request, we call mddev_check_plug to hopefully increase
plug_cnt, and wake up the thread at the end if plugging wasn't
achieved for some reason.
Signed-off-by: NeilBrown <neilb@suse.de>