Ingo Molnar [Sat, 28 Feb 2009 09:15:58 +0000 (10:15 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
Steven Rostedt [Sat, 28 Feb 2009 07:54:39 +0000 (02:54 -0500)]
tracing: create the C style tracing for the irq subsystem
This patch utilizes the TRACE_EVENT_FORMAT macro to enable the C style
faster tracing for the irq subsystem trace points.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 28 Feb 2009 07:47:59 +0000 (02:47 -0500)]
tracing: create the C style tracing for the sched subsystem
This patch utilizes the TRACE_EVENT_FORMAT macro to enable the C style
faster tracing for the sched subsystem trace points.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 28 Feb 2009 07:41:25 +0000 (02:41 -0500)]
tracing: add raw fast tracing interface for trace events
This patch adds the interface to enable the C style trace points.
In the directory /debugfs/tracing/events/subsystem/event
We now have three files:
enable : values 0 or 1 to enable or disable the trace event.
available_types: values 'raw' and 'printf' which indicate the tracing
types available for the trace point. If a developer does not
use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT
macro, then only 'printf' will be available. This file is
read only.
type: values 'raw' or 'printf'. This indicates which type of tracing
is active for that trace point. 'printf' is the default and
if 'raw' is not available, this file is read only.
# echo raw > /debug/tracing/events/sched/sched_wakeup/type
# echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
Will enable the C style tracing for the sched_wakeup trace point.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 28 Feb 2009 00:12:30 +0000 (19:12 -0500)]
tracing: add raw trace point recording infrastructure
Impact: lower overhead tracing
The current event tracer can automatically pick up trace points
that are registered with the TRACE_FORMAT macro. But it required
a printf format string and parsing. Although, this adds the ability
to get guaranteed information like task names and such, it took
a hit in overhead processing. This processing can add about 500-1000
nanoseconds overhead, but in some cases that too is considered
too much and we want to shave off as much from this overhead as
possible.
Tom Zanussi recently posted tracing patches to lkml that are based
on a nice idea about capturing the data via C structs using
STRUCT_ENTER, STRUCT_EXIT type of macros.
I liked that method very much, but did not like the implementation
that required a developer to add data/code in several disjoint
locations.
This patch extends the event_tracer macros to do a similar "raw C"
approach that Tom Zanussi did. But instead of having the developers
needing to tweak a bunch of code all over the place, they can do it
all in one macro - preferably placed near the code that it is
tracing. That makes it much more likely that tracepoints will be
maintained on an ongoing basis by the code they modify.
The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
a developer may still utilize the more low level DECLARE_TRACE macros
if they don't care about getting their traces automatically in the event
tracer.)
They can also use the existing TRACE_FORMAT if they don't need to code
the tracepoint in C, but just want to use the convenience of printf.
So if the developer wants to "hardwire" a tracepoint in the fastest
possible way, and wants to acquire their data via a user space utility
in a raw binary format, or wants to see it in the trace output but not
sacrifice any performance, then they can implement the faster but
more complex TRACE_EVENT_FORMAT macro.
Here's what usage looks like:
TRACE_EVENT_FORMAT(name,
TPPROTO(proto),
TPARGS(args),
TPFMT(fmt, fmt_args),
TRACE_STUCT(
TRACE_FIELD(type1, item1, assign1)
TRACE_FIELD(type2, item2, assign2)
[...]
),
TPRAWFMT(raw_fmt)
);
Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
uses.
name: is the unique identifier of the trace point
proto: The proto type that the trace point uses
args: the args in the proto type
fmt: printf format to use with the event printf tracer
fmt_args: the printf argments to match fmt
TRACE_STRUCT starts the ability to create a structure.
Each item in the structure is defined with a TRACE_FIELD
TRACE_FIELD(type, item, assign)
type: the C type of item.
item: the name of the item in the stucture
assign: what to assign the item in the trace point callback
raw_fmt is a way to pretty print the struct. It must match
the order of the items are added in TRACE_STUCT
An example of this would be:
TRACE_EVENT_FORMAT(sched_wakeup,
TPPROTO(struct rq *rq, struct task_struct *p, int success),
TPARGS(rq, p, success),
TPFMT("task %s:%d %s",
p->comm, p->pid, success?"succeeded":"failed"),
TRACE_STRUCT(
TRACE_FIELD(pid_t, pid, p->pid)
TRACE_FIELD(int, success, success)
),
TPRAWFMT("task %d success=%d")
);
This creates us a unique struct of:
struct {
pid_t pid;
int success;
};
And the way the call back would assign these values would be:
entry->pid = p->pid;
entry->success = success;
The nice part about this is that the creation of the assignent is done
via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is
created, the developer will then have a faster method to record
into the ring buffer. They do not need to worry about the tracer itself.
The developer would only need to touch the files in include/trace/*.h
Again, I would like to give special thanks to Tom Zanussi for this
nice idea.
Idea-from: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 28 Feb 2009 00:38:04 +0000 (19:38 -0500)]
tracing: add interface to write into current tracer buffer
Right now all tracers must manage their own trace buffers. This was
to enforce tracers to be independent in case we finally decide to
allow each tracer to have their own trace buffer.
But now we are adding event tracing that writes to the current tracer's
buffer. This adds an interface to allow events to write to the current
tracer buffer without having to manage its own. Since event tracing
has no "tracer", and is just a way to hook into any other tracer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 28 Feb 2009 04:45:52 +0000 (23:45 -0500)]
tracing: add subsystem sched for sched events
Add the TRACE_SYSTEM sched for the sched events.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 28 Feb 2009 04:41:43 +0000 (23:41 -0500)]
tracing: add subsystem irq for irq events
Add the TRACE_SYSTEM irq for the irq events.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 28 Feb 2009 04:32:58 +0000 (23:32 -0500)]
tracing: make the set_event and available_events subsystem aware
This patch makes the event files, set_event and available_events
aware of the subsystem.
Now you can enable an entire subsystem with:
echo 'irq:*' > set_event
Note: the '*' is not needed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Ingo Molnar [Sat, 28 Feb 2009 08:03:58 +0000 (09:03 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
Steven Rostedt [Sat, 28 Feb 2009 02:33:02 +0000 (21:33 -0500)]
tracing: add subsystem level to trace events
If a trace point header defines TRACE_SYSTEM, then it will add the
following trace points into that event system.
If include/trace/irq_event_types.h has:
#define TRACE_SYSTEM irq
at the top and
#undef TRACE_SYSTEM
at the bottom, then a directory "irq" will be created in the
/debug/tracing/events directory. Inside that directory will contain the
two trace points that are defined in include/trace/irq_event_types.h.
Only adding the above to irq and not to sched, we get:
# ls /debug/tracing/events/
irq sched_process_exit sched_signal_send sched_wakeup_new
sched_kthread_stop sched_process_fork sched_switch
sched_kthread_stop_ret sched_process_free sched_wait_task
sched_migrate_task sched_process_wait sched_wakeup
# ls /debug/tracing/events/irq
irq_handler_entry irq_handler_exit
If we add #define TRACE_SYSTEM sched to the trace/sched_event_types.h
then the rest of the trace events will be put in a sched directory
within the events directory.
I've been playing with this idea of the subsystem for a while, but
recently Tom Zanussi posted some patches to lkml that included this
method. Tom's approach was clean and got me to finally put some effort
to clean up the event trace points.
Thanks to Tom Zanussi for demonstrating how nice the subsystem
method is.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 27 Feb 2009 22:36:06 +0000 (17:36 -0500)]
tracing: move trace point formats to files in include/trace directory
Impact: clean up
To further facilitate the ease of adding trace points for developers, this
patch creates include/trace/trace_events.h and
include/trace/trace_event_types.h.
The former file will hold the trace/<type>.h files and the latter will hold
the trace/<type>_event_types.h files.
To create new tracepoints and to have them automatically
appear in the event tracer, a developer makes the trace/<type>.h file
which includes <linux/tracepoint.h> and the trace/<type>_event_types.h file.
The trace/<type>_event_types.h file will hold the TRACE_FORMAT
macros.
Then add the trace/<type>.h file to trace/trace_events.h,
and add the trace/<type>_event_types.h to the trace_event_types.h file.
No need to modify files elsewhere.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 27 Feb 2009 15:51:10 +0000 (10:51 -0500)]
tracing: replace kzalloc with kcalloc
Impact: clean up
kcalloc is a better approach to allocate a NULL array.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Ingo Molnar [Fri, 27 Feb 2009 08:06:11 +0000 (09:06 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
Conflicts:
kernel/sched_clock.c
Ingo Molnar [Fri, 27 Feb 2009 08:04:43 +0000 (09:04 +0100)]
Merge branches 'tracing/ftrace' and 'linus' into tracing/core
Ingo Molnar [Thu, 26 Feb 2009 20:21:59 +0000 (21:21 +0100)]
Merge branch 'sched/clock' into tracing/ftrace
Conflicts:
kernel/sched_clock.c
Steven Rostedt [Fri, 27 Feb 2009 05:22:21 +0000 (00:22 -0500)]
tracing: use newline separator for trace options list
Impact: clean up
Instead of listing the trace options like:
# cat /debug/tracing/trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj
We now list them like:
# cat /debug/tracing/trace_options
print-parent
nosym-offset
nosym-addr
noverbose
noraw
nohex
nobin
noblock
nostacktrace
nosched-tree
ftrace_printk
noftrace_preempt
nobranch
annotate
nouserstacktrace
nosym-userobj
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 27 Feb 2009 05:12:38 +0000 (00:12 -0500)]
tracing: use pointer error returns for __tracing_open
Impact: fix compile warning and clean up
When I first wrote __tracing_open, instead of passing the error
code via the ERR_PTR macros, I lazily used a separate parameter
to hold the return for errors.
When Frederic Weisbecker updated that function, he used the Linux
kernel ERR_PTR for the returns. This caused the parameter return
to possibly not be initialized on error. gcc correctly pointed this
out with a warning.
This patch converts the entire function to use the Linux kernel
ERR_PTR macro methods.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 27 Feb 2009 04:55:58 +0000 (23:55 -0500)]
tracing: add protection around open use of current_tracer
Impact: fix to possible race conditions
There's some uses of current_tracer that is not protected by the
trace_types_lock. There is a small chance that a sysadmin changes
the tracer while the current_tracer is being referenced.
If the race is hit, it is unlikely to cause any harm since the
tracers are constant and are not freed. But some strang side
effects may occur.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 27 Feb 2009 04:43:05 +0000 (23:43 -0500)]
tracing: add tracer dependent options to options directory
This patch adds the tracer dependent options dynamically to the
options directory when the tracer is activated. These options are
removed when the tracer is deactivated.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Fri, 27 Feb 2009 03:19:12 +0000 (22:19 -0500)]
tracing: add options directory and core option files
This patch creates an options directory in the debugfs, that contains
the available tracing options. These files contain 1 or 0, where 1
is the option is enabled and 0 it is disabled.
Simply echoing in 1 will enable the option and 0 will disable it.
This patch only contains the core options, not the tracer options.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Ingo Molnar [Thu, 26 Feb 2009 20:21:59 +0000 (21:21 +0100)]
Merge branch 'sched/clock' into tracing/ftrace
Conflicts:
kernel/sched_clock.c
Ingo Molnar [Thu, 26 Feb 2009 19:16:58 +0000 (20:16 +0100)]
x86: set X86_FEATURE_TSC_RELIABLE
If the TSC is constant and non-stop, also set it reliable.
(We will turn this off in DMI quirks for multi-chassis systems)
The performance number on a 16-way Nehalem system running
32 tasks that context-switch between each other is significant:
sched_clock_stable=0 sched_clock_stable=1
.................... ....................
22.456925 million/sec 24.306972 million/sec [+8.2%]
lmbench's "lat_ctx -s 0 2" goes from 0.63 microseconds to
0.59 microseconds - a 6.7% increase in context-switching
performance.
Perfstat of 1 million pipe context switches between two tasks:
Performance counter stats for './pipe-test-1m':
[before] [after]
............ ............
37621.421089 36436.848378 task clock ticks (msecs)
0 0 CPU migrations (events)
2000274 2000189 context switches (events)
194 193 pagefaults (events)
8433799643 8171016416 CPU cycles (events) -3.21%
8370133368 8180999694 instructions (events) -2.31%
4158565 3895941 cache references (events) -6.74%
44312 46264 cache misses (events)
2349.287976 2279.362465 wall-time (msecs) -3.06%
The speedup comes straight from the reduction in the instruction
count. sched_clock_cpu() got simpler and the whole workload thus
executes faster.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 26 Feb 2009 19:20:29 +0000 (20:20 +0100)]
sched: allow architectures to specify sched_clock_stable
Allow CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures to still specify
that their sched_clock() implementation is reliable.
This will be used by x86 to switch on a faster sched_clock_cpu()
implementation on certain CPU types.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Thu, 26 Feb 2009 18:37:00 +0000 (10:37 -0800)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: try committing transaction before returning ENOSPC
Btrfs: add better -ENOSPC handling
Linus Torvalds [Thu, 26 Feb 2009 18:36:35 +0000 (10:36 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
xen/blkfront: use blk_rq_map_sg to generate ring entries
block: reduce stack footprint of blk_recount_segments()
cciss: shorten 30s timeout on controller reset
block: add documentation for register_blkdev()
block: fix bogus gcc warning for uninitialized var usage
Linus Torvalds [Thu, 26 Feb 2009 18:36:19 +0000 (10:36 -0800)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix 64bit __copy_tofrom_user() regression
powerpc: Fix 64bit memcpy() regression
powerpc: Fix load/store float double alignment handler
Linus Torvalds [Thu, 26 Feb 2009 18:32:31 +0000 (10:32 -0800)]
Make ieee1394_init a fs-initcall
It needs to happen before any firewire driver actually registers itself,
and that was previously handled by having the Makefile list the core
ieee1394 files before the drivers.
But now there are firewire drivers in drivers/media, and the Makefile
games aren't enough. So just make ieee1394_init happen earlier in the
init sequence, the way all other bus layers already do.
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Henrik Kurelid <henrik@kurelid.se>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Ben Backx <ben@bbackx.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Thu, 26 Feb 2009 17:47:11 +0000 (18:47 +0100)]
tracing: implement trace_clock_*() APIs
Impact: implement new tracing timestamp APIs
Add three trace clock variants, with differing scalability/precision
tradeoffs:
- local: CPU-local trace clock
- medium: scalable global clock with some jitter
- global: globally monotonic, serialized clock
Make the ring-buffer use the local trace clock internally.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 12 May 2008 19:21:14 +0000 (21:21 +0200)]
sched: sched_clock() improvement: use in_nmi()
make sure we dont execute more complex sched_clock() code in NMI context.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jason Baron [Thu, 26 Feb 2009 15:11:05 +0000 (10:11 -0500)]
tracing, genirq: add irq enter and exit trace events
Impact: add new tracepoints
Add them to the generic IRQ code, that way every architecture
gets these new tracepoints, not just x86.
Using Steve's new 'TRACE_FORMAT', I can get function graph
trace as follows using the original two IRQ tracepoints:
3) | handle_IRQ_event() {
3) | /* (irq_handler_entry) irq=28 handler=eth0 */
3) | e1000_intr_msi() {
3) 2.460 us | __napi_schedule();
3) 9.416 us | }
3) | /* (irq_handler_exit) irq=28 handler=eth0 return=handled */
3) + 22.935 us | }
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Wed, 25 Feb 2009 23:41:38 +0000 (00:41 +0100)]
tracing/core: make the per cpu trace files in per cpu directories
Impact: restructure the VFS layout of per CPU trace buffers
The per cpu trace files are all in a single directory:
/debug/tracing/per_cpu. In case of a large number of cpu, the
content of this directory becomes messy so we create now one
directory per cpu inside /debug/tracing/per_cpu which contain
each their own trace_pipe and trace files.
Ie:
/debug/tracing$ ls -R per_cpu
per_cpu:
cpu0 cpu1
per_cpu/cpu0:
trace trace_pipe
per_cpu/cpu1:
trace trace_pipe
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jens Axboe [Tue, 24 Feb 2009 07:10:09 +0000 (08:10 +0100)]
xen/blkfront: use blk_rq_map_sg to generate ring entries
On occasion, the request will apparently have more segments than we
fit into the ring. Jens says:
> The second problem is that the block layer then appears to create one
> too many segments, but from the dump it has rq->nr_phys_segments ==
> BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
> xen-blkfront not handling the merging on its own. It should check that
> the new page doesn't form part of the previous page. The
> rq_for_each_segment() iterates all single bits in the request, not dma
> segments. The "easiest" way to do this is to call blk_rq_map_sg() and
> then iterate the mapped sg list. That will give you what you are
> looking for.
> Here's a test patch, compiles but otherwise untested. I spent more
> time figuring out how to enable XEN than to code it up, so YMMV!
> Probably the sg list wants to be put inside the ring and only
> initialized on allocation, then you can get rid of the sg on stack and
> sg_init_table() loop call in the function. I'll leave that, and the
> testing, to you.
[Moved sg array into info structure, and initialize once. -J]
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jens Axboe [Mon, 23 Feb 2009 08:03:10 +0000 (09:03 +0100)]
block: reduce stack footprint of blk_recount_segments()
blk_recalc_rq_segments() requires a request structure passed in, which
we don't have from blk_recount_segments(). So the latter allocates one on
the stack, using > 400 bytes of stack for that. This can cause us to spill
over one page of stack from ext4 at least:
0) 4560 400 blk_recount_segments+0x43/0x62
1) 4160 32 bio_phys_segments+0x1c/0x24
2) 4128 32 blk_rq_bio_prep+0x2a/0xf9
3) 4096 32 init_request_from_bio+0xf9/0xfe
4) 4064 112 __make_request+0x33c/0x3f6
5) 3952 144 generic_make_request+0x2d1/0x321
6) 3808 64 submit_bio+0xb9/0xc3
7) 3744 48 submit_bh+0xea/0x10e
8) 3696 368 ext4_mb_init_cache+0x257/0xa6a [ext4]
9) 3328 288 ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
10) 3040 160 ext4_mb_new_blocks+0x211/0x4b4 [ext4]
11) 2880 336 ext4_ext_get_blocks+0xb61/0xd45 [ext4]
12) 2544 96 ext4_get_blocks_wrap+0xf2/0x200 [ext4]
13) 2448 80 ext4_da_get_block_write+0x6e/0x16b [ext4]
14) 2368 352 mpage_da_map_blocks+0x7e/0x4b3 [ext4]
15) 2016 352 ext4_da_writepages+0x2ce/0x43c [ext4]
16) 1664 32 do_writepages+0x2d/0x3c
17) 1632 144 __writeback_single_inode+0x162/0x2cd
18) 1488 96 generic_sync_sb_inodes+0x1e3/0x32b
19) 1392 16 sync_sb_inodes+0xe/0x10
20) 1376 48 writeback_inodes+0x69/0xb3
21) 1328 208 balance_dirty_pages_ratelimited_nr+0x187/0x2f9
22) 1120 224 generic_file_buffered_write+0x1d4/0x2c4
23) 896 176 __generic_file_aio_write_nolock+0x35f/0x393
24) 720 80 generic_file_aio_write+0x6c/0xc8
25) 640 80 ext4_file_write+0xa9/0x137 [ext4]
26) 560 320 do_sync_write+0xf0/0x137
27) 240 48 vfs_write+0xb3/0x13c
28) 192 64 sys_write+0x4c/0x74
29) 128 128 system_call_fastpath+0x16/0x1b
Split the segment counting out into a __blk_recalc_rq_segments() helper
to avoid allocating an onstack request just for checking the physical
segment count.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 23 Feb 2009 07:53:35 +0000 (08:53 +0100)]
cciss: shorten 30s timeout on controller reset
If reset_devices is set for kexec, then cciss will delay 30 seconds
since the old 5i controller _may_ need that long to recover. Replace
the long sleep with incremental sleep and tests to reduce the 30 seconds
to worst case for 5i, so that other controllers will proceed quickly.
Reviewed-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Márton Németh [Fri, 20 Feb 2009 07:12:51 +0000 (08:12 +0100)]
block: add documentation for register_blkdev()
Add documentation for register_blkdev() function and for the parameters.
Signed-off-by: Márton Németh <nm127@freemail.hu>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 19 Feb 2009 07:50:26 +0000 (08:50 +0100)]
block: fix bogus gcc warning for uninitialized var usage
Newer gcc throw this warning:
fs/bio.c: In function ?bio_alloc_bioset?:
fs/bio.c:305: warning: ?p? may be used uninitialized in this function
since it cannot figure out that 'p' is only ever used if 'bs' is non-NULL.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Mark Nelson [Wed, 25 Feb 2009 13:46:24 +0000 (13:46 +0000)]
powerpc: Fix 64bit __copy_tofrom_user() regression
This fixes a regression introduced by commit
a4e22f02f5b6518c1484faea1f88d81802b9feac ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").
The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().
This stops us reading beyond the end of the source region we were told
to copy.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mark Nelson [Wed, 25 Feb 2009 13:26:48 +0000 (13:26 +0000)]
powerpc: Fix 64bit memcpy() regression
This fixes a regression introduced by commit
25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").
This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC,
The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.
In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.
Below is an example of the regression, as reported by Sachin Sant:
Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [
c00000003baf3020]
pc:
c000000000039574: .memcpy+0x74/0x244
lr:
d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
sp:
c00000003baf32a0
msr:
8000000000009032
dar:
c00000003f380000
dsisr:
40000000
current = 0xc00000003e54b010
paca = 0xc000000000a53680
pid = 1840, comm = readahead
enter ? for help
[link register ]
d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[
c00000003baf32a0]
d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
(unreliab
le)
[
c00000003baf3390]
d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[
c00000003baf3400]
c000000000148154 .generic_getxattr+0x74/0x9c
[
c00000003baf34a0]
c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[
c00000003baf3560]
c00000000032c6b0 .security_d_instantiate+0x50/0x68
[
c00000003baf35e0]
c00000000013c818 .d_instantiate+0x78/0x9c
[
c00000003baf3680]
c00000000013ced0 .d_splice_alias+0xf0/0x120
[
c00000003baf3720]
d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[
c00000003baf37c0]
c000000000131e74 .do_lookup+0x110/0x260
[
c00000003baf3880]
c000000000134ed0 .__link_path_walk+0xa98/0x1010
[
c00000003baf3970]
c0000000001354a0 .path_walk+0x58/0xc4
[
c00000003baf3a20]
c000000000135720 .do_path_lookup+0x138/0x1e4
[
c00000003baf3ad0]
c00000000013645c .path_lookup_open+0x6c/0xc8
[
c00000003baf3b70]
c000000000136780 .do_filp_open+0xcc/0x874
[
c00000003baf3d10]
c0000000001251e0 .do_sys_open+0x80/0x140
[
c00000003baf3dc0]
c00000000016aaec .compat_sys_open+0x24/0x38
[
c00000003baf3e30]
c00000000000855c syscall_exit+0x0/0x40
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Thu, 19 Feb 2009 18:52:20 +0000 (18:52 +0000)]
powerpc: Fix load/store float double alignment handler
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
Below fixes this and merges the little/big endian case.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ingo Molnar [Thu, 26 Feb 2009 02:48:44 +0000 (03:48 +0100)]
Merge branch 'tip/tracing/ftrace' of ssh:///linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
Ingo Molnar [Thu, 26 Feb 2009 02:47:27 +0000 (03:47 +0100)]
Merge branches 'tracing/ftrace', 'tracing/hw-branch-tracing' and 'linus' into tracing/core
Steven Rostedt [Wed, 25 Feb 2009 20:54:30 +0000 (15:54 -0500)]
tracing: wrap arguments with PARAMS
Peter Zijlstra warned that TPPROTO and TPARGS might become something
other than a simple copy of itself. To prevent this from having
side effects in the TRACE_FORMAT macro in tracepoint.h, we add a
PARAMS() macro to be defined as just a wrapper.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Wed, 25 Feb 2009 20:49:52 +0000 (15:49 -0500)]
tracing: rename DEFINE_TRACE_FMT to just TRACE_FORMAT
There's been a bit confusion to whether DEFINE/DECLARE_TRACE_FMT should
be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it
TRACE_FORMAT.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Linus Torvalds [Wed, 25 Feb 2009 23:16:18 +0000 (15:16 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: emu10k1 - Fix digital/analog switch on audigy2 ZS
ALSA: hda - Quirk for Acer Aspire 6530G
ALSA: hda - add another MacBook Pro 3,1 SSID
ALSA: fix excessive background noise introduced by OSS emulation rate shrink
ALSA: aw2: do not grab every saa7146 based device
ALSA: hda - Fix parse of init_verbs sysfs entry
ALSA: pcxhr.h replace signed one-bit bitfields
Linus Torvalds [Wed, 25 Feb 2009 23:14:37 +0000 (15:14 -0800)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Don't go beyond iosapic_intr_info's arraysize
[IA64] Do not go beyond ARRAY_SIZE of unw.hash
[IA64] enable setting DMAR on by default
Linus Torvalds [Wed, 25 Feb 2009 23:12:48 +0000 (15:12 -0800)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] pata_legacy: for VLB 32bit PIO don't try tricks with slop
[libata] pata_amd: program FIFO
sata_mv: fix SoC interrupt breakage
pata_it821x: resume from hibernation fails with RAID volume
Alan Cox [Wed, 11 Feb 2009 21:08:42 +0000 (13:08 -0800)]
[libata] pata_legacy: for VLB 32bit PIO don't try tricks with slop
These devices are generally used with ATA anyway and it seems that some
ATAPI will need us to issue the right number of words. Therefore as we
can't switch mid burst on VLB devices we should only use 32bit I/O for
suitable block sizes.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alan Cox [Wed, 11 Feb 2009 21:08:41 +0000 (13:08 -0800)]
[libata] pata_amd: program FIFO
With 32bit PIO we can use the posted write buffers, but only for 32bit I/O
cycles. This means we must disable the FIFO for ATAPI where a final 16bit
cycle may occur.
Rework the FIFO logic so that we disable the FIFO then selectively
re-enable it when we set the timings on AMD devices. Also fix a case
where we scribbled on PCI config 0x41 of Nvidia chips when we shouldn't.
Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Mark Lord [Thu, 19 Feb 2009 15:38:04 +0000 (10:38 -0500)]
sata_mv: fix SoC interrupt breakage
For some reason, sata_mv doesn't clear interrupt status during init
when it's running on an SoC host adapter. If the bootloader has
touched the SATA controller before starting Linux, Linux can end up
enabling the SATA interrupt with events pending, which will cause the
interrupt to be marked as spurious and then be disabled, which then
breaks all further accesses to the controller.
This patch makes the SoC path clear interrupt status on init like in
the non-SoC case.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Ondrej Zary [Wed, 11 Feb 2009 21:08:43 +0000 (13:08 -0800)]
pata_it821x: resume from hibernation fails with RAID volume
Hibernation didn't work for me since I started to use IT8212 controller.
I did some debugging (booting with no_console_suspend init=/bin/sh).
Found that resume fails (2.6.28) with "serial number mismatch 'some
garbage' != 'some other garbage'" and "revalidation failed" messages.
That's because the controller firmware fills different serial number in
the IDENTIFY every boot.
The patch below fixes the resume simply clearing the serial number. The
proper fix would be probably to fill in the serial number of the RAID
volume instead. I assume that there must be something like that stored on
the drives but I don't know where.
Fix resume on pata_it821x RAID volume by clearing the serial number in
IDENTIFY data, which is otherwise different on each boot.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Linus Torvalds [Wed, 25 Feb 2009 20:22:06 +0000 (12:22 -0800)]
Merge git://git./linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
ide: fix refcounting in device drivers
ide-cd: document capacity hack
it821x: remove dead URL
atiixp: fix missing parentheses
amd74xx: device/vendor confusion
ide: ide.c 'clear' fix, update "ide=nodma" documentation
Hugh Dickins [Tue, 24 Feb 2009 20:51:52 +0000 (20:51 +0000)]
shmem: fix shared anonymous accounting
Each time I exit Firefox, /proc/meminfo's Committed_AS goes down almost
400 kB: OVERCOMMIT_NEVER would be allowing overcommits it should
prohibit.
Commit
fc8744adc870a8d4366908221508bb113d8b72ee "Stop playing silly
games with the VM_ACCOUNT flag" changed shmem_file_setup() to set the
shmem file's VM_ACCOUNT flag according to VM_NORESERVE not being set in
the vma flags; but did so only _after_ the shmem_acct_size(flags, size)
call which is expected to pre-account a shared anonymous object.
It's all clearer if we switch shmem.c over to use VM_NORESERVE
throughout in place of !VM_ACCOUNT.
But I very nearly sent in a patch which mistakenly removed the
accounting from tmpfs files: shmem_get_inode()'s memset was good for not
setting VM_ACCOUNT, but now it needs to set VM_NORESERVE.
Rather than setting that by default, then perhaps clearing it again in
shmem_file_setup(), let's pass it as a flag to shmem_get_inode(): that
allows us to remove the #ifdef CONFIG_SHMEM from shmem_file_setup().
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Sat, 21 Feb 2009 22:40:27 +0000 (23:40 +0100)]
[IA64] Don't go beyond iosapic_intr_info's arraysize
vi arch/ia64/kernel/iosapic.c +142
static struct iosapic_intr_info {
...
} iosapic_intr_info[NR_IRQS];
But at line 510 we have:
for (i = 0; i <= NR_IRQS; i++) {
s/<=/</
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Roel Kluin [Sun, 22 Feb 2009 01:33:28 +0000 (02:33 +0100)]
[IA64] Do not go beyond ARRAY_SIZE of unw.hash
static struct {
... :114
unsigned short hash[UNW_HASH_SIZE];
... :2152
for (index = 0; index <= UNW_HASH_SIZE; ++index) {
This is a bug, isn't it?
s/<=/</
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Kyle McMartin [Sat, 14 Feb 2009 09:11:29 +0000 (04:11 -0500)]
[IA64] enable setting DMAR on by default
The previous commit which introduced the DMAR_DEFAULT_ON setting in
drivers/pci/dmar.c neglected to add the ability for ia64 to enable
the IOMMU by default. Rectify that mistake, doh!
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Bartlomiej Zolnierkiewicz [Wed, 25 Feb 2009 19:28:24 +0000 (20:28 +0100)]
ide: fix refcounting in device drivers
During host driver module removal del_gendisk() results in a final
put on drive->gendev and freeing the drive by drive_release_dev().
Convert device drivers from using struct kref to use struct device
so device driver's object holds reference on ->gendev and prevents
drive from prematurely going away.
Also fix ->remove methods to not erroneously drop reference on a
host driver by using only put_device() instead of ide*_put().
Reported-by: Stanislaw Gruszka <stf_xl@wp.pl>
Tested-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Bartlomiej Zolnierkiewicz [Wed, 25 Feb 2009 19:28:23 +0000 (20:28 +0100)]
ide-cd: document capacity hack
Just copy the comment from drivers/scsi/sr.c::sr_done()
(from which the capacity hack has been originated).
Cc: Borislav Petkov <petkovbb@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Bartlomiej Zolnierkiewicz [Wed, 25 Feb 2009 19:28:22 +0000 (20:28 +0100)]
it821x: remove dead URL
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Roel Kluin [Wed, 25 Feb 2009 19:28:22 +0000 (20:28 +0100)]
atiixp: fix missing parentheses
Fix missing parentheses so PIO/DMA timings for master device on the
second channel are programmed correctly (IOW "8 0 24 16" offset values
should be used instead of the current "8 0 16 16").
[ The bug went unnoticed because after PIO/DMA timings get programmed
incorrectly for the third device they are overwritten with timings
for the fourth device and since BIOS should also program timings for
the third device everything should work fine until suspend/resume
cycle or user requested transfer mode changes. ]
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
[bart: update patch description]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Roel Kluin [Wed, 25 Feb 2009 19:28:22 +0000 (20:28 +0100)]
amd74xx: device/vendor confusion
Device and vendor ids were confused
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
David Fries [Wed, 25 Feb 2009 19:28:21 +0000 (20:28 +0100)]
ide: ide.c 'clear' fix, update "ide=nodma" documentation
Documentation/kernel-parameters.txt
- ide=nodma is no longer valid.
drivers/ide/Kconfig
- The module is ide-core.ko not ide.
drivers/ide/ide.c
- It took me a while to figure out what the arguments %d.%d:%d to nodma
module parameter ment, so I added a comment to each.
- Added a comment to each of the sscanf lines.
- There is a bug, if j is 0 it would previously clear all the other bits
except the current device, changed in three different places.
mask &= (1 << i) should be mask &= ~(1 << i).
Signed-off-by: David Fries <david@fries.net>
[bart: s/disk/device/ in ide.c, beautify patch description]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Linus Torvalds [Wed, 25 Feb 2009 17:49:30 +0000 (09:49 -0800)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: convert DRM_ERROR to DRM_DEBUG in phys object pwrite path
drm/i915: make hw page ioremap use ioremap_wc
drm: edid revision 0 is valid
drm: Correct unbalanced drm_vblank_put() during mode setting.
drm: disable encoders before re-routing them
drm: Fix ordering of bit fields in EDID structure leading huge vsync values.
drm: Fix shifts of EDID vsync offset/width fields.
drm/i915: handle bogus VBT panel timing
drm/i915: remove PLL debugging messages
Linus Torvalds [Wed, 25 Feb 2009 17:34:27 +0000 (09:34 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: avoid races when stopping resync.
md/raid10: Don't call bitmap_cond_end_sync when we are doing recovery.
md/raid10: Don't skip more than 1 bitmap-chunk at a time during recovery.
Linus Torvalds [Wed, 25 Feb 2009 17:31:56 +0000 (09:31 -0800)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Fix crashes in jbusmc_print_dimm()
Linus Torvalds [Wed, 25 Feb 2009 17:31:21 +0000 (09:31 -0800)]
Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
intel-iommu: fix endless "Unknown DMAR structure type" loop
VT-d: handle Invalidation Queue Error to avoid system hang
intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
Fenghua Yu [Wed, 25 Feb 2009 05:06:26 +0000 (14:06 +0900)]
Fix iwlan DMA mapping direction
When iwlan runs on IOMMU, IOMMU generates a lot of PTE write faults
because PTE write bit is not set on some of PTE's. This is because
iwlan driver calls DMA mapping with PCI_DMA_TODEVICE which is read only
in mapping PTE. But iwlan device actually writes to the mapped page to
update its contents. This issue is not exposed in swiotlb. But VT-d
hardware can capture this fault and stop the fault transaction.
The following patch fixes the issue.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Bhavesh Davda <bhavesh@vmware.com>
Tested-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Wed, 25 Feb 2009 05:13:16 +0000 (06:13 +0100)]
tracing/core: make the read callbacks reentrants
Now that several per-cpu files can be read or spliced at the
same, we want the read/splice callbacks for tracing files to be
reentrants.
Until now, a single global mutex (trace_types_lock) serialized
the access to tracing_read_pipe(), tracing_splice_read_pipe(),
and the seq helpers.
Ie: it means that if a user tries to read trace_pipe0 and
trace_pipe1 at the same time, the access to the function
tracing_read_pipe() is contended and one reader must wait for
the other to finish its read call.
The trace_type_lock mutex is mostly here to serialize the access
to the global current tracer (current_trace), which can be
changed concurrently. Although the iter struct keeps a private
pointer to this tracer, its callbacks can be changed by another
function.
The method used here is to not keep anymore private reference to
the tracer inside the iterator but to make a copy of it inside
the iterator. Then it checks on subsequents read calls if the
tracer has changed. This is not costly because the current
tracer is not expected to be changed often, so we use a branch
prediction for that.
Moreover, we add a private mutex to the iterator (there is one
iterator per file descriptor) to serialize the accesses in case
of multiple consumers per file descriptor (which would be a
silly idea from the user). Note that this is not to protect the
ring buffer, since the ring buffer already serializes the
readers accesses. This is to prevent from traces weirdness in
case of concurrent consumers. But these mutexes can be dropped
anyway, that would not result in any crash. Just tell me what
you think about it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Wed, 25 Feb 2009 02:22:28 +0000 (03:22 +0100)]
tracing/core: introduce per cpu tracing files
Impact: split up tracing output per cpu
Currently, on the tracing debugfs directory, three files are
available to the user to let him extracting the trace output:
- trace is an iterator through the ring-buffer. It's a reader
but not a consumer It doesn't block when no more traces are
available.
- trace pretty similar to the former, except that it adds more
informations such as prempt count, irq flag, ...
- trace_pipe is a reader and a consumer, it will also block
waiting for traces if necessary (heh, yes it's a pipe).
The traces coming from different cpus are curretly mixed up
inside these files. Sometimes it messes up the informations,
sometimes it's useful, depending on what does the tracer
capture.
The tracing_cpumask file is useful to filter the output and
select only the traces captured a custom defined set of cpus.
But still it is not enough powerful to extract at the same time
one trace buffer per cpu.
So this patch creates a new directory: /debug/tracing/per_cpu/.
Inside this directory, you will now find one trace_pipe file and
one trace file per cpu.
Which means if you have two cpus, you will have:
trace0
trace1
trace_pipe0
trace_pipe1
And of course, reading these files will have the same effect
than with the usual tracing files, except that you will only see
the traces from the given cpu.
The original all-in-one cpu trace file are still available on
their original place.
Until now, only one consumer was allowed on trace_pipe to avoid
racy consuming on the ring-buffer. Now the approach changed a
bit, you can have only one consumer per cpu.
Which means you are allowed to read concurrently trace_pipe0 and
trace_pipe1 But you can't have two readers on trace_pipe0 or
trace_pipe1.
Following the same logic, if there is one reader on the common
trace_pipe, you can not have at the same time another reader on
trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
a consumer in all cpu buffers in essence.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 25 Feb 2009 11:50:07 +0000 (12:50 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
Ingo Molnar [Wed, 25 Feb 2009 10:03:44 +0000 (11:03 +0100)]
tracing: remove /debug/tracing/latency_trace
Impact: remove old debug/tracing API
/debug/tracing/latency_trace is an old legacy format we kept from
the old latency tracer. Remove the file for now. If there's any
useful bit missing then we'll propagate any useful output bits into
the /debug/tracing/trace output.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Takashi Iwai [Wed, 25 Feb 2009 08:52:42 +0000 (09:52 +0100)]
Merge branch 'fix/misc' into for-linus
Takashi Iwai [Wed, 25 Feb 2009 08:52:38 +0000 (09:52 +0100)]
Merge branch 'fix/hda' into for-linus
Ingo Molnar [Wed, 25 Feb 2009 07:40:09 +0000 (08:40 +0100)]
tracing/hw-branch-tracing: convert bts-tracer mutex to a spinlock
Impact: fix CPU hotplug lockup
bts_hotcpu_handler() is called with irqs disabled, so using mutex_lock()
is a no-no.
All the BTS codepaths here are atomic (they do not schedule), so using
a spinlock is the right solution.
Cc: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dave Airlie [Wed, 25 Feb 2009 04:52:30 +0000 (14:52 +1000)]
drm/i915: convert DRM_ERROR to DRM_DEBUG in phys object pwrite path
This snuck in when I wrote phys object support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 25 Feb 2009 04:49:21 +0000 (14:49 +1000)]
drm/i915: make hw page ioremap use ioremap_wc
However we still have another issue with ioremap_wc not falling back
properly or somehow doing something else stupid, this probably needs
to be tracked down.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Kyle McMartin [Wed, 25 Feb 2009 01:31:53 +0000 (20:31 -0500)]
drm: edid revision 0 is valid
edid->revision == 0 should be valid (at least, so the error message
indicates. :) and wikipedia seems to indicate that EDID 1.0 existed.
We can dump the entire check, since edid->revision is a u8, so
it can't ever be less than 0.
Marko reports in RH bz#476735 that his monitor claims to be
EDID 1.0, and therefore hits the check and is stuck at 800x600 because
of it.
Reported-by: Marko Ristola <marko.ristola@kolumbus.fi>
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Chris Wilson [Thu, 19 Feb 2009 14:48:22 +0000 (14:48 +0000)]
drm: Correct unbalanced drm_vblank_put() during mode setting.
The first time we install a mode, the vblank will be disabled for a pipe
and so drm_vblank_get() in drm_vblank_pre_modeset() will fail. As we
unconditionally call drm_vblank_put() afterwards, the vblank reference
counter becomes unbalanced.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Jesse Barnes [Tue, 24 Feb 2009 00:09:34 +0000 (16:09 -0800)]
drm: disable encoders before re-routing them
In some cases we may receive a mode config that has a different
CRTC<->encoder map that the current configuration. In that case, we
need to disable any re-routed encoders before setting the mode,
otherwise they may not pick up the new CRTC (if the output types are
incompatible for example).
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Jesse Barnes [Mon, 23 Feb 2009 23:36:41 +0000 (15:36 -0800)]
drm: Fix ordering of bit fields in EDID structure leading huge vsync values.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Linus Torvalds [Mon, 23 Feb 2009 16:44:33 +0000 (08:44 -0800)]
drm: Fix shifts of EDID vsync offset/width fields.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Jesse Barnes [Mon, 23 Feb 2009 23:36:42 +0000 (15:36 -0800)]
drm/i915: handle bogus VBT panel timing
We've seen cases in the wild where the VBT sync data is wrong, so add
some code to fix it up in that case, taking care to make sure that the
total is greater than the sync end.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Jesse Barnes [Mon, 23 Feb 2009 23:36:40 +0000 (15:36 -0800)]
drm/i915: remove PLL debugging messages
These are normal; we walk through different values looking for the right
one, so why flood the screen with messages?
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Steven Rostedt [Tue, 24 Feb 2009 19:15:08 +0000 (14:15 -0500)]
tracing: make event directory structure
This patch adds the directory /debug/tracing/events/ that will contain
all the registered trace points.
# ls /debug/tracing/events/
sched_kthread_stop sched_process_fork sched_switch
sched_kthread_stop_ret sched_process_free sched_wait_task
sched_migrate_task sched_process_wait sched_wakeup
sched_process_exit sched_signal_send sched_wakeup_new
# ls /debug/tracing/events/sched_switch/
enable
# cat /debug/tracing/events/sched_switch/enable
1
# cat /debug/tracing/set_event
sched_switch
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 24 Feb 2009 15:22:57 +0000 (10:22 -0500)]
tracing: add schedule events to event trace
This patch changes the trace/sched.h to use the DECLARE_TRACE_FMT
such that they are automatically registered with the event tracer.
And it also adds the tracing sched headers to kernel/trace/events.c
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 24 Feb 2009 15:21:36 +0000 (10:21 -0500)]
tracing: add event trace infrastructure
This patch creates the event tracing infrastructure of ftrace.
It will create the files:
/debug/tracing/available_events
/debug/tracing/set_event
The available_events will list the trace points that have been
registered with the event tracer.
set_events will allow the user to enable or disable an event hook.
example:
# echo sched_wakeup > /debug/tracing/set_event
Will enable the sched_wakeup event (if it is registered).
# echo "!sched_wakeup" >> /debug/tracing/set_event
Will disable the sched_wakeup event (and only that event).
# echo > /debug/tracing/set_event
Will disable all events (notice the '>')
# cat /debug/tracing/available_events > /debug/tracing/set_event
Will enable all registered event hooks.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 24 Feb 2009 17:07:53 +0000 (12:07 -0500)]
tracing: add DEFINE_TRACE_FMT to tracepoint.h
This patch creates a DEFINE_TRACE_FMT to map to DECLARE_TRACE.
This allows for the developers to place format strings and
args in with their tracepoint declaration. A tracer may now
override the DEFINE_TRACE_FMT macro and use it to record
a default format.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
NeilBrown [Wed, 25 Feb 2009 02:18:47 +0000 (13:18 +1100)]
md: avoid races when stopping resync.
There has been a race in raid10 and raid1 for a long time
which has only recently started showing up due to a scheduler changed.
When a sync_read request finishes, as soon as reschedule_retry
is called, another thread can mark the resync request as having
completed, so md_do_sync can finish, ->stop can be called, and
->conf can be freed. So using conf after reschedule_retry is not
safe.
Similarly, when finishing a sync_write, calling md_done_sync must be
the last thing we do, as it allows a chain of events which will free
conf and other data structures.
The first of these requires action in raid10.c
The second requires action in raid1.c and raid10.c
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 25 Feb 2009 02:18:47 +0000 (13:18 +1100)]
md/raid10: Don't call bitmap_cond_end_sync when we are doing recovery.
For raid1/4/5/6, resync (fixing inconsistencies between devices) is
very similar to recovery (rebuilding a failed device onto a spare).
The both walk through the device addresses in order.
For raid10 it can be quite different. resync follows the 'array'
address, and makes sure all copies are the same. Recover walks
through 'device' addresses and recreates each missing block.
The 'bitmap_cond_end_sync' function allows the write-intent-bitmap
(When present) to be updated to reflect a partially completed resync.
It makes assumptions which mean that it does not work correctly for
raid10 recovery at all.
In particularly, it can cause bitmap-directed recovery of a raid10 to
not recovery some of the blocks that need to be recovered.
So move the call to bitmap_cond_end_sync into the resync path, rather
than being in the common "resync or recovery" path.
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 25 Feb 2009 02:18:47 +0000 (13:18 +1100)]
md/raid10: Don't skip more than 1 bitmap-chunk at a time during recovery.
When doing recovery on a raid10 with a write-intent bitmap, we only
need to recovery chunks that are flagged in the bitmap.
However if we choose to skip a chunk as it isn't flag, the code
currently skips the whole raid10-chunk, thus it might not recovery
some blocks that need recovering.
This patch fixes it.
In case that is confusing, it might help to understand that there
is a 'raid10 chunk size' which guides how data is distributed across
the devices, and a 'bitmap chunk size' which says how much data
corresponds to a single bit in the bitmap.
This bug only affects cases where the bitmap chunk size is smaller
than the raid10 chunk size.
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Tue, 24 Feb 2009 23:42:08 +0000 (15:42 -0800)]
Merge branch 'proc-linus' of git://git./linux/kernel/git/adobriyan/proc
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
proc: fix PG_locked reporting in /proc/kpageflags
Linus Torvalds [Tue, 24 Feb 2009 23:40:19 +0000 (15:40 -0800)]
Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
Add i2c_board_info for RiscPC PCF8583
i2c: Make sure i2c_algo_bit_data.timeout is HZ-independent
i2c-dev: Clarify the unit of ioctl I2C_TIMEOUT
i2c: Timeouts reach -1
i2c: Fix misplaced parentheses
Linus Torvalds [Tue, 24 Feb 2009 23:39:54 +0000 (15:39 -0800)]
Merge branch 'firedtv-merge' of git://git./linux/kernel/git/ieee1394/linux1394-2.6
* 'firedtv-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firedtv: dvb_frontend_info for FireDTV S2, fix "frequency limits undefined" error
firedtv: massive refactoring
firedtv: rename files, variables, functions from firesat to firedtv
firedtv: Use DEFINE_SPINLOCK
firedtv: fix registration - adapter number could only be zero
firedtv: use length_field() of PMT as length
firedtv: fix returned struct for ca_info
firedtv: cleanups and minor fixes
ieee1394: remove superfluous assertions
ieee1394: inherit ud vendor_id from node vendor_id
ieee1394: add hpsb_node_read() and hpsb_node_lock()
ieee1394: use correct barrier types between accesses of nodeid and generation
firesat: copyrights, rename to firedtv, API conversions, fix remote control input
firesat: avc resend
firesat: update isochronous interface, add CI support
firesat: add DVB-S support for DVB-S2 devices
firesat: fix DVB-S2 device recognition
DVB: add firesat driver
Linus Torvalds [Tue, 24 Feb 2009 23:39:34 +0000 (15:39 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin()
ext4: Add fallback for find_group_flex
Russell King [Tue, 24 Feb 2009 18:19:50 +0000 (19:19 +0100)]
Add i2c_board_info for RiscPC PCF8583
Add the necessary i2c_board_info structure to fix the lack of PCF8583
RTC on RiscPC.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Jean Delvare [Tue, 24 Feb 2009 18:19:49 +0000 (19:19 +0100)]
i2c: Make sure i2c_algo_bit_data.timeout is HZ-independent
i2c_algo_bit_data.timeout is supposed to be in jiffies, so drivers
should use set this value in terms of HZ.
Ultimately I think this field should be discarded in favor of
i2c_adapter.timeout, but that's left for a future patch.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Lennert Buytenhek <kernel@wantstofly.org>
Acked-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Jean Delvare [Tue, 24 Feb 2009 18:19:49 +0000 (19:19 +0100)]
i2c-dev: Clarify the unit of ioctl I2C_TIMEOUT
The unit in which user-space can set the bus timeout value is jiffies
for historical reasons (back when HZ was always 100.) This is however
not good because user-space doesn't know how long a jiffy lasts. The
timeout value should instead be set in a fixed time unit. Given the
original value of HZ, this unit should be 10 ms, for compatibility.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Roel Kluin [Tue, 24 Feb 2009 18:19:48 +0000 (19:19 +0100)]
i2c: Timeouts reach -1
With a postfix decrement these timeouts reach -1 rather than 0, but
after the loop it is tested whether they have become 0.
As pointed out by Jean Delvare, the condition we are waiting for should
also be tested before the timeout. With the current order, you could
exit with a timeout error while the job is actually done.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Roel Kluin [Tue, 24 Feb 2009 18:19:48 +0000 (19:19 +0100)]
i2c: Fix misplaced parentheses
Fix misplaced parentheses.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Helge Bahmann [Fri, 20 Feb 2009 13:24:12 +0000 (16:24 +0300)]
proc: fix PG_locked reporting in /proc/kpageflags
Expr always evaluates to zero.
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>