Steven Rostedt [Wed, 3 Sep 2008 21:42:50 +0000 (17:42 -0400)]
ftrace: binary and not logical for continue test
Peter Zijlstra provided me with a nice brown paper bag while letting me know
that I was doing a logical AND and not a binary one, making a condition
true more often than it should be.
Luckily, a false true is handled by the calling function and no harm is
done. But this needs to be fixed regardless.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Michael Ellerman [Wed, 20 Aug 2008 23:36:11 +0000 (16:36 -0700)]
ftrace: make output nicely spaced for up to 999 cpus
Currently some of the ftrace output goes skewiff if you have more
than 9 cpus, and some if you have more than 99.
Twiddle with the headers and format strings to make up to 999 cpus
display without causing spacing problems.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 4 Sep 2008 13:04:37 +0000 (15:04 +0200)]
stack tracer: depends on DEBUG_KERNEL
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 29 Aug 2008 20:51:43 +0000 (16:51 -0400)]
ftrace: stack trace add indexes
This patch adds indexes into the stack that the functions in the
stack dump were found at. As an added bonus, I also added a diff
to show which function is the most notorious consumer of the stack.
The output now looks like this:
# cat /debug/tracing/stack_trace
Depth Size Location (48 entries)
----- ---- --------
0) 2476 212 blk_recount_segments+0x39/0x59
1) 2264 12 bio_phys_segments+0x16/0x1d
2) 2252 20 blk_rq_bio_prep+0x23/0xaf
3) 2232 12 init_request_from_bio+0x74/0x77
4) 2220 56 __make_request+0x294/0x331
5) 2164 136 generic_make_request+0x34f/0x37d
6) 2028 56 submit_bio+0xe7/0xef
7) 1972 28 submit_bh+0xd1/0xf0
8) 1944 112 block_read_full_page+0x299/0x2a9
9) 1832 8 blkdev_readpage+0x14/0x16
10) 1824 28 read_cache_page_async+0x7e/0x109
11) 1796 16 read_cache_page+0x11/0x49
12) 1780 32 read_dev_sector+0x3c/0x72
13) 1748 48 read_lba+0x4d/0xaa
14) 1700 168 efi_partition+0x85/0x61b
15) 1532 72 rescan_partitions+0x10e/0x266
16) 1460 40 do_open+0x1c7/0x24e
17) 1420 292 __blkdev_get+0x79/0x84
18) 1128 12 blkdev_get+0x12/0x14
19) 1116 20 register_disk+0xd1/0x11e
20) 1096 28 add_disk+0x34/0x90
21) 1068 52 sd_probe+0x2b1/0x366
22) 1016 20 driver_probe_device+0xa5/0x120
23) 996 8 __device_attach+0xd/0xf
24) 988 32 bus_for_each_drv+0x3e/0x68
25) 956 24 device_attach+0x56/0x6c
26) 932 16 bus_attach_device+0x26/0x4d
27) 916 64 device_add+0x380/0x4b4
28) 852 28 scsi_sysfs_add_sdev+0xa1/0x1c9
29) 824 160 scsi_probe_and_add_lun+0x919/0xa2a
30) 664 36 __scsi_add_device+0x88/0xae
31) 628 44 ata_scsi_scan_host+0x9e/0x21c
32) 584 28 ata_host_register+0x1cb/0x1db
33) 556 24 ata_host_activate+0x98/0xb5
34) 532 192 ahci_init_one+0x9bd/0x9e9
35) 340 20 pci_device_probe+0x3e/0x5e
36) 320 20 driver_probe_device+0xa5/0x120
37) 300 20 __driver_attach+0x3f/0x5e
38) 280 36 bus_for_each_dev+0x40/0x62
39) 244 12 driver_attach+0x19/0x1b
40) 232 28 bus_add_driver+0x9c/0x1af
41) 204 28 driver_register+0x76/0xd2
42) 176 20 __pci_register_driver+0x44/0x71
43) 156 8 ahci_init+0x14/0x16
44) 148 100 _stext+0x42/0x122
45) 48 20 kernel_init+0x175/0x1dc
46) 28 28 kernel_thread_helper+0x7/0x10
The first column is simply an index starting from the inner most function
and counting down to the outer most.
The next column is the location that the function was found on the stack.
The next column is the size of the stack for that function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Wed, 27 Aug 2008 17:02:01 +0000 (13:02 -0400)]
ftrace: remove warning of old objcopy and local functions
The warning messages about old objcopy and local functions spam the
user quite drastically. Remove the warning until we can find a nicer
way of tell the user to upgrade their objcopy.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 28 Aug 2008 03:24:15 +0000 (23:24 -0400)]
ftrace: remove direct reference to mcount in trace code
The mcount record method of ftrace scans objdump for references to mcount.
Using mcount as the reference to test if the calls to mcount being replaced
are indeed calls to mcount, this use of mcount was also caught as a
location to change. Using a variable that points to the mcount address
moves this reference into the data section that is not scanned, and
we do not use a false location to try and modify.
The warn on code was what was used to detect this bug.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 28 Aug 2008 03:31:01 +0000 (23:31 -0400)]
ftrace: add stack tracer
This is another tracer using the ftrace infrastructure, that examines
at each function call the size of the stack. If the stack use is greater
than the previous max it is recorded.
You can always see (and set) the max stack size seen. By setting it
to zero will start the recording again. The backtrace is also available.
For example:
# cat /debug/tracing/stack_max_size
1856
# cat /debug/tracing/stack_trace
[<
c027764d>] stack_trace_call+0x8f/0x101
[<
c021b966>] ftrace_call+0x5/0x8
[<
c02553cc>] clocksource_get_next+0x12/0x48
[<
c02542a5>] update_wall_time+0x538/0x6d1
[<
c0245913>] do_timer+0x23/0xb0
[<
c0257657>] tick_do_update_jiffies64+0xd9/0xf1
[<
c02576b9>] tick_sched_timer+0x4a/0xad
[<
c0250fe6>] __run_hrtimer+0x3e/0x75
[<
c02518ed>] hrtimer_interrupt+0xf1/0x154
[<
c022c870>] smp_apic_timer_interrupt+0x71/0x84
[<
c021b7e9>] apic_timer_interrupt+0x2d/0x34
[<
c0238597>] finish_task_switch+0x29/0xa0
[<
c05abd13>] schedule+0x765/0x7be
[<
c05abfca>] schedule_timeout+0x1b/0x90
[<
c05ab4d4>] wait_for_common+0xab/0x101
[<
c05ab5ac>] wait_for_completion+0x12/0x14
[<
c033cfc3>] blk_execute_rq+0x84/0x99
[<
c0402470>] scsi_execute+0xc2/0x105
[<
c040250a>] scsi_execute_req+0x57/0x7f
[<
c043afe0>] sr_test_unit_ready+0x3e/0x97
[<
c043bbd6>] sr_media_change+0x43/0x205
[<
c046b59f>] media_changed+0x48/0x77
[<
c046b5ff>] cdrom_media_changed+0x31/0x37
[<
c043b091>] sr_block_media_changed+0x16/0x18
[<
c02b9e69>] check_disk_change+0x1b/0x63
[<
c046f4c3>] cdrom_open+0x7a1/0x806
[<
c043b148>] sr_block_open+0x78/0x8d
[<
c02ba4c0>] do_open+0x90/0x257
[<
c02ba869>] blkdev_open+0x2d/0x56
[<
c0296a1f>] __dentry_open+0x14d/0x23c
[<
c0296b32>] nameidata_to_filp+0x24/0x38
[<
c02a1c68>] do_filp_open+0x347/0x626
[<
c02967ef>] do_sys_open+0x47/0xbc
[<
c02968b0>] sys_open+0x23/0x2b
[<
c021aadd>] sysenter_do_call+0x12/0x26
I've tested this on both x86_64 and i386.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andrew Morton [Wed, 27 Aug 2008 07:08:30 +0000 (09:08 +0200)]
kbuild: ftrace: don't assume that scripts/recordmcount.pl is executable
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CC scripts/mod/empty.o
/bin/sh: /usr/src/25/scripts/recordmcount.pl: Permission denied
We shouldn't assume that files have their `x' bits set. There are various
ways in which file permissions get lost, including use of patch(1).
It might not be correct to assume that perl lives in $PATH?
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Mon, 25 Aug 2008 18:52:11 +0000 (14:52 -0400)]
ftrace: objcopy version test for local symbols
The --globalize-symbols option came out in objcopy version 2.17.
If the kernel is being compiled on a system with a lower version of
objcopy, then we can not use the globalize / localize trick to
link to symbols pointing to local functions.
This patch tests the version of objcopy and will only use the trick
if the version is greater than or equal to 2.17. Otherwise, if an
object has only local functions within a section, it will give a
nice warning and recommend the user to upgrade their objcopy.
Leaving the symbols unrecorded is not that big of a deal, since the
mcount record method changes the actual mcount code to be a simple
"ret" without recording registers or anything.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 25 Aug 2008 06:12:04 +0000 (08:12 +0200)]
ftrace: clean up macro usage
enclose the argument in parenthesis. (especially since we cast it,
which is a high prio operation)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephen Rothwell [Mon, 25 Aug 2008 03:08:44 +0000 (13:08 +1000)]
ftrace: fix build failure
After disabling FTRACE_MCOUNT_RECORD via a patch, a dormant build
failure surfaced:
kernel/trace/ftrace.c: In function 'ftrace_record_ip':
kernel/trace/ftrace.c:416: error: incompatible type for argument 1 of '_spin_lock_irqsave'
kernel/trace/ftrace.c:433: error: incompatible type for argument 1 of '_spin_lock_irqsave'
Introduced by commit
6dad8e07f4c10b17b038e84d29f3ca41c2e55cd0 ("ftrace:
add necessary locking for ftrace records").
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Wed, 20 Aug 2008 16:55:07 +0000 (12:55 -0400)]
ftrace: x86 use copy to and from user functions
The modification of code is performed either by kstop_machine, before
SMP starts, or on module code before the module is executed. There is
no reason to do the modifications from assembly. The copy to and from
user functions are sufficient and produces cleaner and easier to read
code.
Thanks to Benjamin Herrenschmidt for suggesting the idea.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Wed, 20 Aug 2008 14:07:35 +0000 (10:07 -0400)]
ftrace: handle weak symbol functions
During tests and checks, I've discovered that there were failures to
convert mcount callers into nops. Looking deeper into these failures,
code that was attempted to be changed was not an mcount caller.
The current code only updates if the code being changed is what it expects,
but I still investigate any time there is a failure.
What was happening is that a weak symbol was being used as a reference
for other mcount callers. That weak symbol was also referenced elsewhere
so the offsets were using the strong symbol and not the function symbol
that it was referenced from.
This patch changes the setting up of the mcount_loc section to search
for a global function that is not weak. It will pick a local over a weak
but if only a weak is found in a section, a warning is printed and the
mcount location is not recorded (just to be safe).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 15 Aug 2008 15:40:24 +0000 (11:40 -0400)]
ftrace: update recordmount.pl arch changes
I'm trying to keep all the arch changes in recordmcount.pl in one place.
I moved your code into that area, by adding the flags to the commands
that were passed in.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Mon, 18 Aug 2008 22:58:12 +0000 (15:58 -0700)]
ftrace: fix build problem with CONFIG_FTRACE
I'm seeing when I use separate src/build dirs:
make[3]: *** [arch/x86/kernel/time_32.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory
make[3]: *** [arch/x86/kernel/irq_32.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory
make[3]: *** [arch/x86/kernel/ldt.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory
make[3]: *** [arch/x86/kernel/i8259.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory
This fixes it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Huang Ying [Mon, 18 Aug 2008 08:24:56 +0000 (16:24 +0800)]
ftrace: fix incorrect comment style of __ftrace_enabled_save()
This patch fixes incorrect comment style of __ftrace_enabled_save().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Sat, 16 Aug 2008 01:40:05 +0000 (21:40 -0400)]
ftrace: add necessary locking for ftrace records
The new design of pre-recorded mcounts and updating the code outside of
kstop_machine has changed the way the records themselves are protected.
This patch uses the ftrace_lock to protect the records. Note, the lock
still does not need to be taken within calls that are only called via
kstop_machine, since the that code can not run while the spin lock is held.
Also removed the hash_lock needed for the daemon when MCOUNT_RECORD is
configured. Also did a slight cleanup of an unused variable.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Sat, 16 Aug 2008 01:40:04 +0000 (21:40 -0400)]
ftrace: do not init module on ftrace disabled
If one of the self tests of ftrace has disabled the function tracer,
do not run the code to convert the mcount calls in modules.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frédéric Weisbecker [Fri, 15 Aug 2008 19:08:22 +0000 (21:08 +0200)]
ftrace: fix some mistakes in error messages
This patch fixes some mistakes on the tracer in warning messages when
debugfs fails to create tracing files.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: srostedt@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 9 Jun 2008 18:54:22 +0000 (20:54 +0200)]
ftrace: scripts/recordmcount.pl cross-build hack
hack around:
ld: Relocatable linking with relocations from format elf32-i386 (init/.tmp_gl_calibrate.o) to format elf64-x86-64 (init/.tmp_mx_calibrate.o) i CC arch/x86/mm/extable.o
objcopy: 'init/.tmp_mx_calibrate.o': No such file
rm: cannot remove `init/.tmp_mx_calibrate.o': No such file or directory
ld: Relocatable linking with relocations from format elf32-i386 (arch/x86/mm/extable.o) to format elf64-x86-64 (arch/x86/mm/.tmp_mx_extable.o) is not supported
mv: cannot stat `arch/x86/mm/.tmp_mx_extable.o': No such file or directory
ld: Relocatable linking with relocations from format elf32-i386 (arch/x86/mm/fault.o) to format elf64-x86-64 (arch/x86/mm/.tmp_mx_fault.o) is not supported
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 15 Aug 2008 16:22:09 +0000 (18:22 +0200)]
ftrace: ftrace_kill_atomic() build fix
fix:
kernel/built-in.o: In function `ftrace_dump':
(.text+0x2e2ea): undefined reference to `ftrace_kill_atomic'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 15 Aug 2008 15:48:02 +0000 (17:48 +0200)]
ftrace: build fix
fix:
In file included from init/main.c:65:
include/linux/ftrace.h:166: error: expected ‘,' or ‘;' before ‘{' token
make[1]: *** [init/main.o] Error 1
make: *** [init/main.o] Error 2
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 31 Jul 2008 02:36:46 +0000 (22:36 -0400)]
ftrace: dump out ftrace buffers to console on panic
At OLS I had a lot of interest to be able to have the ftrace buffers
dumped on panic. Usually one would expect to uses kexec and examine
the buffers after a new kernel is loaded. But sometimes the resources
do not permit kdump and kexec, so having an option to still see the
sequence of events up to the crash is very advantageous.
This patch adds the option to have the ftrace buffers dumped to the
console in the latency_trace format on a panic. When the option is set,
the default entries per CPU buffer are lowered to 16384, since the writing
to the serial (if that is the console) may take an awful long time
otherwise.
[
Changes since -v1:
Got alpine to send correctly (as well as spell check working).
Removed config option.
Moved the static variables into ftrace_dump itself.
Gave printk a log level.
]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 1 Aug 2008 20:45:49 +0000 (16:45 -0400)]
ftrace: ftrace_printk doc moved
Based on Randy Dunlap's suggestion, the ftrace_printk kernel-doc belongs
with the ftrace_printk macro that should be used. Not with the
__ftrace_printk internal function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 1 Aug 2008 16:26:41 +0000 (12:26 -0400)]
ftrace: printk formatting infrastructure
This patch adds a feature that can help kernel developers debug their
code using ftrace.
int ftrace_printk(const char *fmt, ...);
This records into the ftrace buffer using printf formatting. The entry
size in the buffers are still a fixed length. A new type has been added
that allows for more entries to be used for a single recording.
The start of the print is still the same as the other entries.
It returns the number of characters written to the ftrace buffer.
For example:
Having a module with the following code:
static int __init ftrace_print_test(void)
{
ftrace_printk("jiffies are %ld\n", jiffies);
return 0;
}
Gives me:
insmod-5441 3...1 7569us : ftrace_print_test: jiffies are
4296626666
for the latency_trace file and:
insmod-5441 [03] 1959.370498: ftrace_print_test jiffies are
4296626666
for the trace file.
Note: Only the infrastructure should go into the kernel. It is to help
facilitate debugging for other kernel developers. Calls to ftrace_printk
is not intended to be left in the kernel, and should be frowned upon just
like scattering printks around in the code.
But having this easily at your fingertips helps the debugging go faster
and bugs be solved quicker.
Maybe later on, we can hook this with markers and have their printf format
be sucked into ftrace output.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 1 Aug 2008 16:26:40 +0000 (12:26 -0400)]
ftrace: new continue entry - separate out from trace_entry
Some tracers will need to work with more than one entry. In order to do this
the trace_entry structure was split into two fields. One for the start of
all entries, and one to continue an existing entry.
The trace_entry structure now has a "field" entry that consists of the previous
content of the trace_entry, and a "cont" entry that is just a string buffer
the size of the "field" entry.
Thanks to Andrew Morton for suggesting this idea.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 15 Aug 2008 02:47:19 +0000 (22:47 -0400)]
ftrace: remove old pointers to mcount
When a mcount pointer is recorded into a table, it is used to add or
remove calls to mcount (replacing them with nops). If the code is removed
via removing a module, the pointers still exist. At modifying the code
a check is always made to make sure the code being replaced is the code
expected. In-other-words, the code being replaced is compared to what
it is expected to be before being replaced.
There is a very small chance that the code being replaced just happens
to look like code that calls mcount (very small since the call to mcount
is relative). To remove this chance, this patch adds ftrace_release to
allow module unloading to remove the pointers to mcount within the module.
Another change for init calls is made to not trace calls marked with
__init. The tracing can not be started until after init is done anyway.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 15 Aug 2008 02:47:18 +0000 (22:47 -0400)]
ftrace: move notrace to compiler.h
The notrace define belongs in compiler.h so that it can be used in
init.h
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 15 Aug 2008 02:47:17 +0000 (22:47 -0400)]
ftrace: do not show freed records in available_filter_functions
Seems that freed records can appear in the available_filter_functions list.
This patch fixes that.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 14 Aug 2008 22:05:05 +0000 (18:05 -0400)]
ftrace: use only 5 byte nops for x86
Mathieu Desnoyers revealed a bug in the original code. The nop that is
used to relpace the mcount caller can be a two part nop. This runs the
risk where a process can be preempted after executing the first nop, but
before the second part of the nop.
The ftrace code calls kstop_machine to keep multiple CPUs from executing
code that is being modified, but it does not protect against a task preempting
in the middle of a two part nop.
If the above preemption happens and the tracer is enabled, after the
kstop_machine runs, all those nops will be calls to the trace function.
If the preempted process that was preempted between the two nops is executed
again, it will execute half of the call to the trace function, and this
might crash the system.
This patch instead uses what both the latest Intel and AMD spec suggests.
That is the P6_NOP5 sequence of "0x0f 0x1f 0x44 0x00 0x00".
Note, some older CPUs and QEMU might fault on this nop, so this nop
is executed with fault handling first. If it detects a fault, it will then
use the code "0x66 0x66 0x66 0x66 0x90". If that faults, it will then
default to a simple "jmp 1f; .byte 0x00 0x00 0x00; 1:". The jmp is
not optimal but will do if the first two can not be executed.
TODO: Examine the cpuid to determine the nop to use.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 14 Aug 2008 19:45:12 +0000 (15:45 -0400)]
ftrace: x86 mcount stub
x86 now sets up the mcount locations through the build and no longer
needs to record the ip when the function is executed. This patch changes
the initial mcount to simply return. There's no need to do any other work.
If the ftrace start up test fails, the original mcount will be what everything
will use, so having this as fast as possible is a good thing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 14 Aug 2008 19:45:11 +0000 (15:45 -0400)]
ftrace: enable using mcount recording on x86
Enable the use of the __mcount_loc infrastructure on x86_64 and i386.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 14 Aug 2008 19:45:10 +0000 (15:45 -0400)]
ftrace: rebuild everything on change to FTRACE_MCOUNT_RECORD
When enabling or disabling CONFIG_FTRACE_MCOUNT_RECORD, we want a full
kernel compile to handle the adding of the __mcount_loc sections.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 14 Aug 2008 19:45:09 +0000 (15:45 -0400)]
ftrace: enable mcount recording for modules
This patch enables the loading of the __mcount_section of modules and
changing all the callers of mcount into nops.
The modification is done before the init_module function is called, so
again, we do not need to use kstop_machine to make these changes.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 14 Aug 2008 19:45:08 +0000 (15:45 -0400)]
ftrace: mcount call site on boot nops core
This is the infrastructure to the converting the mcount call sites
recorded by the __mcount_loc section into nops on boot. It also allows
for using these sites to enable tracing as normal. When the __mcount_loc
section is used, the "ftraced" kernel thread is disabled.
This uses the current infrastructure to record the mcount call sites
as well as convert them to nops. The mcount function is kept as a stub
on boot up and not converted to the ftrace_record_ip function. We use the
ftrace_record_ip to only record from the table.
This patch does not handle modules. That comes with a later patch.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 14 Aug 2008 19:45:07 +0000 (15:45 -0400)]
ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset to of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section.
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it into back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
But we have a problem. What happens if the first function in a section
is not exported, and is a static function. The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worst, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
Note, A well documented perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Tue, 29 Jul 2008 10:36:02 +0000 (12:36 +0200)]
ftrace: mark lapic_wd_event() notrace
it can be called in the NMI path:
[ 0.645999] calling ftrace_dynamic_init+0x0/0xd6
[ 0.647521] ------------[ cut here ]------------
[ 0.647521] WARNING: at kernel/trace/ftrace.c:348 ftrace_record_ip+0x4e/0x252()
[ 0.647521] Modules linked in:
[ 0.647521] Pid: 15, comm: kstop1 Not tainted 2.6.27-rc1-tip #22686
[ 0.647521]
[ 0.647521] Call Trace:
[ 0.647521] <NMI> [<
ffffffff8024593f>] warn_on_slowpath+0x5d/0x84
[ 0.647521] [<
ffffffff80220b99>] ? lapic_wd_event+0xb/0x5c
[ 0.647521] [<
ffffffff80287b3b>] ftrace_record_ip+0x4e/0x252
[ 0.647521] [<
ffffffff80211274>] mcount_call+0x5/0x31
[ 0.647521] [<
ffffffff80220b9e>] ? lapic_wd_event+0x10/0x5c
[ 0.647521] [<
ffffffff8083f3ec>] nmi_watchdog_tick+0x19d/0x1ad
[ 0.647521] [<
ffffffff8083e875>] default_do_nmi+0x75/0x1e3
[ 0.647521] [<
ffffffff8083f0b3>] do_nmi+0x5d/0x94
[ 0.647521] [<
ffffffff8083e2d2>] nmi+0xa2/0xc2
[ 0.647521] [<
ffffffff802b48c3>] ? check_bytes_and_report+0x11/0xcc
[ 0.647521] <<EOE>> [<
ffffffff80211274>] ? mcount_call+0x5/0x31
[ 0.647521] [<
ffffffff802b49df>] check_object+0x61/0x1b0
[ 0.647521] [<
ffffffff802b502a>] __slab_free+0x169/0x2ae
[ 0.647521] [<
ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[ 0.647521] [<
ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[ 0.647521] [<
ffffffff802b60cd>] kmem_cache_free+0x85/0xb9
[ 0.647521] [<
ffffffff80242dbf>] __cleanup_sighand+0x25/0x27
[ 0.647521] [<
ffffffff80247b3d>] release_task+0x256/0x339
[ 0.647521] [<
ffffffff802490b4>] do_exit+0x764/0x7ef
[ 0.647521] [<
ffffffff8027624c>] __xchg+0x0/0x38
[ 0.647521] [<
ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[ 0.647521] [<
ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[ 0.647521] [<
ffffffff8025922f>] kthread+0x4e/0x7b
[ 0.647521] [<
ffffffff80212979>] child_rip+0xa/0x11
[ 0.647521] [<
ffffffff80211c17>] ? restore_args+0x0/0x30
[ 0.647521] [<
ffffffff802283a5>] ? native_load_tls+0x14/0x2e
[ 0.647521] [<
ffffffff802591e1>] ? kthread+0x0/0x7b
[ 0.647521] [<
ffffffff8021296f>] ? child_rip+0x0/0x11
[ 0.647521]
[ 0.647521] ---[ end trace
4eaa2a86a8e2da22 ]---
[ 0.672032] initcall ftrace_dynamic_init+0x0/0xd6 returned 0 after 19 msecs
also mark it no-kprobes while at it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Tue, 29 Jul 2008 10:00:59 +0000 (12:00 +0200)]
ftrace: ignore functions that cannot be kprobe-ed
kprobes already has an extensive list of annotations for functions
that should not be instrumented. Add notrace annotations to these
functions as well.
This is particularly useful for functions called by the NMI path.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mathieu Desnoyers [Thu, 24 Jul 2008 20:37:23 +0000 (16:37 -0400)]
tracepoints: use TABLE_SIZE macro
Steven Rostedt suggested:
| Wouldn't it look nicer to have: (TRACEPOINT_TABLE_SIZE - 1) ?
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pekka Paalanen [Mon, 21 Jul 2008 15:49:56 +0000 (18:49 +0300)]
x86: fix mmiotrace 8-bit register decoding
When SIL, DIL, BPL or SPL registers were used in MMIO, the datum
was extracted from AH, BH, CH, or DH, which are incorrect.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: "Steven Rostedt" <srostedt@redhat.com>
Cc: proski@gnu.org
Cc: "Pekka Enberg"
<penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 23 Jul 2008 12:15:22 +0000 (14:15 +0200)]
tracing: clean up tracepoints kconfig structure
do not expose users to CONFIG_TRACEPOINTS - tracers can select it
just fine.
update ftrace to select CONFIG_TRACEPOINTS.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 23 Jul 2008 11:48:22 +0000 (13:48 +0200)]
sched: clean up tracepoints
make it a bit more structured hence more readable.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 23 Jul 2008 11:38:00 +0000 (13:38 +0200)]
tracing: disable tracepoints by default
while it's arguably low overhead, we dont enable new features by default.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:17 +0000 (12:16 -0400)]
ftrace: port to tracepoints
Porting the trace_mark() used by ftrace to tracepoints. (cleanup)
Changelog :
- Change error messages : marker -> tracepoint
[ mingo@elte.hu: conflict resolutions ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:17 +0000 (12:16 -0400)]
tracing, sched: LTTng instrumentation - scheduler
Instrument the scheduler activity (sched_switch, migration, wakeups,
wait for a task, signal delivery) and process/thread
creation/destruction (fork, exit, kthread stop). Actually, kthread
creation is not instrumented in this patch because it is architecture
dependent. It allows to connect tracers such as ftrace which detects
scheduling latencies, good/bad scheduler decisions. Tools like LTTng can
export this scheduler information along with instrumentation of the rest
of the kernel activity to perform post-mortem analysis on the scheduler
activity.
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added. See the "Tracepoints" patch header for
performance result detail.
Changelog :
- Change instrumentation location and parameter to match ftrace
instrumentation, previously done with kernel markers.
[ mingo@elte.hu: conflict resolutions ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:16 +0000 (12:16 -0400)]
tracing: tracepoints, samples
Tracepoint example code under samples/.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:16 +0000 (12:16 -0400)]
tracing: tracepoints, documentation
Documentation of tracepoint usage.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:16 +0000 (12:16 -0400)]
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure not type mismatch happens due to
connexion of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.
Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Mon, 13 Oct 2008 16:54:35 +0000 (09:54 -0700)]
Merge phase #5 (misc) of git://git./linux/kernel/git/tip/linux-2.6-tip
Merges oprofile, timers/hpet, x86/traps, x86/time, and x86/core misc items.
* 'x86-core-v4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (132 commits)
x86: change early_ioremap to use slots instead of nesting
x86: adjust dependencies for CONFIG_X86_CMOV
dumpstack: x86: various small unification steps, fix
x86: remove additional_cpus
x86: remove additional_cpus configurability
x86: improve UP kernel when CPU-hotplug and SMP is enabled
dumpstack: x86: various small unification steps
dumpstack: i386: make kstack= an early boot-param and add oops=panic
dumpstack: x86: use log_lvl and unify trace formatting
dumptrace: x86: consistently include loglevel, print stack switch
dumpstack: x86: add "end" parameter to valid_stack_ptr and print_context_stack
dumpstack: x86: make printk_address equal
dumpstack: x86: move die_nmi to dumpstack_32.c
traps: x86: finalize unification of traps.c
traps: x86: make traps_32.c and traps_64.c equal
traps: x86: various noop-changes preparing for unification of traps_xx.c
traps: x86_64: use task_pid_nr(tsk) instead of tsk->pid in do_general_protection
traps: i386: expand clear_mem_error and remove from mach_traps.h
traps: x86_64: make io_check_error equal to the one on i386
traps: i386: use preempt_conditional_sti/cli in do_int3
...
Alan Cox [Mon, 13 Oct 2008 09:46:24 +0000 (10:46 +0100)]
tty: rename the remaining oddly named n_tty functions
Original idea for this from a patch by Rodolfo Giometti which merges various
bits of PPS support
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:46:09 +0000 (10:46 +0100)]
fs3270: Correct error returns
Drop the kernel lock further and also correct cases where we set rc to an
error code, and then return 0
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:45:52 +0000 (10:45 +0100)]
fs3270: remove extra locks
get_current_tty now does internal locking and returns a referenced object,
thus our use of tty_mutex here can go away.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Wessel [Mon, 13 Oct 2008 09:45:36 +0000 (10:45 +0100)]
tty: tty_io.c shadows sparse fix
drivers/char/tty_io.c:1413:17: warning: symbol 'buf' shadows an earlier one
drivers/char/tty_io.c:1379:20: originally declared here
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Mon, 13 Oct 2008 09:45:26 +0000 (10:45 +0100)]
serial: fix device name reporting when minor space is shared between drivers
The multiple drivers share the minor space occupied by a particular major
number, the actual index within the device name's space is indicated by
the tty_driver->name_base + uart_port->line
Another usable formula is (uart_driver->minor - MINOR_BASE) + port->line
Use those to print the device names properly in such situations in
serial_core.c and 8250.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:45:17 +0000 (10:45 +0100)]
applicom: Fix an unchecked user ioctl range and an error return
Closes bug #11408 by checking the card index range for command 0
Fixes the ioctl to return ENOTTY which is correct for unknown ioctls
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:45:06 +0000 (10:45 +0100)]
tty: Minor tidyups and document fixes for n_tty
Remove/fix some bogus NULL checks, comment some locking etc
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:44:57 +0000 (10:44 +0100)]
tty: Remove lots of NULL checks
Many tty drivers contain 'can't happen' checks against NULL pointers passed
in by the tty layer. These have never been possible to occur. Even more
importantly if they ever do occur we want to know as it would be a serious
bug.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:44:43 +0000 (10:44 +0100)]
tty: fix up gigaset a bit
Stephen's fixes reminded me that gigaset is still rather broken so fix it up
a bit
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell [Mon, 13 Oct 2008 09:44:33 +0000 (10:44 +0100)]
tty: Fallout from tty-move-canon-specials
Today's linux-next build (x86_64 allmodconfig) failed like this:
/drivers/char/tty_ioctl.c: In function 'change_termios':
drivers/isdn/capi/capi.c:1234: error: implicit declaration of function 'n_tty_ioctl'
drivers/isdn/gigaset/ser-gigaset.c: In function 'gigaset_tty_ioctl':
drivers/isdn/gigaset/ser-gigaset.c:648: error: implicit declaration of function 'n_tty_ioctl'
Introduced by commit
686b5e4aea05a80e370dc931b7f4a8d03c80da54
("tty-move-canon-specials"). I added the following patch (which may not
be correct).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:44:17 +0000 (10:44 +0100)]
tty: some ICANON magic is in the wrong places
Move the set up on ldisc change into the ldisc
Move the INQ/OUTQ cases into the driver not in shared ioctl code where it
gives bogus answers for other ldisc values
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:44:08 +0000 (10:44 +0100)]
tty: simplify ktermios allocation
Copy the simplification from the pty unix98 special case to the generic one.
This allows us to kill off driver->termios_locked entirely which is nice. We
have to whack bits of the cris driver as it meddles in places it shouldn't
providing its own arrays that were never used anyway.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:43:58 +0000 (10:43 +0100)]
pty: simplify unix98 allocation
We need both termios and termios_locked so allocate them as one
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:43:48 +0000 (10:43 +0100)]
pty: Fix allocation failure double free
The updating and moving around of the pty code added a bug where both the
helper and caller free the main tty struct (the pty driver must free the
o_tty pair itself however).
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:43:38 +0000 (10:43 +0100)]
pty: Coding style and polish
We've done the heavy lifting now its time to mop up a bit
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sukadev Bhattiprolu [Mon, 13 Oct 2008 09:43:27 +0000 (10:43 +0100)]
Simplify devpts_pty_kill
When creating a new pty, save the pty's inode in the tty->driver_data.
Use this inode in pty_kill() to identify the devpts instance. Since
we now have the inode for the pty, we can skip get_node() lookup and
remove the unused get_node().
TODO:
- check if the mutex_lock is needed in pty_kill().
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sukadev Bhattiprolu [Mon, 13 Oct 2008 09:43:18 +0000 (10:43 +0100)]
Simplify devpts_pty_new()
devpts_pty_new() is called when setting up a new pty and would not
will not have an existing dentry or inode for the pty. So don't bother
looking for an existing dentry - just create a new one.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sukadev Bhattiprolu [Mon, 13 Oct 2008 09:43:08 +0000 (10:43 +0100)]
Simplify devpts_get_tty()
As pointed out by H. Peter Anvin, since the inode for the pty is known,
we don't need to look it up.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sukadev Bhattiprolu [Mon, 13 Oct 2008 09:42:59 +0000 (10:42 +0100)]
Add an instance parameter devpts interfaces
Pass-in 'inode' or 'tty' parameter to devpts interfaces. With multiple
devpts instances, these parameters will be used in subsequent patches
to identify the instance of devpts mounted. The parameters also help
simplify devpts implementation.
Changelog[v3]:
- minor changes due to merge with ttydev updates
- rename parameters to emphasize they are ptmx or pts inodes
- pass-in tty_struct * to devpts_pty_kill() (this will help
cleanup the get_node() call in a subsequent patch)
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sukadev Bhattiprolu [Mon, 13 Oct 2008 09:42:49 +0000 (10:42 +0100)]
Move tty lookup/reopen to caller
Move tty_driver_lookup_tty() and tty_reopen() from tty_init_dev()
into tty_open() (one of the two callers of tty_init_dev()). These
calls are not really required in ptmx_open(), the other caller,
since ptmx_open() would be setting up a new tty.
Changelog[v2]:
- remove the lookup and reopen calls from ptmx_open
- merge with recent changes to ttydev tree
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:42:39 +0000 (10:42 +0100)]
tty: extract the pty init time special cases
The majority of the remaining init_dev code is pty special cases. We
refactor this code into the driver->install method.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:42:29 +0000 (10:42 +0100)]
tty: Finish fixing up the init_dev interface to use ERR_PTR
Original suggestion and proposal from Sukadev Bhattiprolu.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:42:19 +0000 (10:42 +0100)]
tty: More driver operations
We have the lookup operation abstracted which is nice for pty cleanup but
we really want to abstract the add/remove entries as well so that we can
pull the pty code out of the tty core and create a clear defined interface
for the tty driver table.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:42:09 +0000 (10:42 +0100)]
tty: kref the tty driver object
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:42:00 +0000 (10:42 +0100)]
tty: Clean up the tty_init_dev changes further
Fix up the naming, style and extract some bits of code into the driver
specific code
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sukadev Bhattiprolu [Mon, 13 Oct 2008 09:41:51 +0000 (10:41 +0100)]
tty: Move parts of tty_init_dev into new functions
Move the 'find-tty' and 'fast-track-open' parts of init_dev() to
separate functions.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:41:42 +0000 (10:41 +0100)]
tty: Remove more special casing and out of place code
Carry on pushing code out of tty_io when it belongs to other drivers. I'm
not 100% happy with some of this and it will be worth revisiting some of the
exports later when the restructuring work is done.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:41:30 +0000 (10:41 +0100)]
tty: shutdown method
Right now there are various drivers that try to use tty->count to know when
they get the final close. Aristeau Rozanski showed while debugging the vt
sysfs race that this isn't entirely safe.
Instead of driver side tricks to work around this introduce a shutdown which
is called when the tty is being destructed. This also means that the shutdown
method is tied into the refcounting.
Use this to rework the console close/sysfs logic.
Remove lots of special case code from the tty core code. The pty code can now
have a shutdown() method that replaces the special case hackery in the tree
free up paths.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:41:16 +0000 (10:41 +0100)]
vt: remove bogus lock dropping
For hysterical raisins the vt layer drops and retakes locks in the write
method. This is a left over from the days when user/kernel data was passed
directly to the tty not pre-buffered.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:41:03 +0000 (10:41 +0100)]
pty: If the administrator creates a device for a ptmx slave we should not error
The open path for ptmx slaves is via the ptmx device. Opening them any
other way is not allowed. Vegard Nossum found that previously this was not
the case and mknod foo c 128 42; cat foo would produce nasty diagnostics
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:40:53 +0000 (10:40 +0100)]
tty: Fix abusers of current->sighand->tty
Various people outside the tty layer still stick their noses in behind the
scenes. We need to make sure they also obey the locking and referencing rules.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:40:43 +0000 (10:40 +0100)]
tty: Redo current tty locking
Currently it is sometimes locked by the tty mutex and sometimes by the
sighand lock. The latter is in fact correct and now we can hand back referenced
objects we can fix this up without problems around sleeping functions.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:40:30 +0000 (10:40 +0100)]
tty: the vhangup syscall is racy
We now have the infrastructure to sort this out but rather than teaching
the syscall tty lock rules we move the hard work into a tty helper
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:40:19 +0000 (10:40 +0100)]
mxser: Switch to kref tty
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:40:07 +0000 (10:40 +0100)]
stallion: Use krefs
Use tty_port_init and krefs in the stallion drivers to protect us from devices
going away underneath us. As with the other drives some rearranging is done to
pass the tty structure down properly on the user side.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:39:58 +0000 (10:39 +0100)]
tty: kref usage for isicom and moxa
Rather than blindly keep taking krefs we reorder the code in a few places
to pass the tty down to the right place (which is important as from the user
side it is not the case that tty == port->tty in all situations). For the irq
and related paths use the krefs to stop the tty being freed under us.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:39:46 +0000 (10:39 +0100)]
tty: usb-serial krefs
Use kref in the USB serial drivers so that we don't free tty structures
from under the URB receive handlers as has historically been the case if
you were unlucky. This also gives us a framework for general tty drivers to
use tty_port objects and refcount.
Contains two err->dev_err changes merged together to fix clashes in the
-next tree.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:39:23 +0000 (10:39 +0100)]
tty: Move tty_write_message out of kernel/printk
This is pure tty code so put it in the tty layer where it can be with the
locking relevant material it uses
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:39:13 +0000 (10:39 +0100)]
tty: Make get_current_tty use a kref
We now return a kref covered tty reference. That ensures the tty structure
doesn't go away when you have a return from get_current_tty. This is not
enough to protect you from most of the resources being freed behind your
back - yet.
[Updated to include fixes for SELinux problems found by Andrew Morton and
an s390 leak found while debugging the former]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:39:01 +0000 (10:39 +0100)]
tty: compare the tty winsize
We always use the real tty one for stuff so the pty one should not be
compared. As we propagate window changes to both it doesn't currently
matter but will when we tidy up the pty termios logic a bit more
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:38:46 +0000 (10:38 +0100)]
tty: Termios locking - sort out real_tty confusions and lock reads
This moves us towards sanity and should mean our termios locking is now
complete and comprehensive.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:38:18 +0000 (10:38 +0100)]
tty: Add termiox
We need a way to describe the various additional modes and flow control
features that random weird hardware shows up and software such as wine
wants to emulate as Windows supports them.
TCGETX/TCSETX and the termiox ioctl are a SYS5 extension that we might as
well adopt. This patches adds the structures and the basic ioctl interfaces
when the TCGETX etc defines are added for an architecture. Drivers wishing
to use this stuff need to add new methods.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:38:07 +0000 (10:38 +0100)]
tty: ipw need reworking
This came in via another tree and unfortunately is rather broken on
the tty side. Comment the apparent locking problems for someone who knows
the driver to look at.
Fix the termios and other ioctl handling. The driver was calling the wrong
methods for what it wanted to do but the right ones existed so its a simple
fix up.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:37:48 +0000 (10:37 +0100)]
tty: Cris has a nice RS485 ioctl so we should steal it
JP Tosoni observed:
"About a RS485 ioctl: could you consider the attached files which are
already in the Linux kernel (in include/asm-cris). They define a
TIOCSERSETRS485 (ioctl.h), and the data structure (rs485.h)
with allows to specify timings. Sounds just like what we want ?"
and he's right: sort of. Rework the structure to use flag bits and make the
time delay a fixed sized field so we don't get 32/64bit problems. Add the ioctls
to x86 so that people know what to add to their platform of choice.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:37:36 +0000 (10:37 +0100)]
tty: use krefs to protect driver module counts
The tty layer keeps driver module counts that are used so the driver knows
when it can be unloaded. For obvious reasons we want to tie that to the
refcounting properly.
At this point the driver side itself isn't refcounted nicely but we can do
that later and kref the drivers.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:37:26 +0000 (10:37 +0100)]
tty: Add a kref count
Introduce a kref to the tty structure and use it to protect the tty->signal
tty references. For now we don't introduce it for anything else.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:37:17 +0000 (10:37 +0100)]
pps: Reserve a line discipline number for PPS
Add a new line discipline for "pulse per second" devices connected to
a serial port.
Signed-off-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:37:07 +0000 (10:37 +0100)]
tty: Split tty_port into its own file
Not much in it yet but this will grow a lot
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:36:58 +0000 (10:36 +0100)]
tty: split the buffering from tty_io
The two are basically independent chunks of code so lets split them up for
readability and sanity. It also makes the API boundaries much clearer.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:36:49 +0000 (10:36 +0100)]
uml: small cleanups and note bugs to be dealt with by uml authors...
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 13 Oct 2008 09:36:40 +0000 (10:36 +0100)]
tty: move tioclinux from a special case
Right now we have ifdefs and hooks in the core ioctl handler for TIOCLINUX
and then test if its a console. This is brain dead. Instead call the
tioclinux helper from the relevant driver ioctl methods.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>