J. Bruce Fields [Fri, 16 Feb 2007 09:28:36 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: acls: avoid unnecessary denies
We're inserting deny's between some ACEs in order to enforce posix draft acl
semantics which prevent permissions from accumulating across entries in an
acl.
That's fine, but we're doing that by inserting a deny after *every* allow,
which is overkill. We shouldn't be adding them in places where they actually
make no difference.
Also replaced some helper functions for creating acl entries; I prefer just
assigning directly to the struct fields--it takes a few more lines, but the
field names provide some documentation that I think makes the result easier
understand.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:34 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: acls: don't return explicit mask
Return just the effective permissions, and forget about the mask. It isn't
worth the complexity.
WARNING: This breaks backwards compatibility with overly-picky nfsv4->posix
acl translation, as may has been included in some patched versions of libacl.
To our knowledge no such version was every distributed by anyone outside citi.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:34 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: fix error return on unsupported acl
We should be returning ATTRNOTSUPP, not NOTSUPP, when acls are unsupported.
Also fix a comment.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:30 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: fix memory leak on kmalloc failure in savemem
The wrong pointer is being kfree'd in savemem() when defer_free returns with
an error.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:30 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: represent nfsv4 acl with array instead of linked list
Simplify the memory management and code a bit by representing acls with an
array instead of a linked list.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:29 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: simplify nfsv4->posix translation
The code that splits an incoming nfsv4 ACL into inheritable and effective
parts can be combined with the the code that translates each to a posix acl,
resulting in simpler code that requires one less pass through the ACL.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:28 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: relax checking of ACL inheritance bits
The rfc allows us to be more permissive about the ACL inheritance bits we
accept:
"If the server supports a single "inherit ACE" flag that applies to
both files and directories, the server may reject the request
(i.e., requiring the client to set both the file and directory
inheritance flags). The server may also accept the request and
silently turn on the ACE4_DIRECTORY_INHERIT_ACE flag."
Let's take the latter option--the ACL is a complex attribute that could be
rejected for a wide variety of reasons, and the protocol gives us little
ability to explain the reason for the rejection, so erroring out is a
user-unfriendly last resort.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:27 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: fix non-terminated string
The server name is expected to be a null-terminated string, so we can't pass
in the raw client identifier.
What's more, the client identifier is just a binary, not necessarily
printable, blob. Let's just use the ip address instead. The server name
appears to exist just to help debugging by making some printk's more
informative.
Note that the string is copies into the rpc client structure, so the pointer
to the local variable does not outlive the function call.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Fri, 16 Feb 2007 09:28:26 +0000 (01:28 -0800)]
[PATCH] small irq management simplification
Use mask_ack_irq() where possible.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 16 Feb 2007 09:28:25 +0000 (01:28 -0800)]
[PATCH] IRQ kernel-doc fixes
Fix kernel-doc warnings in IRQ management.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:24 +0000 (01:28 -0800)]
[PATCH] genirq: remove IRQ_DISABLED
Now that disable_irq() defaults to delayed-disable semantics, the IRQ_DISABLED
flag is not needed anymore.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:24 +0000 (01:28 -0800)]
[PATCH] genirq: do not mask interrupts by default
Never mask interrupts immediately upon request. Disabling interrupts in
high-performance codepaths is rare, and on the other hand this change could
recover lost edges (or even other types of lost interrupts) by conservatively
only masking interrupts after they happen. (NOTE: with this change the
highlevel irq-disable code still soft-disables this IRQ line - and if such an
interrupt happens then the IRQ flow handler keeps the IRQ masked.)
Mark i8529A controllers as 'never loses an edge'.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul E. McKenney [Fri, 16 Feb 2007 09:28:22 +0000 (01:28 -0800)]
[PATCH] posix timers: RCU optimization for clock_gettime()
Use RCU to avoid the need to acquire tasklist_lock in the single-threaded
case of clock_gettime(). It still acquires tasklist_lock when for a
(potentially multithreaded) process. This change allows realtime
applications to frequently monitor CPU consumption of individual tasks, as
requested (and now deployed) by some off-list users.
This has been in Ingo Molnar's -rt patchset since late 2005 with no
problems reported, and tests successfully on 2.6.20-rc6, so I believe that
it is long-since ready for mainline adoption.
[paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:28:21 +0000 (01:28 -0800)]
[PATCH] time: x86_64: re-enable vsyscall support for x86_64
Cleanup and re-enable vsyscall gettimeofday using the generic clocksource
infrastructure.
[akpm@osdl.org: cleanup]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:28:20 +0000 (01:28 -0800)]
[PATCH] time: x86_64: convert x86_64 to use GENERIC_TIME
This patch converts x86_64 to use the GENERIC_TIME infrastructure and adds
clocksource structures for both TSC and HPET (ACPI PM is shared w/ i386).
[akpm@osdl.org: fix printk timestamps]
[akpm@osdl.org: fix printk ckeanups]
[akpm@osdl.org: hpet build fix]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:28:19 +0000 (01:28 -0800)]
[PATCH] time: x86_64: split x86_64/kernel/time.c up
In preparation for the x86_64 generic time conversion, this patch splits out
TSC and HPET related code from arch/x86_64/kernel/time.c into respective
hpet.c and tsc.c files.
[akpm@osdl.org: fix printk timestamps]
[akpm@osdl.org: cleanup]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:28:18 +0000 (01:28 -0800)]
[PATCH] time: x86_64: hpet_address cleanup
In preparation for supporting generic timekeeping, this patch cleans up
x86-64's use of vxtime.hpet_address, changing it to just hpet_address as is
also used in i386. This is necessary since the vxtime structure will be going
away.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:28:17 +0000 (01:28 -0800)]
[PATCH] generic: vsyscall-gtod support for GENERIC_TIME
Provides generic infrastructure for vsyscall-gtod.
[akpm@osdl.org: cleanup]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:16 +0000 (01:28 -0800)]
[PATCH] Add SysRq-Q to print timer_list debug info
Add SysRq-Q to print pending timers and other timer info.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:15 +0000 (01:28 -0800)]
[PATCH] Add debugging feature /proc/timer_list
add /proc/timer_list, which prints all currently pending (high-res) timers,
all clock-event sources and their parameters in a human-readable form.
Sample output:
Timer List Version: v0.1
HRTIMER_MAX_CLOCK_BASES: 2
now at
4246046273872 nsecs
cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset:
1273998312645738432 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <
f5a90ec8>, hrtimer_sched_tick, hrtimer_stop_sched_tick, swapper/0
# expires at
4246432689566 nsecs [in
386415694 nsecs]
#1: <
f5a90ec8>, hrtimer_wakeup, do_nanosleep, pcscd/2050
# expires at
4247018194689 nsecs [in
971920817 nsecs]
#2: <
f5a90ec8>, hrtimer_wakeup, do_nanosleep, irqbalance/1909
# expires at
4247351358392 nsecs [in
1305084520 nsecs]
#3: <
f5a90ec8>, hrtimer_wakeup, do_nanosleep, crond/2157
# expires at
4249097614968 nsecs [in
3051341096 nsecs]
#4: <
f5a90ec8>, it_real_fn, do_setitimer, syslogd/1888
# expires at
4251329900926 nsecs [in
5283627054 nsecs]
.expires_next :
4246432689566 nsecs
.hres_active : 1
.check_clocks : 0
.nr_events : 31306
.idle_tick :
4246020791890 nsecs
.tick_stopped : 1
.idle_jiffies : 986504
.idle_calls : 40700
.idle_sleeps : 36014
.idle_entrytime :
4246019418883 nsecs
.idle_sleeptime :
4178181972709 nsecs
cpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset:
1273998312645738432 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <
f5a90ec8>, hrtimer_sched_tick, hrtimer_restart_sched_tick, swapper/0
# expires at
4246050084568 nsecs [in
3810696 nsecs]
#1: <
f5a90ec8>, hrtimer_wakeup, do_nanosleep, atd/2227
# expires at
4261010635003 nsecs [in
14964361131 nsecs]
#2: <
f5a90ec8>, hrtimer_wakeup, do_nanosleep, smartd/2332
# expires at
5469485798970 nsecs [in
1223439525098 nsecs]
.expires_next :
4246050084568 nsecs
.hres_active : 1
.check_clocks : 0
.nr_events : 24043
.idle_tick :
4246046084568 nsecs
.tick_stopped : 0
.idle_jiffies : 986510
.idle_calls : 26360
.idle_sleeps : 22551
.idle_entrytime :
4246043874339 nsecs
.idle_sleeptime :
4170763761184 nsecs
tick_broadcast_mask:
00000003
event_broadcast_mask:
00000001
CPU#0's local event device:
Clock Event Device: lapic
capabilities:
0000000e
max_delta_ns:
807385544
min_delta_ns: 1443
mult:
44624025
shift: 32
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
.installed: 1
.expires:
4246432689566 nsecs
CPU#1's local event device:
Clock Event Device: lapic
capabilities:
0000000e
max_delta_ns:
807385544
min_delta_ns: 1443
mult:
44624025
shift: 32
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
.installed: 1
.expires:
4246050084568 nsecs
Clock Event Device: hpet
capabilities:
00000007
max_delta_ns:
2147483647
min_delta_ns: 3352
mult:
61496110
shift: 32
set_next_event: hpet_next_event
set_mode: hpet_set_mode
event_handler: handle_nextevt_broadcast
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:13 +0000 (01:28 -0800)]
[PATCH] Add debugging feature /proc/timer_stat
Add /proc/timer_stats support: debugging feature to profile timer expiration.
Both the starting site, process/PID and the expiration function is captured.
This allows the quick identification of timer event sources in a system.
Sample output:
# echo 1 > /proc/timer_stats
# cat /proc/timer_stats
Timer Stats Version: v0.1
Sample period: 4.010 s
24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
11, 0 swapper sk_reset_timer (tcp_delack_timer)
6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
5, 4179 sshd sk_reset_timer (tcp_write_timer)
4, 2248 yum-updatesd schedule_timeout (process_timeout)
18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
3, 0 swapper sk_reset_timer (tcp_delack_timer)
1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
2, 1 swapper e1000_up (e1000_watchdog)
1, 1 init schedule_timeout (process_timeout)
100 total events, 25.24 events/sec
[ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ]
[bunk@stusta.de: nr_entries can become static]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:12 +0000 (01:28 -0800)]
[PATCH] hrtimers: prevent possible itimer DoS
Fix potential setitimer DoS with high-res timers by pushing itimer rearm
processing to process context.
[Fixes from: Ingo Molnar <mingo@elte.hu>]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:11 +0000 (01:28 -0800)]
[PATCH] hrtimers: add high resolution timer support
Implement high resolution timers on top of the hrtimers infrastructure and the
clockevents / tick-management framework. This provides accurate timers for
all hrtimer subsystem users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:10 +0000 (01:28 -0800)]
[PATCH] i386: enable dynticks in kconfig
Enable dynamic ticks selection.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:09 +0000 (01:28 -0800)]
[PATCH] i386 prepare nmi watchdog for dynticks
The NMI watchdog implementation assumes that the local APIC timer interrupt is
happening. This assumption is not longer true when high resolution timers and
dynamic ticks come into play, as they may switch off the local APIC timer
completely. Take the PIT/HPET interrupts into account too, to avoid false
positives.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:07 +0000 (01:28 -0800)]
[PATCH] i386 prepare for dyntick
Prepare i386 for dyntick: idle handler callbacks.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:06 +0000 (01:28 -0800)]
[PATCH] i386 rework local apic timer calibration
The local apic timer calibration has two problem cases:
1. The calibration is based on readout of the PIT/HPET timer to detect the
wrap of the periodic tick. It happens that a box gets stuck in the
calibration loop due to a PIT with a broken readout function.
2. CoreDuo boxen show a sporadic PIT runs too slow defect, which results
in a wrong lapic calibration. The PIT goes back to normal operation once
the lapic timer is switched to periodic mode.
Both are existing and unfixed problems in the current upstream kernel and
prevent certain laptops and other systems from booting Linux.
Rework the code to address both problems:
- Make the calibration interrupt driven. This removes the wait_timer_tick
magic hackery from lapic.c and time_hpet.c. The clockevents framework
allows easy substitution of the global tick event handler for the
calibration. This is more accurate than monitoring jiffies. At this point
of the boot process, nothing disturbes the interrupt delivery, so the
results are very accurate.
- Verify the calibration against the PM timer, when available by using the
early access function. When the measured calibration period is outside of
an one percent window, then the lapic timer calibration is adjusted to the
pm timer result.
- Verify the calibration by running the lapic timer with the calibration
handler. Disable lapic timer in case of deviation.
This also removes the "synchronization" of the local apic timer to the global
tick. This synchronization never worked, as there is no way to synchronize
PIT(HPET) and local APIC timer. The synchronization by waiting for the tick
just alignes the local APIC timer for the first events, but later the events
drift away due to the different clocks. Removing the "sync" is just
randomizing the asynchronous behaviour at setup time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:04 +0000 (01:28 -0800)]
[PATCH] clockevents: i386 drivers
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver. The assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()
Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.
No changes to existing functionality.
[ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
[ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
Cleanups-from: Adrian Bunk <bunk@stusta.de>
Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:03 +0000 (01:28 -0800)]
[PATCH] tick-management: dyntick / highres functionality
With Ingo Molnar <mingo@elte.hu>
Add functions to provide dynamic ticks and high resolution timers. The code
which keeps track of jiffies and handles the long idle periods is shared
between tick based and high resolution timer based dynticks. The dyntick
functionality can be disabled on the kernel commandline. Provide also the
infrastructure to support high resolution timers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:02 +0000 (01:28 -0800)]
[PATCH] tick-management: broadcast functionality
With Ingo Molnar <mingo@elte.hu>
Add broadcast functionality, so per cpu clock event devices can be registered
as dummy devices or switched from/to broadcast on demand. The broadcast
function distributes the events via the broadcast function of the clock event
device. This is primarily designed to replace the switch apic timer to / from
IPI in power states, where the apic stops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:01 +0000 (01:28 -0800)]
[PATCH] tick-management: core functionality
With Ingo Molnar <mingo@elte.hu>
The tick-management code is the first user of the clockevents layer. It takes
clock event devices from the clock events core and uses them to provide the
periodic tick.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:00 +0000 (01:28 -0800)]
[PATCH] clockevents: add core functionality
Architectures register their clock event devices, in the clock events core.
Users of the clockevents core can get clock event devices for their use. The
clockevents core code provides notification mechanisms for various clock
related management events.
This allows to control the clock event devices without the architectures
having to worry about the details of function assignment. This is also a
preliminary for high resolution timers and dynamic ticks to allow the core
code to control the clock functionality without intrusive changes to the
architecture code.
[Fixes-by: Ingo Molnar <mingo@elte.hu>]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:58 +0000 (01:27 -0800)]
[PATCH] i386, apic: clean up the APIC code
The apic code is quite unstructured and missing a lot of comments.
- Restructure the code into helper functions, timer, setup/shutdown,
interrupt and power management blocks.
- Fixup comments.
- Namespace fixups
- Inline helpers for version and is_integrated
- Combine the ack_bad_irq functions
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:57 +0000 (01:27 -0800)]
[PATCH] Allow early access to the power management timer
Allow early access to the power management timer by exposing the verified read
function and providing a helper function which checks the pmtmr_ioport
variable and returns either the pm timer readout or 0 in case the pm timer is
not available.
Create a new header file and replace also the ifdef'ed extern definition in
arch/i386/kernel/acpi/boot.c
This is a preperatory patch for the rework of the local apic timer
calibration.
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:55 +0000 (01:27 -0800)]
[PATCH] ACPI keep track of timer broadcasting
This is a preperatory patch for highres/dyntick:
- replace the big #ifdef ARCH_APICTIMER_STOPS_ON_C3 hackery by functions
- remove the double switch in the power verify function (in the worst case
we switched ipi to apic and 20usec later apic to ipi)
- keep track of the the state which stops local APIC timer
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Len Brown <len.brown@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:54 +0000 (01:27 -0800)]
[PATCH] ACPI: fix missing include for UP
apic.h does not get included on UP compiles. That way the
APICTIMER_STOPS_ON_C3 is not there and UP boxen have no support for timer
broadcasting. This was never noticed, because the lapic timer is only used
for profiling on UP.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:53 +0000 (01:27 -0800)]
[PATCH] hrtimers: move and add documentation
Move the initial hrtimers.txt document to the new directory
"Documentation/hrtimers"
Add design notes for the high resolution timer and dynamic tick functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:52 +0000 (01:27 -0800)]
[PATCH] hrtimers: clean up callback tracking
Reintroduce ktimers feature "optimized away" by the ktimers review process:
remove the curr_timer pointer from the cpu-base and use the hrtimer state.
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:51 +0000 (01:27 -0800)]
[PATCH] hrtimers; add state tracking
Reintroduce ktimers feature "optimized away" by the ktimers review process:
multiple hrtimer states to enable the running of hrtimers without holding the
cpu-base-lock.
(The "optimized" rbtree hack carried only 2 states worth of information and we
need 4 for high resolution timers and dynamic ticks.)
No functional changes.
Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:50 +0000 (01:27 -0800)]
[PATCH] hrtimers: cleanup locking
Improve kernel/hrtimers.c locking: use a per-CPU base with a lock to control
locking of all clocks belonging to a CPU. This simplifies code that needs to
lock all clocks at once. This makes life easier for high-res timers and
dyntick.
No functional changes.
[ optimization change from Andrew Morton <akpm@osdl.org> ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:49 +0000 (01:27 -0800)]
[PATCH] hrtimers: namespace and enum cleanup
- hrtimers did not use the hrtimer_restart enum and relied on the implict
int representation. Fix the prototypes and the functions using the enums.
- Use seperate name spaces for the enumerations
- Convert hrtimer_restart macro to inline function
- Add comments
No functional changes.
[akpm@osdl.org: fix input driver]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:47 +0000 (01:27 -0800)]
[PATCH] Extend next_timer_interrupt() to use a reference jiffie
For CONFIG_NO_HZ we need to calculate the next timer wheel event based on a
given jiffie value. Extend the existing code to allow the extra 'now'
argument. Provide a compability function for the existing implementations to
call the function with now == jiffies. (This also solves the racyness of the
original code vs. jiffies changing during the iteration.)
No functional changes to existing users of this infrastructure.
[ remove WARN_ON() that triggered on s390, by Carsten Otte <cotte@de.ibm.com> ]
[ made new helper static, Adrian Bunk <bunk@stusta.de> ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:46 +0000 (01:27 -0800)]
[PATCH] Fix cascade lookup of next_timer_interrupt
When searching for the next pending timer in the timer wheel we need to take
the cascade into account. The current code has several problems:
1. it looks into the previous cascade
2. it ignores a pending cascade
3. it ignores multiple cascades
Change the cascade lookup, so it calculates the array index from the point of
the next cascade and always look at the cascade buckets, when the cascade is
pending, i.e. gets executed in the next timer softirq. When multiple
cascades are pending, then lookup the next buckets too.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:27:45 +0000 (01:27 -0800)]
[PATCH] uninline irq_enter()
Uninline irq_enter(). [dynticks adds more stuff to it]
No functional changes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcelo Tosatti [Fri, 16 Feb 2007 09:27:44 +0000 (01:27 -0800)]
[PATCH] Mark TSC on GeodeLX reliable
The Geode can safely use the TSC for highres, since:
1) Does not support frequency scaling,
2) The TSC _does_ count when the CPU is halted. Furthermore, the Geode
supports a mode called "suspension on halt", where Suspend mode (which
interacts with the power management states) is entered. TSC counting
during suspend mode is controlled by bit 8 of the Bus Controller
Configuration Register #0 (thanks Tom!).
3) no SMP :)
Check if "RTSC counts during suspension" and remove the requirement for
verification, so the clocksource code can safely select it as an timesource
for the highres timers subsystem.
Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:43 +0000 (01:27 -0800)]
[PATCH] clocksource: Add verification (watchdog) helper
The TSC needs to be verified against another clocksource. Instead of using
hardwired assumptions of available hardware, provide a generic verification
mechanism. The verification uses the best available clocksource and handles
the usability for high resolution timers / dynticks of the clocksource which
needs to be verified.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:42 +0000 (01:27 -0800)]
[PATCH] clocksource: Remove the update callback
The clocksource code allows direct updates of the rating of a given
clocksource now. Change TSC unstable tracking to use this interface and
remove the update callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:40 +0000 (01:27 -0800)]
[PATCH] clocksource: fixup is_continous changes on MIPS
Fixup the is_contionous replacement by a flag field.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:39 +0000 (01:27 -0800)]
[PATCH] clocksource: fixup is_continous changes on S390
Fixup the is_contionous replacement by a flag field.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:38 +0000 (01:27 -0800)]
[PATCH] clocksource: fixup is_continous changes on AVR32
Fixup the is_contionous replacement by a flag field.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:37 +0000 (01:27 -0800)]
[PATCH] clocksource: fixup is_continous changes on ARM
Fixup the is_contionous replacement by a flag field.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:36 +0000 (01:27 -0800)]
[PATCH] clocksource: replace is_continuous by a flag field
Using a flag filed allows to encode more than one information into a variable.
Preparatory patch for the generic clocksource verification.
[mingo@elte.hu: convert vmitime.c to the new clocksource flag]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:27:34 +0000 (01:27 -0800)]
[PATCH] x86: rewrite SMP TSC sync code
make the TSC synchronization code more robust, and unify it between x86_64 and
i386.
The biggest change is the removal of the 'fix up TSCs' code on x86_64 and
i386, in some rare cases it was /causing/ time-warps on SMP systems.
The new code only checks for TSC asynchronity - and if it can prove a
time-warp (if it can observe the TSC going backwards when going from one CPU
to another within a critical section), then the TSC clock-source is turned
off.
The TSC synchronization-checking code also got moved into a separate file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:33 +0000 (01:27 -0800)]
[PATCH] Simplify the registration of clocksources
Enqueue clocksources in rating order to make selection of the clocksource
easier. Also check the match with an user override at enqueue time.
Preparatory patch for the generic clocksource verification.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:32 +0000 (01:27 -0800)]
[PATCH] i386 Remove useless code in tsc.c
The delayed work code in arch/i386/kernel/tsc.c is an unused leftover of the
GTOD conversion. Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Stultz [Fri, 16 Feb 2007 09:27:31 +0000 (01:27 -0800)]
[PATCH] i386: use GTOD persistent clock support
Persistent clock support: do proper timekeeping across suspend/resume, i386
arch support.
[bunk@stusta.de: cleanup]
Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Stultz [Fri, 16 Feb 2007 09:27:30 +0000 (01:27 -0800)]
[PATCH] GTOD: persistent clock support
Persistent clock support: do proper timekeeping across suspend/resume.
[bunk@stusta.de: cleanup]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:27:29 +0000 (01:27 -0800)]
[PATCH] Fix timeout overflow with jiffies
Prevent timeout overflow if timer ticks are behind jiffies (due to high
softirq load or due to dyntick), by limiting the valid timeout range to
MAX_LONG/2.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:27:28 +0000 (01:27 -0800)]
[PATCH] Fix multiple conversion bugs in msecs_to_jiffies
Fix multiple conversion bugs in msecs_to_jiffies().
The main problem is that this condition:
if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
overflows if HZ is smaller than 1000!
This change is user-visible: for HZ=250 SUS-compliant poll()-timeout
value of -20 is mistakenly converted to 'immediate timeout'.
(The new dyntick code also triggered this, that's how we noticed.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:27:27 +0000 (01:27 -0800)]
[PATCH] Uninline jiffies.h functions
There are loads of fat functions hidden in jiffies.h. Uninline them. No code
changes.
[jeremy@goop.org: export fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:27:26 +0000 (01:27 -0800)]
[PATCH] HZ free ntp
Distangle the NTP update from HZ. This is necessary for dynamic tick enabled
kernels.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:25 +0000 (01:27 -0800)]
[PATCH] Add a function to handle interrupt affinity setting
Provide funtions to:
- check, whether an interrupt can set the affinity
- pin the interrupt to a given cpu
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
[akpm@osdl.org: alpha build fix]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:24 +0000 (01:27 -0800)]
[PATCH] Add irq flag to disable balancing for an interrupt
Add a flag so we can prevent the irq balancing of an interrupt. Move the
bits, so we have room for more :)
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 16 Feb 2007 09:27:23 +0000 (01:27 -0800)]
[PATCH] vmi-versus-hrtimers
arch/i386/kernel/built-in.o: In function `vmi_stop_hz_timer':
: undefined reference to `next_timer_interrupt'
If CONFIG_NO_HZ, next_timer_interrupt() doesn't exist (and presumably doesn't
make sense).
Perhaps VMI shouildn't be playing with timer internals at this level.
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 16 Feb 2007 09:27:22 +0000 (01:27 -0800)]
[PATCH] correct CONFIG_GIGASET_M101 Makefile entry
Advanced Mathematics, lesson 1:
101 != 105
;-)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Fri, 16 Feb 2007 09:27:21 +0000 (01:27 -0800)]
[PATCH] uml: fix 2.6.20 hang
A previous cleanup misused need_poll, which had a fairly broken interface.
It implemented a growable array, changing the used elements count itself,
but leaving it up to the caller to fill in the actual elements, including
the entire array if the array had to be reallocated. This worked because
the previous users were switching between two such structures, and the
elements were copied from the inactive array to the active array after
making sure the active array had enough room.
maybe_sigio_broken was made to use need_poll, but it was operating on a
single array, so when the buffer was reallocated, the previous contents
were lost.
This patch makes need_poll implement more sane semantics. It merely
assures that the array is of the proper size and that the contents are
preserved. It is up to the caller to adjust the used elements count and to
ensure that the proper elements are resent.
This manifested itself as a hang in 2.6.20 as the uninitialized buffer
convinced UML that one of its own file descriptors didn't support SIGIO and
needed to be watched by poll in a separate thread. The result was an
interrupt flood as control traffic over this descriptor sparked interrupts,
which resulted in more control traffic, ad nauseum.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitriy Monakhov [Fri, 16 Feb 2007 09:27:18 +0000 (01:27 -0800)]
[PATCH] __page_symlink retry loop error code fix
If prepare_write or commit_write return AOP_TRUNCATED_PAGE we jump to
"retry" label and than if find_or_create_page() failed function return
incorrect error code.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederik Deweerdt [Fri, 16 Feb 2007 09:27:15 +0000 (01:27 -0800)]
[PATCH] pci_iomap_regions() error handling fix
It appears that the pcim_iomap_regions() function doesn't get the error
handling right. It BUGs early at boot with a backtrace along the lines of:
ahci_init
pci_register_driver
driver_register
[...]
ahci_init_one
pcim_iomap_region
pcim_iounmap
The following patch allows me to boot. Only the if(mask..) continue;
part fixes the problem actually, the gotos where changed so that we
don't try to unmap something we couldn't map anyway.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Fri, 16 Feb 2007 09:27:14 +0000 (01:27 -0800)]
[PATCH] GPIO core documentation
Small updates to the GPIO documentation, addressing feedback and
fixing a few spelling errors.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nate Dailey [Thu, 15 Feb 2007 23:13:46 +0000 (18:13 -0500)]
sata_vsc: use default cache line size if non-zero
This modifies drivers/ata/sata_vsc.c to only set the cache line size
to 0x80 if the default value is zero. Apparently zero isn't allowed
due to a bug in the chip, but I've found performance is much better
with the (non-zero) default instead of 0x80.
[note1: "default" means BIOS-programmed value, in this context -jgarzik]
[note2: superfluous braces were removed from the patch -jg]
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Jeremy Higdon <jeremy@sgi.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Robert Hancock [Mon, 12 Feb 2007 00:36:56 +0000 (18:36 -0600)]
sata_nv: handle SError status indication
ADMA-capable controllers provide a bit in the status register that appears
to indicate that the controller detected an SError condition. Update sata_nv
to detect this and trigger error handling in order to handle the fault.
Signed-off-by: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Olaf Hering [Sat, 10 Feb 2007 20:36:14 +0000 (21:36 +0100)]
add delay around sl82c105_reset_engine calls
The hald media changed polling does really confuse things.
Noone knows why the delays are needed, but they give us access to the CD.
An udelay(50) will give reliable access to the drive, but there is still
one (or more) EH reset. The drive works without EH resets with udelay(100).
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Zhang, Yanmin [Fri, 9 Feb 2007 03:29:51 +0000 (11:29 +0800)]
ATA convert GSI to irq on ia64
If an ATA drive uses legacy mode, ata driver will choose 14 and 15
as the fixed irq number. On ia64 platform, such numbers are GSI and
should be converted to irq vector.
Below patch against kernel 2.6.20 fixes it.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 7 Feb 2007 20:37:41 +0000 (12:37 -0800)]
libata: clear TF before IDENTIFYing
Some devices chock if Feature is not clear when IDENTIFY is issued.
Set ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE for IDENTIFY such that whole
TF is cleared when reading ID data.
Kudos to Art Haas for testing various futile patches over several
months and Mark Lord for pointing out the fix.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Art Haas <ahaas@airmail.net>
Cc: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Wed, 7 Feb 2007 21:46:00 +0000 (13:46 -0800)]
libata: Add a host flag to indicate lack of IORDY capability
This is the first preparation to doing the !IORDY cases properly. Further
diffs will then add the needed logic to do it right.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Mon, 5 Feb 2007 14:21:19 +0000 (23:21 +0900)]
libata: fix drive side 80c cable check, take 3
The 80c wire bit is bit 13, not 14. Bit 14 is always 1 if word93 is
implemented. This increases the chance of incorrect wire detection
especially because host side cable detection is often unreliable and
we sometimes soley depend on drive side cable detection. Fix the test
and add word93 validity check.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Mikael Pettersson [Sun, 11 Feb 2007 22:19:53 +0000 (23:19 +0100)]
sata_promise: new EH conversion for 20619 chips, take 2
This patch updates the sata_promise driver to use new-style
libata error handling for 20619 (TX4000) chips. sata_promise
already uses new EH for the other chips it supports, so the
patch is quite simple:
* remove ->phy_reset and ->eng_timeout ops from pdc_pata_ops,
and instead bind ->freeze, ->thaw, ->error_handler, and
->post_internal_cmd to existing new EH functions
* drop ATA_FLAG_SRST from board_20619's flags
* remove now unused pdc_pata_phy_reset() and pdc_eng_timeout()
Tested on a TX4000 with both modern working disks and old/quirky
disks. Also used a CD-RW drive to test reading and writing CDs.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Mikael Pettersson [Wed, 7 Feb 2007 21:29:56 +0000 (22:29 +0100)]
sata_promise: fix missing PATA cable detection
This patch fixes an oversight which caused sata_promise to
not perform cable detection on the TX2plus chips' PATA ports.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Linus Torvalds [Thu, 15 Feb 2007 18:01:15 +0000 (10:01 -0800)]
Merge /pub/scm/linux/kernel/git/lethal/sh-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (35 commits)
sh: rts7751r2d board updates.
sh: Kill off dead bigsur and ec3104 boards.
sh: Fixup r7780rp pata_platform for devres conversion.
sh: Revert TLB miss fast-path changes that broke PTEA parts.
sh: Compile fix for heartbeat consolidation.
sh: heartbeat consolidation for banked LEDs.
sh: define dma noncoherent API functions.
sh: Missing flush_dcache_all() proto in cacheflush.h.
sh: Kill dead/unused ISA code from __ioremap().
sh: Add cpu-features header to asm/Kbuild.
sh: Move __KERNEL__ up in asm/page.h.
sh: Fix syscall numbering breakage.
sh: dcache write-back for R7780RP PIO.
sh: Switch to local TLB flush variants in additional callsites.
sh: Local TLB flushing variants for SMP prep.
sh: Fixup cpu_data references for the non-boot CPUs.
sh: Use a per-cpu ASID cache.
sh: add SH_CLK_MD Kconfig default.
sh: Fixup SHMIN INTC register definitions.
sh: SH-DMAC compile fixes
...
Nick Piggin [Wed, 14 Feb 2007 11:39:01 +0000 (12:39 +0100)]
[PATCH] mincore: vma crossing fix
My mincore also forgot about crossing vmas.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 14 Feb 2007 11:36:32 +0000 (12:36 +0100)]
[PATCH] mincore: fill in results properly
Paper bag time. Thanks to Randy for noticing that I didn't actually assign
'present' to anything.
Unfortunately my original patch passed the few simple test cases I gave it,
purely by coincidence.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 14 Feb 2007 11:35:02 +0000 (12:35 +0100)]
[PATCH] mincore: CONFIG_SWAP=n fix
Fix mincore-anon patch to compile with CONFIG_SWAP=n
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mundt [Thu, 15 Feb 2007 09:20:52 +0000 (18:20 +0900)]
sh: rts7751r2d board updates.
This tidies up some of the rts7751r2d mess and gets it booting
again. Update the defconfig, too.
Signed-off-by: Masayuki Hosokawa <hosokawa@ace-jp.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Linus Torvalds [Wed, 14 Feb 2007 17:51:20 +0000 (09:51 -0800)]
Merge branch 'linus' of /linux/kernel/git/perex/alsa
* 'linus' of master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa:
[ALSA] version 1.0.14rc2
[ALSA] Fix a typo in __dev* changes in portman2x4.c
[ALSA] Change AT91 PDC register defines for 2.6.20 kernel
[ALSA] SoC codecs - fix Kconfig - depends -> depends on
[ALSA] Fix __devinit and __devexit issues with sound drivers
[ALSA] hda-codec - Patch for enabling LFE on more Dell laptops
[ALSA] hda-codec - More fixes for Conexant HD Audio support
[ALSA] usb-audio: add PCR-A PCM support
[ALSA] emu10k1: fix typo
[ALSA] usbaudio - remove urb->bandwidth reference
[ALSA] ac97 - Fix silent output problem with Cx20551 codec
[ALSA] hda-codec - Fix Oops with probing sigmatel codec chips
Linus Torvalds [Wed, 14 Feb 2007 17:46:06 +0000 (09:46 -0800)]
Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
[PATCH] x86-64: Remove mk_pte_phys()
[PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
[PATCH] i386: fix 32-bit ioctls on x64_32
[PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
[PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
[PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
[PATCH] i386: Move mce_disabled to asm/mce.h
[PATCH] i386: paravirt unhandled fallthrough
[PATCH] x86_64: Wire up compat epoll_pwait
[PATCH] x86: Don't require the vDSO for handling a.out signals
[PATCH] i386: Fix Cyrix MediaGX detection
[PATCH] i386: Fix warning in cpu initialization
[PATCH] i386: Fix warning in microcode.c
[PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
[PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
[PATCH] i386: Remove fastcall in paravirt.[ch]
[PATCH] x86-64: Fix wrong gcc check in bitops.h
[PATCH] x86-64: survive having no irq mapping for a vector
[PATCH] i386: geode configuration fixes
[PATCH] i386: add option to show more code in oops reports
...
Eric W. Biederman [Wed, 14 Feb 2007 08:34:16 +0000 (00:34 -0800)]
[PATCH] sysctl: hide the sysctl proc inodes from selinux
Since the security checks are applied on each read and write of a sysctl file,
just like they are applied when calling sys_sysctl, they are redundant on the
standard VFS constructs. Since it is difficult to compute the security labels
on the standard VFS constructs we just mark the sysctl inodes in proc private
so selinux won't even bother with them.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Smalley [Wed, 14 Feb 2007 08:34:16 +0000 (00:34 -0800)]
[PATCH] selinux: enhance selinux to always ignore private inodes
Hmmm...turns out to not be quite enough, as the /proc/sys inodes aren't truly
private to the fs, so we can run into them in a variety of security hooks
beyond just the inode hooks, such as security_file_permission (when reading
and writing them via the vfs helpers), security_sb_mount (when mounting other
filesystems on directories in proc like binfmt_misc), and deeper within the
security module itself (as in flush_unauthorized_files upon inheritance across
execve). So I think we have to add an IS_PRIVATE() guard within SELinux, as
below. Note however that the use of the private flag here could be confusing,
as these inodes are _not_ private to the fs, are exposed to userspace, and
security modules must implement the sysctl hook to get any access control over
them.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:15 +0000 (00:34 -0800)]
[PATCH] sysctl: fix the selinux_sysctl_get_sid
I goofed and when reenabling the fine grained selinux labels for
sysctls and forgot to add the "/sys" prefix before consulting
the policy database. When computing the same path using
proc_dir_entries we got the "/sys" for free as it was part
of the tree, but it isn't true for clt_table trees.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:14 +0000 (00:34 -0800)]
[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables
It isn't needed anymore, all of the users are gone, and all of the ctl_table
initializers have been converted to use explicit names of the fields they are
initializing.
[akpm@osdl.org: NTFS fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:13 +0000 (00:34 -0800)]
[PATCH] sysctl: add a parent entry to ctl_table and set the parent entry
Add a parent entry into the ctl_table so you can walk the list of parents and
find the entire path to a ctl_table entry.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:12 +0000 (00:34 -0800)]
[PATCH] sysctl: reimplement the sysctl proc support
With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.
For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.
The speed feels about the same, even though we can now cache the sysctl
dentries :(
We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files. Making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. By
simply being a cache different directories being visible depending on who you
are is trivial to implement.
[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:11 +0000 (00:34 -0800)]
[PATCH] sysctl: allow sysctl_perm to be called from outside of sysctl.c
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:11 +0000 (00:34 -0800)]
[PATCH] sysctl: factor out sysctl_head_next from do_sysctl
The current logic to walk through the list of sysctl table headers is slightly
painful and implement in a way it cannot be used by code outside sysctl.c
I am in the process of implementing a version of the sysctl proc support that
instead of using the proc generic non-caching monster, just uses the existing
sysctl data structure as backing store for building the dcache entries and for
doing directory reads. To use the existing data structures however I need a
way to get at them.
[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:09 +0000 (00:34 -0800)]
[PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:08 +0000 (00:34 -0800)]
[PATCH] sysctl: remove support for directory strategy routines
parse_table has support for calling a strategy routine when descending into a
directory. To date no one has used this functionality and the /proc/sys
interface has no analog to it.
So no one is using this functionality kill it and make the binary sysctl code
easier to follow.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:07 +0000 (00:34 -0800)]
[PATCH] sysctl: remove support for CTL_ANY
There are currently no users in the kernel for CTL_ANY and it only has effect
on the binary interface which is practically unused.
So this complicates sysctl lookups for no good reason so just remove it.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:07 +0000 (00:34 -0800)]
[PATCH] sysctl: create sys/fs/binfmt_misc as an ordinary sysctl entry
binfmt_misc has a mount point in the middle of the sysctl and that mount point
is created as a proc_generic directory.
Doing it that way gets in the way of cleaning up the sysctl proc support as it
continues the existence of a horrible hack. So instead simply create the
directory as an ordinary sysctl directory. At least that removes the magic
special case.
[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:34:06 +0000 (00:34 -0800)]
[PATCH] sysctl: move SYSV IPC sysctls to their own file
This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
with special cases, and by keeping all of the ipc logic to together it makes
the code a little more readable.
[gcoady.lk@gmail.com: build fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Grant Coady <gcoady.lk@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:33:58 +0000 (00:33 -0800)]
[PATCH] sysctl: move utsname sysctls to their own file
This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
with special cases, and by keeping all of the utsname logic to together it
makes the code a little more readable.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric W. Biederman [Wed, 14 Feb 2007 08:33:57 +0000 (00:33 -0800)]
[PATCH] sysctl: move init_irq_proc into init/main where it belongs
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>