Peter P Waskiewicz Jr [Fri, 21 Mar 2008 10:43:19 +0000 (03:43 -0700)]
[NET]: Add per-connection option to set max TSO frame size
Update: My mailer ate one of Jarek's feedback mails... Fixed the
parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the
whitespace issue due to a patch import botch. Changed the types from
u32 to unsigned int to be more consistent with other variables in the
area. Also brought the patch up to the latest net-2.6.26 tree.
Update: Made gso_max_size container 32 bits, not 16. Moved the
location of gso_max_size within netdev to be less hotpath. Made more
consistent names between the sock and netdev layers, and added a
define for the max GSO size.
Update: Respun for net-2.6.26 tree.
Update: changed max_gso_frame_size and sk_gso_max_size from signed to
unsigned - thanks Stephen!
This patch adds the ability for device drivers to control the size of
the TSO frames being sent to them, per TCP connection. By setting the
netdevice's gso_max_size value, the socket layer will set the GSO
frame size based on that value. This will propagate into the TCP
layer, and send TSO's of that size to the hardware.
This can be desirable to help tune the bursty nature of TSO on a
per-adapter basis, where one may have 1 GbE and 10 GbE devices
coexisting in a system, one running multiqueue and the other not, etc.
This can also be desirable for devices that cannot support full 64 KB
TSO's, but still want to benefit from some level of segmentation
offloading.
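As a rough illustration of the effect described above, the sketch below computes how many MSS-sized segments fit in one TSO frame for a given cap. The function name, the 1448-byte MSS and the 16 KB cap are assumptions made for the example; only the 64 KB ceiling comes from the description.

```python
# Illustrative sketch only: how a per-device GSO size cap bounds the number
# of MSS-sized segments handed to the hardware in one TSO frame.
# segs_per_tso_frame(), the MSS value and the 16 KB cap are assumptions.

FULL_TSO_SIZE = 64 * 1024  # the "full 64 KB" TSO frame mentioned above


def segs_per_tso_frame(gso_max_size: int, mss: int) -> int:
    """Return how many full MSS-sized segments fit under the cap."""
    if gso_max_size <= 0 or mss <= 0:
        raise ValueError("sizes must be positive")
    return min(gso_max_size, FULL_TSO_SIZE) // mss


full = segs_per_tso_frame(FULL_TSO_SIZE, 1448)   # device with no extra limit
capped = segs_per_tso_frame(16 * 1024, 1448)     # device limited to 16 KB
print(full, capped)
```

A smaller cap means shorter bursts per frame, which is the tuning knob the patch exposes per device.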
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Mar 2008 10:42:24 +0000 (03:42 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/davem/net-2.6
Jarek Poplawski [Fri, 21 Mar 2008 00:05:13 +0000 (17:05 -0700)]
[NET] ifb: set separate lockdep classes for queue locks
[ 10.536424] =======================================================
[ 10.536424] [ INFO: possible circular locking dependency detected ]
[ 10.536424] 2.6.25-rc3-devel #3
[ 10.536424] -------------------------------------------------------
[ 10.536424] swapper/0 is trying to acquire lock:
[ 10.536424] (&dev->queue_lock){-+..}, at: [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
[ 10.536424]
[ 10.536424] but task is already holding lock:
[ 10.536424] (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
[ 10.536424]
[ 10.536424] which lock already depends on the new lock.
lockdep warns of locking order while using ifb with sch_ingress and
act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock
is at the beginning). This patch is only to tell lockdep that ifb is
a different device (e.g. from eth) and has its own pair of queue
locks. (This warning is a false positive in the common scenario of using
ifb; yet there are possible situations where this order could be
dangerous, and lockdep should warn in such a case.) (With suggestions by
David S. Miller)
Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki [Thu, 20 Mar 2008 23:13:58 +0000 (16:13 -0700)]
[IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
Based on notice from "Colin" <colins@sjtu.edu.cn>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 20 Mar 2008 23:11:27 +0000 (16:11 -0700)]
[TCP]: Fix shrinking windows with window scaling
When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.
The dump below shows the problem (scaling factor 2^7):
- Window size of 557 (71296) is advertised, up to 3111907257:
IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
below the last end:
IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>
The number 40 results from downscaling the remaining window:
3111907257 - 3111841425 = 65832
65832 / 2^7 = 514
65832 % 2^7 = 40
If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will fail the receiver's checks in tcp_sequence() however since it
is before its tp->rcv_wup, making it respond with a dupack.
If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.
Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.
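The arithmetic from the dump can be replayed as a short sketch. Rounding the remaining window up to the scaling unit is an assumption about how the alignment is done; the sequence numbers and the scaling factor are taken from the dump.

```python
# Worked arithmetic for the dump above (scaling factor 2^7).
# Rounding *up* to the scaling unit is an assumption about how the
# "aligning" is done; the numbers themselves come from the dump.

WSCALE = 7
UNIT = 1 << WSCALE  # 128 bytes per advertised window unit


def align_up(value: int, unit: int) -> int:
    """Round value up to the next multiple of unit (unit is a power of two)."""
    return (value + unit - 1) & ~(unit - 1)


remaining = 3111907257 - 3111841425      # bytes still offered to the peer
advertised = remaining >> WSCALE         # value that fits in the header
shrunk_by = remaining - (advertised << WSCALE)

aligned = align_up(remaining, UNIT)      # never below `remaining`
print(remaining, advertised, shrunk_by, aligned >> WSCALE)
```

With the aligned value, the advertised right edge can only stay where it was or move forward, so the peer never sees the window shrink.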
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Thu, 20 Mar 2008 23:07:27 +0000 (16:07 -0700)]
netpoll: zap_completion_queue: adjust skb->users counter
zap_completion_queue() retrieves skbs from completion_queue, where their
skb->users counter is zero. dev_kfree_skb_any() expects a non-zero counter,
so it is incremented before the call.
Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Checconi [Thu, 20 Mar 2008 22:54:58 +0000 (15:54 -0700)]
bridge: use time_before() in br_fdb_cleanup()
In br_fdb_cleanup() next_timer and this_timer are in jiffies, so they
should be compared using the wrap-safe time_before() macro.
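Why a plain comparison is not enough can be shown with a small model of a 32-bit tick counter. The helper names mirror the kernel's time_after()/time_before() macros, but this is a Python emulation written for illustration, not the kernel code.

```python
# Sketch of wrap-safe jiffies comparison, emulating a 32-bit counter.
# The helper names mirror the kernel macros, but this is a Python model,
# not the kernel implementation.

MASK32 = 0xFFFFFFFF


def _signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def time_after(a: int, b: int) -> bool:
    """True if timestamp a is later than b, even across a counter wrap."""
    return _signed32(b - a) < 0


def time_before(a: int, b: int) -> bool:
    """True if timestamp a is earlier than b."""
    return time_after(b, a)


next_timer = 0xFFFFFFF0                      # shortly before the counter wraps
this_timer = (next_timer + 0x20) & MASK32    # 32 ticks later, after the wrap

naive = this_timer < next_timer                  # plain integer compare
safe = time_before(this_timer, next_timer)       # wrap-safe compare
print(naive, safe)
```

The plain comparison treats the post-wrap timestamp as the earlier one, while the signed-difference form keeps the ordering correct as long as the two values are less than half the counter range apart.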
Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Mar 2008 22:53:15 +0000 (15:53 -0700)]
[TG3]: Fix build warning on sparc32.
Sparc MAC address support should be protected consistently
with CONFIG_SPARC, but there was a stray CONFIG_SPARC64
case.
Bump driver version and release date.
Reported by Andrew Morton.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Thu, 20 Mar 2008 22:41:02 +0000 (15:41 -0700)]
MAINTAINERS: bluez-devel is subscribers-only
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 20 Mar 2008 22:39:41 +0000 (15:39 -0700)]
audit: netlink socket can be auto-bound to pid other than current->pid (v2)
From: Pavel Emelyanov <xemul@openvz.org>
This patch is based on the one from Thomas.
The kauditd_thread() calls the netlink_unicast() and passes
the audit_pid to it. The audit_pid, in turn, is received from
the user space and the tool (I've checked the audit v1.6.9)
uses getpid() to pass one to the kernel. Besides, this tool
doesn't bind the netlink socket to this id, but simply creates
it allowing the kernel to auto-bind one.
That's the preamble.
The problem is that netlink_autobind() _does_not_ guarantee
that the socket will be auto-bound to the current pid. Instead
it uses the current pid as a hint to start looking for a free
id. So, in case of conflict, the audit messages can be sent
to a wrong socket. This can happen (it's unlikely, but possible)
if some task opens more than one netlink socket and then
the audit one starts - in this case the audit's pid can be busy
and its socket will be bound to another id.
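The conflict can be modelled in a few lines. This is a toy sketch: the autobind() helper and its fallback policy are placeholders chosen for the example and are not meant to describe what netlink_autobind() actually does when the hint is taken.

```python
# Toy model of the problem: the caller's pid is only a *hint* for the
# auto-bound netlink id. The fallback policy below (count down from -4097)
# is a placeholder assumption, not a description of netlink_autobind().

def autobind(pid_hint: int, ids_in_use: set) -> int:
    """Bind to pid_hint if free, otherwise to some other free id."""
    candidate = pid_hint
    fallback = -4097
    while candidate in ids_in_use:
        candidate = fallback
        fallback -= 1
    ids_in_use.add(candidate)
    return candidate


ids_in_use = set()
audit_pid = 1234                       # what the tool reports via getpid()

# Another socket already occupies the id equal to the audit tool's pid.
ids_in_use.add(audit_pid)

audit_nlk_pid = autobind(audit_pid, ids_in_use)   # id the socket really got
print(audit_pid, audit_nlk_pid)
```

Sending to audit_pid in this situation would reach the other socket, which is why the id actually bound to the audit socket has to be tracked separately.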
The proposal is to introduce an audit_nlk_pid in audit subsys,
that will point to the netlink socket to send packets to. It
will most often be equal to audit_pid. The socket id can be
got from the skb's netlink CB right in the audit_receive_msg.
The audit_nlk_pid reset to 0 is not required, since all the
decisions are taken based on audit_pid value only.
Later, if the audit tools will bind the socket themselves, the
kernel will have to provide a way to setup the audit_nlk_pid
as well.
A good side effect of this patch is that audit_pid can later
be converted to struct pid, as it is no longer safe to use
pid_t-s in the presence of pid namespaces. But audit code still
uses the tgid from task_struct in the audit_signal_info and in
the audit_filter_syscall.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andre Noll [Thu, 20 Mar 2008 22:27:28 +0000 (15:27 -0700)]
[NET]: Fix permissions of /proc/net
commit e9720ac ([NET]: Make /proc/net a symlink on /proc/self/net (v3))
broke ganglia and probably other applications that read /proc/net/dev.
This is due to the change of permissions of /proc/net that was
introduced in that commit.
Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net
After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net
This patch restores the permissions to the old value which makes
ganglia happy again.
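The difference between the two mode strings can be checked with the standard stat module. The octal values below are derived from the ls output quoted above, not taken from the patch itself.

```python
import stat

# Decode the two mode strings quoted above and check whether "other" users
# keep the search (execute) bit needed to look up entries such as "dev".
# The octal values are derived from the ls output, not from the patch.

before = stat.S_IFDIR | 0o555   # dr-xr-xr-x
after = stat.S_IFDIR | 0o544    # dr-xr--r--


def others_can_traverse(mode: int) -> bool:
    """A directory can be traversed by 'other' only if S_IXOTH is set."""
    return bool(mode & stat.S_IXOTH)


print(stat.filemode(before), others_can_traverse(before))
print(stat.filemode(after), others_can_traverse(after))
```

Without the search bit, an unprivileged process can no longer open /proc/net/dev through that directory, which matches the breakage reported for ganglia.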
Pavel Emelyanov says:
This also broke postfix, as was reported in bug #10286
and described in detail by Benjamin.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Thu, 20 Mar 2008 22:17:14 +0000 (15:17 -0700)]
[SCTP]: Fix a race between module load and protosw access
There is a race in SCTP between the loading of the module
and the access by the socket layer to the protocol functions.
In particular, a list of addresses that SCTP maintains is
not initialized prior to the registration with the protosw.
Thus it is possible for a user application to gain access
to SCTP functions before everything has been initialized.
The problem shows up as odd crashes during connection
initialization when we try to access the SCTP address list.
The solution is to refactor how we do registration and
initialize the lists prior to registering with the protosw.
Care must be taken since the address list initialization
depends on some other pieces of SCTP initialization. Also
the clean-up in case of failure now also needs to be refactored.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Hokka Zakrisson [Thu, 20 Mar 2008 22:07:10 +0000 (15:07 -0700)]
[NETFILTER]: ipt_recent: sanity check hit count
If a rule using ipt_recent is created with a hit count greater than
ip_pkt_list_tot, the rule will never match as it cannot keep track
of enough timestamps. This patch makes ipt_recent refuse to create such
rules.
With ip_pkt_list_tot's default value of 20, the following can be used
to reproduce the problem.
nc -u -l 0.0.0.0 1234 &
for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done
This limits it to 20 packets:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
--rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
60 --hitcount 20 --name test --rsource -j DROP
While this is unlimited:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
--rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
60 --hitcount 21 --name test --rsource -j DROP
With the patch the second rule-set will throw an EINVAL.
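The reason a hit count of 21 can never match is easier to see in a toy model of a bounded timestamp list. The class and function names below are hypothetical and do not come from ipt_recent; only the default of 20 and the EINVAL-style rejection are taken from the description.

```python
from collections import deque

# Toy model of a per-address timestamp list with a fixed capacity.
# IP_PKT_LIST_TOT mirrors the default quoted above; the class and function
# names are hypothetical and do not come from ipt_recent itself.

IP_PKT_LIST_TOT = 20


class RecentEntry:
    def __init__(self, capacity: int = IP_PKT_LIST_TOT):
        self.stamps = deque(maxlen=capacity)   # oldest stamps fall off

    def record(self, now: float) -> None:
        self.stamps.append(now)

    def matches(self, now: float, seconds: float, hitcount: int) -> bool:
        """True if at least `hitcount` stamps lie within the last `seconds`."""
        hits = sum(1 for t in self.stamps if now - t <= seconds)
        return hits >= hitcount


def check_rule(hitcount: int, capacity: int = IP_PKT_LIST_TOT) -> None:
    """Reject rules that could never match, as the patch does with EINVAL."""
    if hitcount > capacity:
        raise ValueError("hitcount exceeds the number of stored timestamps")


entry = RecentEntry()
for second in range(100):          # 100 packets, one per second
    entry.record(float(second))

print(entry.matches(99.0, 60, 20), entry.matches(99.0, 60, 21))
```

However many packets arrive, the list never holds more than 20 timestamps, so a rule asking for 21 hits silently never fires; rejecting it at creation time makes the mistake visible.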
Reported-by: Sean Kennedy <skennedy@vcn.com>
Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roel Kluin [Thu, 20 Mar 2008 22:06:23 +0000 (15:06 -0700)]
[NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()
logical-bitwise & confusion
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Wed, 19 Mar 2008 00:15:58 +0000 (17:15 -0700)]
[RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert P. J. Day [Tue, 18 Mar 2008 07:59:23 +0000 (00:59 -0700)]
[NET]: Add debugging names to __RW_LOCK_UNLOCKED macros.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Mar 2008 07:37:55 +0000 (00:37 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/davem/net-2.6
Conflicts:
drivers/net/wireless/rt2x00/rt2x00dev.c
net/8021q/vlan_dev.c
David S. Miller [Tue, 18 Mar 2008 06:44:31 +0000 (23:44 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/torvalds/linux-2.6
Al Viro [Tue, 18 Mar 2008 05:50:23 +0000 (22:50 -0700)]
[IPV4]: esp_output() misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 18 Mar 2008 05:49:48 +0000 (22:49 -0700)]
[8021Q]: vlan_dev misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 18 Mar 2008 05:49:16 +0000 (22:49 -0700)]
xfrm: ->eth_proto is __be16
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 18 Mar 2008 05:48:46 +0000 (22:48 -0700)]
[IPV4]: ipv4_is_lbcast() misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 18 Mar 2008 05:48:03 +0000 (22:48 -0700)]
[SUNRPC]: net/* NULL noise
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 18 Mar 2008 05:47:32 +0000 (22:47 -0700)]
[SCTP]: fix misannotated __sctp_rcv_asconf_lookup()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 18 Mar 2008 05:46:46 +0000 (22:46 -0700)]
[PKT_SCHED]: annotate cls_u32
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 18 Mar 2008 05:44:53 +0000 (22:44 -0700)]
[NET] endianness noise: INADDR_ANY
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Mar 2008 19:06:33 +0000 (12:06 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6
Linus Torvalds [Mon, 17 Mar 2008 16:52:24 +0000 (09:52 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ahci: Add Marvell 6121 SATA support
pata_ali: use atapi_cmd_type() to determine cmd type instead of transfer size
ahci: implement skip_host_reset parameter
ahci: request all PCI BARs
devres: implement pcim_iomap_regions_request_all()
libata-acpi: improve dock event handling
Linus Torvalds [Mon, 17 Mar 2008 16:52:19 +0000 (09:52 -0700)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
virtio: fix race in enable_cb
virtio: Enable netpoll interface for netconsole logging
virtio: handle > 2 billion page balloon targets
virtio: Fix sysfs bits to have proper block symlink
virtio: Use spin_lock_irqsave/restore for virtio-pci
Al Viro [Sun, 16 Mar 2008 22:48:08 +0000 (22:48 +0000)]
hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakage
oops and fs corruption; the latter can happen even on valid fs in case of oom.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jose Alberto Reguero [Thu, 13 Mar 2008 22:22:24 +0000 (23:22 +0100)]
ahci: Add Marvell 6121 SATA support
Signed-off-by: Jose Alberto Reguero <jareguero@telefonica.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Tue, 11 Mar 2008 02:35:00 +0000 (11:35 +0900)]
pata_ali: use atapi_cmd_type() to determine cmd type instead of transfer size
pata_ali was using qc->nbytes to determine whether a command is
data transfer type or not. As now qc->nbytes can be extended by
padding and draining buffers, these tests are not useful anymore.
Use atapi_cmd_type() instead.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Mon, 10 Mar 2008 01:25:25 +0000 (10:25 +0900)]
ahci: implement skip_host_reset parameter
Under certain circumstances (SSP turned off by the BIOS) and for
debugging purposes, skipping global controller reset is helpful. Add
a kernel parameter for it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Tue, 11 Mar 2008 10:52:31 +0000 (19:52 +0900)]
ahci: request all PCI BARs
ahci is often implemented with accompanying SFF compatible interface
and legacy IDE driver may attach to the legacy IO ports when the
controller is already claimed by ahci and vice-versa. This patch
makes ahci use pcim_iomap_regions_request_all() so that all IO regions
are claimed on attach.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 12 Mar 2008 06:26:34 +0000 (15:26 +0900)]
devres: implement pcim_iomap_regions_request_all()
Some drivers need to reserve all PCI BARs to prevent other drivers
misusing unoccupied BARs. pcim_iomap_regions_request_all() requests
all BARs and iomap specified BARs.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 12 Mar 2008 05:24:43 +0000 (14:24 +0900)]
libata-acpi: improve dock event handling
Improve ACPI hotplug handling such that dock event is handled properly.
* Register handlers for dock events.
* Directly detach device on EJECT_REQUEST instead of signaling hotplug
event. This prevents libata from accessing severed controller
and/or device.
* While at it, use named constants for ACPI events and move uevent
signaling inside host lock.
Original patch and testing by Holger Macht.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Holger Macht <hmacht@suse.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Harvey Harrison [Thu, 6 Mar 2008 15:55:09 +0000 (15:55 +0000)]
ioc3.c: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
drivers/sn/ioc3.c | 22 +++++++++++-----------
1 files changed, 11 insertions(+), 11 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Joakim Tjernlund [Thu, 6 Mar 2008 10:48:46 +0000 (18:48 +0800)]
ucc_geth: use correct thread number for 10/100Mbps link
Use thread number of 1 for 10/100Mbps link instead of 4.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Mitch Williams [Fri, 7 Mar 2008 18:32:13 +0000 (10:32 -0800)]
igb: Correctly get protocol information
We can't look at the socket to get protocol information. We should
instead look directly at the packet, and hope there are no IPv6
option headers.
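What "looking directly at the packet" means can be sketched as follows. The helper name and the sample packets are made up for the example; the byte offsets are the standard IPv4 protocol and IPv6 next-header positions.

```python
# Sketch: read the L4 protocol straight from the packet bytes.
# The helper name and the sample packets are assumptions for illustration;
# the header offsets are the standard IPv4/IPv6 ones.

IPPROTO_TCP = 6
IPPROTO_HOPOPTS = 0   # IPv6 hop-by-hop options extension header


def l4_protocol(ip_header: bytes) -> int:
    """Return the IPv4 protocol field or the IPv6 next-header field."""
    version = ip_header[0] >> 4
    if version == 4:
        return ip_header[9]     # IPv4: protocol is byte 9
    if version == 6:
        return ip_header[6]     # IPv6: next header is byte 6
    raise ValueError("not an IP packet")


ipv4_tcp = bytes([0x45]) + bytes(8) + bytes([IPPROTO_TCP]) + bytes(10)
ipv6_tcp = bytes([0x60]) + bytes(5) + bytes([IPPROTO_TCP]) + bytes(33)
ipv6_opts = bytes([0x60]) + bytes(5) + bytes([IPPROTO_HOPOPTS]) + bytes(33)

print(l4_protocol(ipv4_tcp), l4_protocol(ipv6_tcp), l4_protocol(ipv6_opts))
```

The third packet shows the caveat from the text: when an IPv6 extension header is present, the first next-header field names that extension rather than TCP or UDP.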
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ralf Baechle [Sat, 8 Mar 2008 16:58:33 +0000 (16:58 +0000)]
[IOC3] Fix section mismatch
LD drivers/net/built-in.o
WARNING: drivers/net/built-in.o(.text+0x3468): Section mismatch in reference from the function ioc3_probe() to the function .devinit.text:ioc3_serial_probe()
The function ioc3_probe() references
the function __devinit ioc3_serial_probe().
This is often because ioc3_probe lacks a __devinit
annotation or the annotation of ioc3_serial_probe is wrong.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Grant Grundler [Sun, 9 Mar 2008 01:33:16 +0000 (18:33 -0700)]
2.6.25-rc4 de_stop_rxtx polling wrong
This untested patch _should_ fix:
"(net de2104x) Kernel panic with de2104x tulip driver on boot"
http://bugzilla.kernel.org/show_bug.cgi?id=3156
But the bug submitter isn't responding. Same fix has been applied
to tulip.c (several years ago) and uli526x.c (Feb 2008) drivers.
[ The panic reported in the bug report was removed in a recently
(March 2008) accepted patch from Ondrej Zary. ]
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ayaz Abdulla [Mon, 10 Mar 2008 19:58:21 +0000 (14:58 -0500)]
forcedeth: limit tx to 16
This is a critical patch which adds a workaround for a HW bug. The patch
will limit the number of outstanding tx packets to 16. Otherwise, the HW
could send out packets with bad checksums.
The driver will still set up the tx packets in the ring, but will
only set the Valid bit on 16 packets at a time.
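A toy model of the workaround is sketched below. The class and method names are hypothetical and the bookkeeping is simplified; only the limit of 16 outstanding packets comes from the description.

```python
from collections import deque

# Toy model of the workaround: every packet gets a ring slot, but at most
# TX_LIMIT descriptors carry the Valid bit at any time. Class and method
# names are hypothetical; only the limit of 16 comes from the text.

TX_LIMIT = 16


class TxRing:
    def __init__(self):
        self.valid = 0            # descriptors the hardware may send
        self.deferred = deque()   # queued in the ring, Valid bit not set

    def queue(self, pkt) -> None:
        if self.valid < TX_LIMIT:
            self.valid += 1
        else:
            self.deferred.append(pkt)

    def tx_done(self) -> None:
        """One packet completed: hand the next deferred one to the HW."""
        if self.valid == 0:
            return
        self.valid -= 1
        if self.deferred:
            self.deferred.popleft()
            self.valid += 1


ring = TxRing()
for pkt in range(20):
    ring.queue(pkt)
queued_state = (ring.valid, len(ring.deferred))

ring.tx_done()
after_done = (ring.valid, len(ring.deferred))
print(queued_state, after_done)
```

Each completion releases exactly one deferred packet, so the hardware never sees more than 16 valid descriptors while the ring itself stays full.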
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Mon, 10 Mar 2008 21:57:20 +0000 (21:57 +0000)]
3c501: Further coding style fixes
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Divy Le Ray [Thu, 13 Mar 2008 07:13:30 +0000 (00:13 -0700)]
cxgb3: Fix transmit queue stop mechanism
The last change in the Tx queue stop mechanism opens a window
where the Tx queue might be stopped after pending credits
have been returned.
Tx credits are returned via a control message generated by the HW.
It returns tx credits on demand, triggered by a completion bit
set in selective transmit packet headers.
The current code can lead to the Tx queue being stopped
with all pending credits returned, and the current frame
not triggering a credit return. The Tx queue will then never be
woken.
The driver could alternatively request a completion for packets
that stop the queue. It's however safer at this point to go back
to the pre-existing behaviour.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stefan Roese [Thu, 13 Mar 2008 15:59:43 +0000 (16:59 +0100)]
NEWEMAC: Add compatible "ibm,tah" to tah matching table
Add "ibm,tah" to the compatible matching table of the ibm_newemac
tah driver. The type "tah" is still preserved for compatibility reasons.
New dts files should use the compatible property though.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jean-Christophe Dubois [Thu, 13 Mar 2008 22:56:36 +0000 (14:56 -0800)]
rndis_host: fix transfer size negotiation
This patch should resolve a problem that's troubled support for
some RNDIS peripherals. It seems to have boiled down to using a
variable to establish transfer size limits before it was assigned,
which caused those devices to fall back to a default "jumbogram"
mode we don't support. Fix by assigning it earlier for RNDIS.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
[ cleanups ]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Pravin M. Bathija [Fri, 14 Mar 2008 09:52:28 +0000 (10:52 +0100)]
NEWEMAC: fix support for pause packets
Problem Description and Fix
---------------------------
When a pause packet(with destination as reserved Multicast address) is
received by the EMAC hardware to control the flow of frames being
transmitted by it, it is dropped by the hardware unless the reserved
Multicast address is hashed in to the GAHT[1-4] registers. This code fix
adds the default reserved multicast address to the GAHT[1-4] registers
in the EMAC(s) present on the chip. The flow control with Pause packets
will only work if the following register bits are programmed in EMAC:
EMACx_MR1[APP] = 1
EMACx_RMR[BAE] = 1
EMACx_RMR[MAE] = 1
Behavior that may be observed in a running system
-------------------------------------------------
A host transferring data from a PPC based system may send a Pause packet
to the PPC EMAC requesting it to slow down the flow of packets. If the
default reserved multicast MAC address is not programmed into the
GAHT[1-4] registers this Pause packet will be dropped by PPC EMAC and no
Flow Control will be done.
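For reference, the frame in question can be sketched as below, following the standard IEEE 802.3x PAUSE layout. The source MAC and the helper name are made up for the example, and the hardware-specific GAHT hash is deliberately not modelled.

```python
import struct

# Sketch of an IEEE 802.3x PAUSE frame, to show the reserved multicast
# destination the text refers to. The source MAC and the helper name are
# made up; the GAHT hash itself is hardware-specific and not modelled.

PAUSE_DEST = bytes.fromhex("0180c2000001")   # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a 60-byte PAUSE frame (without FCS)."""
    header = PAUSE_DEST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    return (header + payload).ljust(60, b"\x00")   # pad to minimum size


frame = build_pause_frame(bytes.fromhex("020000000001"), quanta=0xFFFF)
is_multicast = bool(frame[0] & 0x01)   # group bit of the destination
print(frame[:6].hex(), is_multicast, len(frame))
```

Because the destination has the group bit set, the frame goes through the multicast filter like any other multicast address, which is why it is dropped unless that address is present in the hash table.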
Signed-off-by: Pravin M. Bathija <pbathija@amcc.com>
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Christian Borntraeger [Fri, 14 Mar 2008 13:17:05 +0000 (14:17 +0100)]
virtio: fix race in enable_cb
There is a race in virtio_net, dealing with disabling/enabling the callback.
I saw the following oops:
kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218!
illegal operation: 0001 [#1] SMP
Modules linked in: sunrpc dm_mod
CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99
Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60)
Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001
000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000
0000000000000000 000000000f870000 0000000000000000 0000000000001237
000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8
Krnl Code: 00000000002b819a: a7110001 tmll %r1,1
00000000002b819e: a7840004 brc 8,2b81a6
00000000002b81a2: a7f40001 brc 15,2b81a4
>00000000002b81a6: a51b0001 oill %r1,1
00000000002b81aa: 40102000 sth %r1,0(%r2)
00000000002b81ae: 07fe bcr 15,%r14
00000000002b81b0: eb7ff0380024 stmg %r7,%r15,56(%r15)
00000000002b81b6: a7f13e00 tmll %r15,15872
Call Trace:
([<000000000fa0bcd0>] 0xfa0bcd0)
[<00000000002b8350>] vring_interrupt+0x5c/0x6c
[<000000000010ab08>] do_extint+0xb8/0xf0
[<0000000000110716>] ext_no_vtime+0x16/0x1a
[<0000000000107e72>] cpu_idle+0x1c2/0x1e0
The problem can be triggered with a high amount of host->guest traffic.
I think it's the following race:
poll says netif_rx_complete
poll calls enable_cb
enable_cb opens the interrupt mask
a new packet comes, an interrupt is triggered----\
enable_cb sees that there is more work           |
enable_cb disables the interrupt                 |
       .                                         V
       .                            interrupt is delivered
       .                            skb_recv_done does atomic napi test, ok
 some waiting                       disable_cb is called->check fails->bang!
       .
poll would do napi check
poll would do disable_cb
The fix is to let enable_cb not disable the interrupt again, but expect the
caller to do the cleanup if it returns false. In that case, the interrupt is
only disabled, if the napi test_set_bit was successful.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)
Amit Shah [Fri, 29 Feb 2008 10:54:50 +0000 (16:24 +0530)]
virtio: Enable netpoll interface for netconsole logging
Add a new poll_controller handler that the netpoll interface needs.
This enables netconsole logging from a kvm guest over the virtio
net interface.
Signed-off-by: Amit Shah <amitshah@gmx.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2008 03:58:15 +0000 (22:58 -0500)]
virtio: handle > 2 billion page balloon targets
If the host asks for a huge target, towards_target() can overflow, and
we end up with an oops as we try to release more pages than we have. The
simple fix is to use a 64-bit value.
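The overflow can be reproduced by emulating C integer widths. The page counts are made-up values, and treating the narrow result as a signed 32-bit difference is an assumption about how the return value of towards_target() is consumed.

```python
# Sketch of the overflow, emulating C integer widths in Python.
# The page counts are made-up values, and treating the 32-bit result as a
# signed difference is an assumption about how towards_target() is used.

def as_s32(x: int) -> int:
    """Wrap x to a signed 32-bit value, as a C int would hold it."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def as_s64(x: int) -> int:
    """Wrap x to a signed 64-bit value."""
    x &= 0xFFFFFFFFFFFFFFFF
    return x - (1 << 64) if x & (1 << 63) else x


num_pages = 1000                 # pages currently in the balloon
target = 3_000_000_000           # host asks for more than 2^31 pages

diff32 = as_s32(target - num_pages)   # wraps negative: looks like "deflate"
diff64 = as_s64(target - num_pages)   # keeps the real, positive difference
print(diff32, diff64)
```

In 32 bits the request to grow the balloon turns into a large negative number, which reads as an instruction to release far more pages than the balloon holds; the 64-bit value keeps the sign right.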
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Jeremy Katz [Sun, 2 Mar 2008 22:00:15 +0000 (17:00 -0500)]
virtio: Fix sysfs bits to have proper block symlink
Fix up so that the virtio_blk devices in sysfs link correctly to their
block device. This then allows them to be detected by hal, etc.
Signed-off-by: Jeremy Katz <katzj@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Anthony Liguori [Sun, 2 Mar 2008 22:37:48 +0000 (16:37 -0600)]
virtio: Use spin_lock_irqsave/restore for virtio-pci
virtio-pci acquires its spin lock in an interrupt context so it's necessary
to use spin_lock_irqsave/restore variants. This patch fixes guest SMP when
using virtio devices in KVM.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Julia Lawall [Sat, 15 Mar 2008 16:05:02 +0000 (17:05 +0100)]
drivers/net/atl1/atl1_main.c: remove unused variable
The variable update_rx is initialized but never used otherwise.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
identifier i;
constant C;
@@
(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Julia Lawall [Sat, 15 Mar 2008 16:04:39 +0000 (17:04 +0100)]
drivers/net/ipg.c: remove unused variable
The variable gig is initialized but never used otherwise.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
identifier i;
constant C;
@@
(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Al Viro [Sun, 16 Mar 2008 22:22:04 +0000 (22:22 +0000)]
epic100 endianness annotations and fixes
* "powerpc or sparc" is not the same as "big-endian", fix the ifdef
* since we tell the card to byteswap the descriptors on big-endian,
we ought to leave them host-endian...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Al Viro [Sun, 16 Mar 2008 22:22:14 +0000 (22:22 +0000)]
ipg fix
spurious cpu_to_le64()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Al Viro [Sun, 16 Mar 2008 22:22:34 +0000 (22:22 +0000)]
more misannotations: ne2k-pci
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Al Viro [Sun, 16 Mar 2008 22:23:04 +0000 (22:23 +0000)]
fore2000 - fix misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Al Viro [Sun, 16 Mar 2008 22:22:44 +0000 (22:22 +0000)]
wan/farsync: copy_from_user() to iomem is wrong
kmalloc() an intermediate buffer, then do copy_from_user() + memcpy_toio()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Al Viro [Sun, 16 Mar 2008 22:43:06 +0000 (22:43 +0000)]
r6040 endianness fixes
pci_unmap_single() on little-endian address
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Linus Torvalds [Sun, 16 Mar 2008 23:32:14 +0000 (16:32 -0700)]
Linux 2.6.25-rc6
Linus Torvalds [Sun, 16 Mar 2008 17:48:23 +0000 (10:48 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/kyle/parisc-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
[PARISC] make ptr_to_pide() static
[PARISC] head.S: section mismatch fixes
[PARISC] add back Crestone Peak cpu
[PARISC] futex: special case cmpxchg NULL in kernel space
[PARISC] clean up show_stack
[PARISC] add pa8900 CPUs to hardware inventory
[PARISC] clean up include/asm-parisc/elf.h
[PARISC] move defconfig to arch/parisc/configs/
[PARISC] add back AD1889 MAINTAINERS entry
[PARISC] pdc_console: fix bizarre panic on boot
[PARISC] dump_stack in show_regs
[PARISC] pdc_stable: fix compile errors
[PARISC] remove unused pdc_iodc_printf function
[PARISC] bump __NR_syscalls
[PARISC] unbreak pgalloc.h
[PARISC] move VMALLOC_* definitions to fixmap.h
[PARISC] wire up timerfd syscalls
[PARISC] remove old timerfd syscall
FUJITA Tomonori [Mon, 10 Mar 2008 11:43:24 +0000 (20:43 +0900)]
[PARISC] make ptr_to_pide() static
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Helge Deller [Wed, 26 Dec 2007 17:07:01 +0000 (18:07 +0100)]
[PARISC] head.S: section mismatch fixes
- move boot_args[] into the init section
- move $global$ into the read_mostly section
- fix the following two section mismatches:
WARNING: vmlinux.o(.text+0x9c): Section mismatch: reference to .init.text:start_kernel (between '$pgt_fill_loop' and '$is_pa20')
WARNING: vmlinux.o(.text+0xa0): Section mismatch: reference to .init.text:start_kernel (between '$pgt_fill_loop' and '$is_pa20')
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Kyle McMartin [Sat, 1 Mar 2008 19:40:43 +0000 (11:40 -0800)]
[PARISC] add back Crestone Peak cpu
Crestone Peak Slow is the 800MHz PA-8800 cpu in the C8000.
0x88B is probably the Crestone Peak Fast.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Kyle McMartin [Sat, 1 Mar 2008 18:25:52 +0000 (10:25 -0800)]
[PARISC] futex: special case cmpxchg NULL in kernel space
Commit a0c1e9073ef7428a14309cba010633a6cd6719ea added code to futex.c
to detect whether futex_atomic_cmpxchg_inatomic was implemented at run
time:
+ curval = cmpxchg_futex_value_locked(NULL, 0, 0);
+ if (curval == -EFAULT)
+ futex_cmpxchg_enabled = 1;
This is bogus on parisc, since page zero in kernel virtual space is the
gateway page for syscall entry, and should not be read from the kernel.
(That, and we really don't like the kernel faulting on its own address
space...)
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Kyle McMartin [Sat, 1 Mar 2008 18:30:19 +0000 (10:30 -0800)]
[PARISC] clean up show_stack
When we show_regs, we obviously have a struct pt_regs of the calling
frame. Use these in show_stack so we don't have the entire bogus call trace
up to the show_stack call.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
James Bottomley [Wed, 20 Feb 2008 21:53:02 +0000 (15:53 -0600)]
[PARISC] add pa8900 CPUs to hardware inventory
This patch adds the known pa8900 CPUs to the inventory list and removes
the Crestone Peak one which apparently never escaped into the wild.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Randolph Chung [Sun, 24 Feb 2008 18:44:21 +0000 (10:44 -0800)]
[PARISC] clean up include/asm-parisc/elf.h
Cleanup some cruft. No functionality changes.
Signed-off-by: Randolph Chung <tausq@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Adrian Bunk [Tue, 26 Feb 2008 19:55:17 +0000 (21:55 +0200)]
[PARISC] move defconfig to arch/parisc/configs/
This patch moves the default parisc defconfig to
arch/parisc/configs/generic_defconfig where it belongs and selects it as
the default defconfig through KBUILD_DEFCONFIG.
Signed-off-by: Adrian Bunk <adrian.bunk@movial.fi>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Thibaut VARENE [Wed, 20 Feb 2008 20:05:56 +0000 (21:05 +0100)]
[PARISC] add back AD1889 MAINTAINERS entry
Signed-off-by: Thibaut VARENE <T-Bone@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Kyle McMartin [Tue, 19 Feb 2008 07:34:34 +0000 (23:34 -0800)]
[PARISC] pdc_console: fix bizarre panic on boot
Commit 721fdf34167580ff98263c74cead8871d76936e6 introduced a subtle bug
by accidentally removing the "static" from iodc_dbuf. This resulted in what
appeared to be a trap without *current set to a task, probably the result of
a trap in real mode while calling firmware.
Also do other misc clean ups. Since the only input from firmware is non
blocking, share iodc_dbuf between input and output, and spinlock the
only callers.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Kyle McMartin [Tue, 19 Feb 2008 07:26:46 +0000 (23:26 -0800)]
[PARISC] dump_stack in show_regs
Originally, show_stack was used in BUG() output. However, a recent commit
changed it to print register state (no idea what that's supposed to help,
really...) and parisc was missing a backtrace because of it.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Joel Soete [Tue, 19 Feb 2008 02:26:11 +0000 (18:26 -0800)]
[PARISC] pdc_stable: fix compile errors
Signed-off-by: Joel Soete <rubisher@scarlet.be>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Kyle McMartin [Mon, 18 Feb 2008 22:26:41 +0000 (14:26 -0800)]
[PARISC] remove unused pdc_iodc_printf function
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Kyle McMartin [Mon, 18 Feb 2008 22:21:17 +0000 (14:21 -0800)]
[PARISC] bump __NR_syscalls
oops, forgot this in the previous commit.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Kyle McMartin [Mon, 18 Feb 2008 22:16:26 +0000 (14:16 -0800)]
[PARISC] unbreak pgalloc.h
Commit 2f569afd9ced9ebec9a6eb3dbf6f83429be0a7b4 broke the compile
rather spectacularly. Fix code errors.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Kyle McMartin [Mon, 18 Feb 2008 22:13:43 +0000 (14:13 -0800)]
[PARISC] move VMALLOC_* definitions to fixmap.h
They make way more sense here, really...
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Kyle McMartin [Mon, 18 Feb 2008 22:00:18 +0000 (14:00 -0800)]
[PARISC] wire up timerfd syscalls
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Kyle McMartin [Mon, 18 Feb 2008 21:57:26 +0000 (13:57 -0800)]
[PARISC] remove old timerfd syscall
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Linus Torvalds [Sat, 15 Mar 2008 18:53:32 +0000 (11:53 -0700)]
ACPI: Remove ACPI_CUSTOM_DSDT_INITRD option
This essentially reverts commit 71fc47a9adf8ee89e5c96a47222915c5485ac437
("ACPI: basic initramfs DSDT override support"), because the code simply
isn't ready.
It did ugly things to the init sequence to populate the rootfs image
early, but that just ended up showing other problems with the whole
approach. The fact is, the VFS layer simply isn't initialized this
early, and the relevant ACPI code should either run much later, or this
shouldn't be done at all.
For 2.6.25, we'll just pick the latter option. We can revisit this
concept later if necessary.
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Markus Gaugusch <dsdt@gaugusch.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Sat, 15 Mar 2008 15:00:38 +0000 (16:00 +0100)]
tifm_sd: DATA_CARRY is not boolean in tifm_sd_transfer_data()
DATA_CARRY is not boolean
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 15 Mar 2008 16:21:04 +0000 (09:21 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[NET]: Fix tbench regression in 2.6.25-rc1
Ingo Molnar [Fri, 14 Mar 2008 21:17:08 +0000 (22:17 +0100)]
sched: simplify sched_slice()
Use the existing calc_delta_mine() calculation for sched_slice(). This
saves a divide and simplifies the code because we share it with the
other /cfs_rq->load users.
It also improves code size:
   text    data     bss     dec     hex filename
  42659    2740     144   45543    b1e7 sched.o.before
  42093    2740     144   44977    afb1 sched.o.after
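For illustration only, the quantity being shared here can be read as "period scaled by the entity's share of the runqueue load". The sketch below is plain userspace C, not the kernel code: scale_by_weight() and slice_ns() are hypothetical names, and it uses an ordinary 64-bit division where the kernel helper is described as avoiding one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the shared calculation:
 * delta * weight / total_weight, using a plain division.
 */
uint64_t scale_by_weight(uint64_t delta, uint32_t weight, uint32_t total_weight)
{
	/* Assumed preconditions for this sketch only. */
	assert(total_weight != 0);
	assert(weight <= total_weight);

	return delta * weight / total_weight;
}

/* Slice of one entity: the period scaled by its share of the queue load. */
uint64_t slice_ns(uint64_t period_ns, uint32_t se_weight, uint32_t rq_weight)
{
	return scale_by_weight(period_ns, se_weight, rq_weight);
}
```

With a 20 ms period, an entity holding half of the queue weight would get a 10 ms slice under these assumptions.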
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Ingo Molnar [Fri, 14 Mar 2008 21:16:08 +0000 (22:16 +0100)]
sched: fix fair sleepers
Fair sleepers need to scale their latency target down by runqueue
weight. Otherwise busy systems will gain an ever larger sleep bonus.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Peter Zijlstra [Fri, 14 Mar 2008 20:12:12 +0000 (21:12 +0100)]
sched: fix overload performance: buddy wakeups
Currently we schedule to the leftmost task in the runqueue. When the
runtimes are very short because of some server/client ping-pong,
especially in over-saturated workloads, this will cycle through all
tasks, thrashing the cache.
Reduce cache thrashing by keeping dependent tasks together by running
newly woken tasks first. However, by not running the leftmost task first
we could starve tasks because the wakee can gain unlimited runtime.
Therefore we only run the wakee if it's within a small
(wakeup_granularity) window of the leftmost task. This preserves
fairness, but does alternate server/client task groups.
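A minimal userspace sketch of the window check described above, assuming vruntimes are plain 64-bit counters. run_wakee_first() is a hypothetical name and this is not the scheduler's actual code; it only shows the "within a small window of the leftmost task" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the freshly woken task may run ahead of the leftmost
 * task, i.e. its vruntime is less than wakeup_granularity beyond the
 * leftmost vruntime. The signed difference keeps the comparison valid
 * when the unsigned counters wrap around.
 */
bool run_wakee_first(uint64_t wakee_vruntime, uint64_t leftmost_vruntime,
		     uint64_t wakeup_granularity)
{
	/* The granularity is compared as a signed value below. */
	assert(wakeup_granularity <= INT64_MAX);

	int64_t lag = (int64_t)(wakee_vruntime - leftmost_vruntime);

	return lag < (int64_t)wakeup_granularity;
}
```

A wakee that has run far ahead of the leftmost task fails the check, which is what bounds the runtime it can gain.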
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 14 Mar 2008 21:20:01 +0000 (22:20 +0100)]
sched: fix calc_delta_mine()
lw->weight can be 0 for a short time during bootup.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Ingo Molnar [Fri, 14 Mar 2008 22:48:28 +0000 (23:48 +0100)]
sched: fix update_load_add()/sub()
Clear the cached inverse value when updating load. This is needed for
calc_delta_mine() to work correctly when using the rq load.
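To illustrate why a stale cached inverse matters, here is a userspace sketch, assuming a 32-bit fixed-point inverse that is computed lazily and reset whenever the weight changes. The struct and function names are hypothetical, the zero-weight fallback is an assumption of this sketch, and the kernel's real arithmetic differs.

```c
#include <assert.h>
#include <stdint.h>

#define INV_SHIFT 32	/* assumed fixed-point precision for this sketch */

struct load_weight_sketch {
	uint32_t weight;
	uint64_t inv_weight;	/* cached 2^32 / weight; 0 means "not computed" */
};

/* Changing the weight invalidates the cached inverse. */
void load_add(struct load_weight_sketch *lw, uint32_t inc)
{
	lw->weight += inc;
	lw->inv_weight = 0;
}

void load_sub(struct load_weight_sketch *lw, uint32_t dec)
{
	lw->weight -= dec;
	lw->inv_weight = 0;
}

/*
 * delta * weight / lw->weight via multiplication by the cached inverse.
 * Assumes delta * weight * inv_weight fits in 64 bits.
 */
uint64_t calc_delta(uint64_t delta, uint32_t weight,
		    struct load_weight_sketch *lw)
{
	assert(lw);

	/* Assumed fallback: an empty queue leaves delta unscaled. */
	if (lw->weight == 0)
		return delta;

	if (lw->inv_weight == 0)
		lw->inv_weight = (1ULL << INV_SHIFT) / lw->weight;

	return (delta * weight * lw->inv_weight) >> INV_SHIFT;
}
```

If load_add() did not reset inv_weight, the second calc_delta() call after a weight change would still scale by the old load.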
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Peter Zijlstra [Fri, 14 Mar 2008 19:55:51 +0000 (20:55 +0100)]
sched: min_vruntime fix
Current min_vruntime tracking is incorrect and will cause serious
problems when we don't run the leftmost task for some reason.
min_vruntime does two things; 1) it's used to determine a forward
direction when the u64 vruntime wraps, 2) it's used to track the
leftmost vruntime to position newly enqueued tasks from.
The current logic advances min_vruntime whenever the current task's
vruntime advances. Because the current task may pass the leftmost task
still waiting, we're failing the second goal. This causes new tasks to be
placed too far ahead and thus penalizes their runtime.
Fix this by making min_vruntime the min_vruntime of the waiting tasks by
tracking it in enqueue/dequeue, and compare against current's vruntime
to obtain the absolute minimum when placing new tasks.
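A userspace sketch of the idea, assuming vruntimes are u64 counters that may wrap. vruntime_min(), waiting_min() and placement_base() are hypothetical names; the kernel tracks the minimum incrementally in enqueue/dequeue rather than scanning an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe minimum: compare through the signed difference. */
uint64_t vruntime_min(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0 ? b : a;
}

/* Minimum vruntime over the waiting tasks (n must be at least 1). */
uint64_t waiting_min(const uint64_t *vruntimes, size_t n)
{
	assert(n > 0);

	uint64_t min = vruntimes[0];

	for (size_t i = 1; i < n; i++)
		min = vruntime_min(min, vruntimes[i]);
	return min;
}

/*
 * Base for placing a new task: the waiting minimum, lowered further
 * if the currently running task is behind it.
 */
uint64_t placement_base(uint64_t min_waiting, uint64_t curr_vruntime)
{
	return vruntime_min(min_waiting, curr_vruntime);
}
```

The point of the sketch is that a running task that has advanced past the waiting tasks no longer drags the placement base forward.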
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Hiroshi Shimamoto [Mon, 10 Mar 2008 18:01:20 +0000 (11:01 -0700)]
sched: fix race in schedule()
Fix a hard to trigger crash seen in the -rt kernel that also affects
the vanilla scheduler.
There is a race condition between schedule() and some dequeue/enqueue
functions; rt_mutex_setprio(), __setscheduler() and sched_move_task().
When scheduling to idle, idle_balance() is called to pull tasks from
other busy processors, and it might drop the rq lock. Those 3 functions
can then encounter on_rq=0 and running=1. The current task should be
put when it is running.
Here is a possible scenario:
CPU0                               CPU1
    |                              schedule()
    |                              ->deactivate_task()
    |                              ->idle_balance()
    |                              -->load_balance_newidle()
rt_mutex_setprio()                 |
    |                              --->double_lock_balance()
    *get lock                      *rel lock
    * on_rq=0, running=1           |
    * sched_class is changed       |
    *rel lock                      *get lock
    :                              |
    :
                                   ->put_prev_task_rt()
                                   ->pick_next_task_fair()
                                    => panic
The current process of CPU1 (P1) is scheduling. P1 is deactivated, and the
scheduler looks for another process on another CPU's runqueue because CPU1
will be idle. idle_balance(), load_balance_newidle() and
double_lock_balance() are called, and double_lock_balance() could drop
the rq lock. Meanwhile, CPU0 is trying to boost the priority of
P1. As a result of the boost, only P1's prio and sched_class are changed to
RT. The sched entities of P1 and P1's group are never put. This leaves the
cfs_rq invalid, because the cfs_rq has curr and no leaf, yet
pick_next_task_fair() is called, and the kernel panics.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Fri, 14 Mar 2008 23:49:41 +0000 (16:49 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: fw-ohci: shut up false compiler warning on PPC32
firewire: fw-ohci: use dma_alloc_coherent for ar_buffer
ieee1394: sbp2: fix for SYM13FW500 bridge (Datafab disk)
firewire: fw-sbp2: fix for SYM13FW500 bridge (Datafab disk)
firewire: update Kconfig help text
firewire: warn on fatal condition in topology code
firewire: fw-sbp2: set single-phase retry_limit
firewire: fw-ohci: Apple UniNorth 1st generation support
firewire: fw-ohci: PPC PMac platform code
firewire: endianess annotations
firewire: endianess fix
J. Bruce Fields [Fri, 14 Mar 2008 23:37:11 +0000 (19:37 -0400)]
nfsd: fix oops on access from high-numbered ports
This bug was always here, but before my commit 6fa02839bf9412e18e77
("recheck for secure ports in fh_verify"), it could only be triggered by
failure of a kmalloc(). After that commit it could be triggered by a
client making a request from a non-reserved port for access to an export
marked "secure". (Exports are "secure" by default.)
The result is a struct svc_export with a reference count one too low,
resulting in likely oopses next time the export is accessed.
The reference counting here is not straightforward; a later patch will
clean up fh_verify().
Thanks to Lukas Hejtmanek for the bug report and followup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marc Dionne [Fri, 14 Mar 2008 13:11:29 +0000 (13:11 +0000)]
struct export_operations: adjust comments to match current members
The comments in the definition of struct export_operations don't match the
current members.
Add a comment for the 2 new functions and remove 2 comments for unused ones.
Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefan Richter [Thu, 13 Mar 2008 23:27:49 +0000 (00:27 +0100)]
firewire: fw-ohci: shut up false compiler warning on PPC32
Shut up "may be used uninitialised in this function" warnings due to
PPC32's implementation of dma_alloc_coherent().
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Jarod Wilson [Wed, 12 Mar 2008 21:43:26 +0000 (17:43 -0400)]
firewire: fw-ohci: use dma_alloc_coherent for ar_buffer
Currently, we do nothing to guarantee we have a consistent DMA buffer for
asynchronous receive packets. Rather than doing several sync's following a
dma_map_single() to get consistent buffers, just switch to using
dma_alloc_coherent().
Resolves constant buffer failures on my own x86_64 laptop w/4GB of RAM and
is likely to fix a number of other failures witnessed on x86_64 systems with
4GB of RAM or more.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Tue, 11 Mar 2008 21:32:52 +0000 (22:32 +0100)]
ieee1394: sbp2: fix for SYM13FW500 bridge (Datafab disk)
Fix I/O errors due to SYM13FW500's inability to handle larger request
sizes. Reported by Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> for
firewire-sbp2 in https://bugzilla.redhat.com/show_bug.cgi?id=436879
This fix is necessary because sbp2's default request size limit has been
lifted since 2.6.25-rc1.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Stefan Richter [Tue, 11 Mar 2008 21:32:03 +0000 (22:32 +0100)]
firewire: fw-sbp2: fix for SYM13FW500 bridge (Datafab disk)
Fix I/O errors due to SYM13FW500's inability to handle larger request
sizes. Reported by Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> in
https://bugzilla.redhat.com/show_bug.cgi?id=436879
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Stefan Richter [Sat, 8 Mar 2008 23:27:20 +0000 (00:27 +0100)]
firewire: update Kconfig help text
Remove some less necessary information, point out that video1394 and
dv1394 should be blacklisted along with ohci1394.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Stefan Richter [Sat, 8 Mar 2008 21:38:16 +0000 (22:38 +0100)]
firewire: warn on fatal condition in topology code
If this ever happens to anybody, we want to have it in his log.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Jarod Wilson [Fri, 7 Mar 2008 06:43:01 +0000 (01:43 -0500)]
firewire: fw-sbp2: set single-phase retry_limit
Per the SBP-2 specification, all SBP-2 target devices must have a BUSY_TIMEOUT
register. Per the 1394-1995 specification, the retry_limit portion of the
register should be set to 0x0 initially, and set on the target by a logged in
initiator (i.e., a Linux host w/firewire controller(s)).
Well, as it turns out, lots of devices these days have actually moved on to
starting to implement SBP-3 compliance, which says that retry_limit should
default to 0xf instead (yes, SBP-3 stomps directly on 1394-1995, oops).
Prior to this change, the firewire driver stack didn't touch retry_limit, and
any SBP-3 compliant device worked fine, while SBP-2 compliant ones were unable
to retransmit when the host returned an ack_busy_X, which resulted in stalled
out I/O, eventually causing the SCSI layer to give up and offline the device.
The simple fix is for us to set retry_limit to 0xf in the register for all
devices (which actually matches what the old ieee1394 stack did).
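As a rough userspace sketch of "set retry_limit to 0xf in the register": the code below assumes the retry_limit field sits in the low four bits of a 32-bit register value that is sent big-endian. That layout, the mask, and the helper names are assumptions made for illustration, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define RETRY_LIMIT_MASK 0xfu	/* assumed: retry_limit is the low 4 bits */

/* Replace only the retry_limit field, keeping the other bits intact. */
uint32_t busy_timeout_set_retry_limit(uint32_t reg, uint32_t retry_limit)
{
	assert(retry_limit <= RETRY_LIMIT_MASK);

	return (reg & ~RETRY_LIMIT_MASK) | retry_limit;
}

/* Store a 32-bit value as big-endian bytes, most significant byte first. */
void put_be32(uint8_t out[4], uint32_t value)
{
	out[0] = (uint8_t)(value >> 24);
	out[1] = (uint8_t)(value >> 16);
	out[2] = (uint8_t)(value >> 8);
	out[3] = (uint8_t)value;
}
```

Writing the same value for every device, rather than probing for SBP-2 versus SBP-3 behaviour, mirrors the "simple fix" the message describes.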
Prior to this change, a hard disk behind an SBP-2 Prolific PL-3507 bridge chip
would routinely encounter buffer I/O errors and wind up offlined by the SCSI
layer. With this change, I've encountered zero I/O failures moving tens of GB
of data around.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>