Eric Dumazet [Wed, 23 Feb 2011 10:56:17 +0000 (10:56 +0000)]
net_sched: SFB flow scheduler
This is the Stochastic Fair Blue scheduler, based on work from :
W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.
http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
This implementation is based on work done by Juliusz Chroboczek
General SFB algorithm can be found in figure 14, page 15:
B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
if (B[i][h{i}].qlen > bin_size)
B[i][h{i}].p_mark += p_increment;
else if (B[i][h{i}].qlen == 0)
B[i][h{i}].p_mark -= p_decrement;
p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
if (p_min == 1.0)
ratelimit();
else
mark/drop with probabilty p_min;
I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :
http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375
Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.
tc qdisc add dev $DEV parent 1:11 handle 11: \
est 0.5sec 2sec sfb limit 128
tc filter add dev $DEV protocol ip parent 11: handle 3 \
flow hash keys dst divisor 1024
Notes:
1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.
2) ECN is enabled by default, unlike RED/CHOKe/GRED
With help from Patrick McHardy & Andi Kleen
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:44:31 +0000 (18:44 -0800)]
net: Make flow cache paths use a const struct flowi.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:38:51 +0000 (18:38 -0800)]
xfrm: Mark flowi arg to xfrm_resolve_and_create_bundle() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:38:14 +0000 (18:38 -0800)]
xfrm: Mark flowi arg to xfrm_dst_{alloc_copy,update_origin}() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:36:50 +0000 (18:36 -0800)]
xfrm: Mark flowi arg to xfrm_bundle_create() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:35:39 +0000 (18:35 -0800)]
xfrm: Mark flowi arg to xfrm_tmpl_resolve{,_one}() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:33:42 +0000 (18:33 -0800)]
xfrm: Mark flowi arg to xfrm_expand_policies() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:31:08 +0000 (18:31 -0800)]
xfrm: Mark flowi arg to xfrm_policy_{lookup_by_type,match}() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:29:20 +0000 (18:29 -0800)]
xfrm: Kill strict arg to xfrm_bundle_ok().
Always set to "0".
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:27:22 +0000 (18:27 -0800)]
net: Mark flowi arg to flow_cache_uli_match() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:24:19 +0000 (18:24 -0800)]
xfrm: Mark flowi arg to xfrm_state_find() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:22:34 +0000 (18:22 -0800)]
xfrm: Mark flowi arg to xfrm_init_tempstate() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:21:31 +0000 (18:21 -0800)]
xfrm: Mark flowi arg to xfrm_state_look_at() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:13:15 +0000 (18:13 -0800)]
xfrm: Mark flowi arg to security_xfrm_state_pol_flow_match() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:07:39 +0000 (18:07 -0800)]
xfrm: Mark flowi arg to xfrm_selector_match() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 02:02:12 +0000 (18:02 -0800)]
xfrm: Mark token args to addr_match() const.
Also, make it return a real bool.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 01:59:59 +0000 (17:59 -0800)]
xfrm: Mark flowi arg to xfrm_type->reject() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 01:51:44 +0000 (17:51 -0800)]
xfrm: Mark flowi arg to ->init_tempsel() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 01:48:57 +0000 (17:48 -0800)]
xfrm: Mark flowi arg to ->fill_dst() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 01:47:10 +0000 (17:47 -0800)]
xfrm: Mark flowi arg to ->get_tos() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 23 Feb 2011 01:42:56 +0000 (17:42 -0800)]
xfrm: Mark flowi arg const in key extraction helpers.
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sun, 20 Feb 2011 16:14:23 +0000 (16:14 +0000)]
cls_u32: fix sparse warnings
The variable _data is used in asm-generic to define sections
which causes sparse warnings, so just rename the variable.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 22 Feb 2011 19:15:29 +0000 (11:15 -0800)]
Merge branch 'net/ax88796' of git://git.pengutronix.de/git/mkl/linux-2.6
Ajit Khaparde [Sun, 20 Feb 2011 11:42:22 +0000 (11:42 +0000)]
be2net: use hba_port_num instead of port_num
Use hba_port_num for phy loopback and ethtool phy identification.
From: Suresh R <suresh.reddy@emulex.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sun, 20 Feb 2011 11:42:07 +0000 (11:42 +0000)]
be2net: add code to display temperature of ASIC
Add support to display temperature of ASIC via ethtool -S
From: Somnath K <somnath.kotur@emulex.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sun, 20 Feb 2011 11:41:53 +0000 (11:41 +0000)]
be2net: fix to ignore transparent vlan ids wrongly indicated by NIC
With transparent VLAN tagging, the ASIC wrongly indicates packets with VLAN ID.
Strip them off in the driver. The VLAN Tag to be stripped will be given to the host
as an async message.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sun, 20 Feb 2011 11:41:39 +0000 (11:41 +0000)]
be2net: variable name change
change occurances of stats_ioctl_sent to stats_cmd_sent
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sun, 20 Feb 2011 11:41:20 +0000 (11:41 +0000)]
be2net: fixes in ethtool selftest
> add missing separator between items in ethtool self_test array
> fix reporting of test resluts when link is down and
when selftest command fails.
From: Suresh R <suresh.reddy@emulex.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sun, 20 Feb 2011 11:41:04 +0000 (11:41 +0000)]
be2net: add new counters to display via ethtool stats
New counters:
> jabber frame stats
> red drop stats
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 22 Feb 2011 18:21:36 +0000 (10:21 -0800)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next-2.6
Eric Dumazet [Fri, 18 Feb 2011 03:26:36 +0000 (03:26 +0000)]
net: add __rcu annotations to sk_wq and wq
Add proper RCU annotations/verbs to sk_wq and wq members
Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)
Fix sunrpc sk_sleep() abuse too
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 22 Feb 2011 18:15:59 +0000 (10:15 -0800)]
Merge branch 'fec' of git://git.pengutronix.de/git/ukl/linux-2.6
Nobuhiro Iwamatsu [Tue, 15 Feb 2011 21:17:32 +0000 (21:17 +0000)]
sh: sh_eth: Add support ethtool
This commit supports following functions.
- get_settings
- set_settings
- nway_reset
- get_msglevel
- set_msglevel
- get_link
- get_strings
- get_ethtool_stats
- get_sset_count
About other function, the device does not support.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde [Mon, 21 Feb 2011 11:41:55 +0000 (12:41 +0100)]
ax88796: use generic mdio_bitbang driver
..instead of using hand-crafted and not proper working version.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Mon, 21 Feb 2011 10:24:39 +0000 (11:24 +0100)]
ax88796: clean up probe and remove function
This way we can remove the struct resource pointers from the private data.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Mon, 21 Feb 2011 09:43:31 +0000 (10:43 +0100)]
ax88796: make pointer to platform data const
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sun, 20 Feb 2011 18:13:20 +0000 (19:13 +0100)]
ax88796: remove platform_device member from struct ax_device
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sun, 20 Feb 2011 16:46:18 +0000 (17:46 +0100)]
ax88796: use netdev_<LEVEL> instead of dev_<LEVEL> and pr_<LEVEL>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sat, 19 Feb 2011 22:07:09 +0000 (23:07 +0100)]
ax88796: remove first_init parameter from ax_init_dev()
ax_init_dev() is always called with first_init=1.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sat, 19 Feb 2011 21:12:27 +0000 (22:12 +0100)]
ax88796: remove memset of private data
It's allocated via alloc_netdev, thus already zeroed.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sat, 19 Feb 2011 21:08:33 +0000 (22:08 +0100)]
ax88796: don't use magic ei_status to acces private data
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sat, 19 Feb 2011 16:07:40 +0000 (17:07 +0100)]
ax88796: fix codingstyle and checkpatch warnings
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Shan Wei [Sat, 19 Feb 2011 21:57:26 +0000 (21:57 +0000)]
sctp: fix compile warnings in sctp_tsnmap_num_gabs
net/sctp/tsnmap.c: In function ‘sctp_tsnmap_num_gabs’:
net/sctp/tsnmap.c:347: warning: ‘start’ may be used uninitialized in this function
net/sctp/tsnmap.c:347: warning: ‘end’ may be used uninitialized in this function
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shan Wei [Sat, 19 Feb 2011 21:55:45 +0000 (21:55 +0000)]
tcp: Remove debug macro of TCP_CHECK_TIMER
Now, TCP_CHECK_TIMER is not used for debuging, it does nothing.
And, it has been there for several years, maybe 6 years.
Remove it to keep code clearer.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sat, 19 Feb 2011 21:52:41 +0000 (21:52 +0000)]
tcp: document tcp_max_ssthresh (Limited Slow-Start)
Base on Ilpo's patch about documenting tcp_max_ssthresh.
(see http://marc.info/?l=linux-netdev&m=
117950581307310&w=2)
According to errata of RFC3742, fix the number of segments increased
during RTT time.
Just to state the occasion to use this parameter, But
about how to set parameter value, maybe some others can do it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 20 Feb 2011 03:17:35 +0000 (19:17 -0800)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6
Conflicts:
Documentation/feature-removal-schedule.txt
drivers/net/e1000e/netdev.c
net/xfrm/xfrm_policy.c
Jiri Bohac [Thu, 17 Feb 2011 13:12:08 +0000 (13:12 +0000)]
sctp: fix reporting of unknown parameters
commit
5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 re-worked the
handling of unknown parameters. sctp_init_cause_fixed() can now
return -ENOSPC if there is not enough tailroom in the error
chunk skb. When this happens, the error header is not appended to
the error chunk. In that case, the payload of the unknown parameter
should not be appended either.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Fri, 18 Feb 2011 13:30:17 +0000 (13:30 +0000)]
net: dcb: match dcb_app protocol field with 802.1Qaz spec
The dcb_app protocol field is a __u32 however the 802.1Qaz
specification defines it as a 16 bit field. This patch brings
the structure inline with the spec making it a __u16.
CC: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 18 Feb 2011 22:35:56 +0000 (22:35 +0000)]
tcp: fix inet_twsk_deschedule()
Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule()
This is caused by inet_twsk_purge(), run from process context,
and commit
575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.)
removed the BH disabling that was necessary.
Add the BH disabling but fine grained, right before calling
inet_twsk_deschedule(), instead of whole function.
With help from Linus Torvalds and Eric W. Biederman
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: stable <stable@kernel.org> (# 2.6.33+)
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 20 Feb 2011 00:42:37 +0000 (16:42 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/kaber/nf-2.6
Linus Torvalds [Fri, 18 Feb 2011 22:20:46 +0000 (14:20 -0800)]
Merge branch 'rtc-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
RTC: Re-enable UIE timer/polling emulation
RTC: Revert UIE emulation removal
RTC: Release mutex in error path of rtc_alarm_irq_enable
Linus Torvalds [Fri, 18 Feb 2011 22:15:05 +0000 (14:15 -0800)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
net: deinit automatic LIST_HEAD
net: dont leave active on stack LIST_HEAD
net: provide default_advmss() methods to blackhole dst_ops
tg3: Restrict phy ioctl access
drivers/net: Call netif_carrier_off at the end of the probe
ixgbe: work around for DDP last buffer size
ixgbe: fix panic due to uninitialised pointer
e1000e: flush all writebacks before unload
e1000e: check down flag in tasks
isdn: hisax: Use l2headersize() instead of dup (and buggy) func.
arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.
cxgb4vf: Use defined Mailbox Timeout
cxgb4vf: Quiesce Virtual Interfaces on shutdown ...
cxgb4vf: Behave properly when CONFIG_DEBUG_FS isn't defined ...
cxgb4vf: Check driver parameters in the right place ...
pch_gbe: Fix the MAC Address load issue.
iwlwifi: Delete iwl3945_good_plcp_health.
net/can/softing: make CAN_SOFTING_CS depend on CAN_SOFTING
netfilter: nf_iterate: fix incorrect RCU usage
pch_gbe: Fix the issue that the receiving data is not normal.
...
Linus Torvalds [Fri, 18 Feb 2011 20:44:41 +0000 (12:44 -0800)]
Merge branch 'for-linus/bugfixes' of git://xenbits.xen.org/people/ianc/linux-2.6
* 'for-linus/bugfixes' of git://xenbits.xen.org/people/ianc/linux-2.6:
xen: suspend and resume system devices when running PVHVM
David S. Miller [Fri, 18 Feb 2011 20:43:09 +0000 (12:43 -0800)]
ipv4: Implement __ip_dev_find using new interface address hash.
Much quicker than going through the FIB tables.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Feb 2011 20:42:28 +0000 (12:42 -0800)]
ipv4: Add hash table of interface addresses.
This will be used to optimize __ip_dev_find() and friends.
With help from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Feb 2011 20:36:06 +0000 (12:36 -0800)]
Merge branch 'fixes-2.6.38' of git://git./linux/kernel/git/tj/wq
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
workqueue: wake up a worker when a rescuer is leaving a gcwq
Eric Dumazet [Thu, 17 Feb 2011 22:59:19 +0000 (22:59 +0000)]
net: deinit automatic LIST_HEAD
commit
9b5e383c11b08784 (net: Introduce
unregister_netdevice_many()) left an active LIST_HEAD() in
rollback_registered(), with possible memory corruption.
Even if device is freed without touching its unreg_list (and therefore
touching the previous memory location holding LISTE_HEAD(single), better
close the bug for good, since its really subtle.
(Same fix for default_device_exit_batch() for completeness)
Reported-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biderman <ebiderman@xmission.com>
Tested-by: Eric W. Biderman <ebiderman@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: stable <stable@kernel.org> [.33+]
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 17 Feb 2011 22:54:38 +0000 (22:54 +0000)]
net: dont leave active on stack LIST_HEAD
Eric W. Biderman and Michal Hocko reported various memory corruptions
that we suspected to be related to a LIST head located on stack, that
was manipulated after thread left function frame (and eventually exited,
so its stack was freed and reused).
Eric Dumazet suggested the problem was probably coming from commit
443457242beb (net: factorize
sync-rcu call in unregister_netdevice_many)
This patch fixes __dev_close() and dev_close() to properly deinit their
respective LIST_HEAD(single) before exiting.
References: https://lkml.org/lkml/2011/2/16/304
References: https://lkml.org/lkml/2011/2/14/223
Reported-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biderman <ebiderman@xmission.com>
Tested-by: Eric W. Biderman <ebiderman@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 18 Feb 2011 19:39:01 +0000 (11:39 -0800)]
net: provide default_advmss() methods to blackhole dst_ops
Commit
0dbaee3b37e118a (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()
Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Feb 2011 19:32:28 +0000 (11:32 -0800)]
Expand CONFIG_DEBUG_LIST to several other list operations
When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.
However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.
So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 18 Feb 2011 01:52:36 +0000 (17:52 -0800)]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / Hibernate: Return error code when alloc_image_page() fails
Linus Torvalds [Fri, 18 Feb 2011 01:52:17 +0000 (17:52 -0800)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: add missing frac fb div flag for dce4+
drm/radeon/kms: do not reject X16 and Y16X16 floating-point formats on r300
drm/nouveau: fix suspend/resume on GPUs that don't have PM support
drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
drm/nv40: fix tiling-related setup for a number of chipsets
drm/nouveau: fix non-EDIDful native mode selection
drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.
Linus Torvalds [Fri, 18 Feb 2011 01:51:52 +0000 (17:51 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: Prevent double completions after a timeout or RNR error
IB/qib: Fix double add_timer()
RDMA/nes: Don't generate async events for unregistered devices
Linus Torvalds [Fri, 18 Feb 2011 01:51:27 +0000 (17:51 -0800)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Fix NMI startup bug which also breaks perf.
sparc: fix size argument to find_next_zero_bit()
sparc: use bitmap_set()
sparc32: unaligned memory access (MNA) trap handler bug
Timo Warns [Thu, 17 Feb 2011 21:27:40 +0000 (22:27 +0100)]
fs/partitions: Validate map_count in Mac partition tables
Validate number of blocks in map and remove redundant variable.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasanthy Kolluri [Thu, 17 Feb 2011 13:57:19 +0000 (13:57 +0000)]
enic: Always use single transmit and single receive hardware queues per device
We believe that our earlier patch for supporting multiple hardware
receive queues per enic device requires more internal testing. At this
point, we think that it's best to disable the use of multiple receive
queues. The current patch provides an effective means for the same.
Also, we continue to disallow multiple hardware transmit queues per
device. But change the way we enforce this in order to maintain
consistency with the way receive queues are handled.
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Danny Guo <dannguo@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Feb 2011 05:44:24 +0000 (21:44 -0800)]
ipv4: Use const'ify fib_result deep in the route call chains.
The only troublesome bit here is __mkroute_output which wants
to override res->fi and res->type, compute those in local
variables instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Feb 2011 06:04:57 +0000 (22:04 -0800)]
ipv4: Mark fib_combine_itag()'s 'res' arg as const.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Feb 2011 22:56:22 +0000 (14:56 -0800)]
ipv4: Avoid use of signed integers in fib_trie code.
GCC emits all kinds of crazy zero extensions when we go from signed
int, to unsigned short, etc. etc.
This transformation has to be legal because:
1) In tkey_extract_bits() in mask_pfx(), the values are used to
perform shifts, on which negative values are undefined by C.
2) In fib_table_lookup() we perform comparisons with unsigned
values, constants, and additions. None of which should
encounter negative values.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Feb 2011 22:08:44 +0000 (14:08 -0800)]
net: Add initial_ref arg to dst_alloc().
This allows avoiding multiple writes to the initial __refcnt.
The most simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted, the rest have been left
along and kept at the existing "0".
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Feb 2011 23:42:37 +0000 (15:42 -0800)]
ipv4: Consolidate ipv4 dst allocation logic.
This also allows us to combine all the dst->flags settings and avoid
read/modify/write sequences to this struct member.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Feb 2011 23:37:09 +0000 (15:37 -0800)]
ipv4: Move rcu_read_{lock,unlock}() into ip_route_output_slow().
Simplifies tail of __ip_route_output_key().
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Feb 2011 23:29:00 +0000 (15:29 -0800)]
ipv4: Simplify output route creation call sequence.
There's a lot of redundancy and unnecessary stack frames
in the output route creation path.
1) Make __mkroute_output() return error pointers.
2) Eliminate ip_mkroute_output() entirely, made possible by #1.
3) Call __mkroute_output() directly and handling the returning error
pointers in ip_route_output_slow().
Signed-off-by: David S. Miller <davem@davemloft.net>
John Stultz [Sat, 12 Feb 2011 02:15:23 +0000 (18:15 -0800)]
RTC: Re-enable UIE timer/polling emulation
This patch re-enables UIE timer/polling emulation for rtc devices
that do not support alarm irqs.
CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
John Stultz [Sat, 12 Feb 2011 01:45:40 +0000 (17:45 -0800)]
RTC: Revert UIE emulation removal
Uwe pointed out that my alarm based UIE emulation is not sufficient
to replace the older timer/polling based UIE emulation on devices
where there is no alarm irq. This causes rtc devices without alarms
to return -EINVAL to UIE ioctls. The fix is to re-instate the old
timer/polling method for devices without alarm irqs.
This patch reverts the following commits:
042620a018afcfba1d678062b62e46 - Remove UIE emulation
1daeddd5962acad1bea55e524fc0fa - Cleanup removed UIE emulation declaration
b5cc8ca1c9c3a37eaddf709b2fd3e1 - Remove Kconfig symbol for UIE emulation
The emulation mode will still need to be wired-in with a following
patch before it will work.
CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Uwe Kleine-König [Mon, 14 Feb 2011 10:33:17 +0000 (11:33 +0100)]
RTC: Release mutex in error path of rtc_alarm_irq_enable
On hardware that doesn't support alarm interrupts, rtc_alarm_irq_enable
could return without releasing the ops_lock mutex.
This was introduced in
aa0be0f (RTC: Propagate error handling via rtc_timer_enqueue properly)
This patch corrects the issue by only returning once the mutex is
released.
[john.stultz: Reworded the commit log]
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Michał Mirosław [Tue, 15 Feb 2011 16:59:18 +0000 (16:59 +0000)]
loopback: convert to hw_features
This also enables TSOv6, TSO-ECN, and UFO as loopback clearly can handle them.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:18 +0000 (16:59 +0000)]
net: introduce NETIF_F_RXCSUM
Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
offload. Integrate it with ndo_fix_features.
ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:18 +0000 (16:59 +0000)]
net: use ndo_fix_features for ethtool_ops->set_flags
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:17 +0000 (16:59 +0000)]
net: ethtool: use ndo_fix_features for offload setting
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:17 +0000 (16:59 +0000)]
net: Introduce new feature setting ops
This introduces a new framework to handle device features setting.
It consists of:
- new fields in struct net_device:
+ hw_features - features that hw/driver supports toggling
+ wanted_features - features that user wants enabled, when possible
- new netdev_ops:
+ feat = ndo_fix_features(dev, feat) - API checking constraints for
enabling features or their combinations
+ ndo_set_features(dev) - API updating hardware state to match
changed dev->features
- new ethtool commands:
+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
and trigger device reconfiguration if resulting dev->features
changed
+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:17 +0000 (16:59 +0000)]
ethtool: factorize get/set_one_feature
This allows to enable GRO even if RX csum is disabled. GRO will not
be used for packets without hardware checksum anyway.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:16 +0000 (16:59 +0000)]
ethtool: factorize ethtool_get_strings() and ethtool_get_sset_count()
This is needed for unified offloads patch.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:16 +0000 (16:59 +0000)]
ethtool: enable GSO and GRO by default
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Feb 2011 16:59:16 +0000 (16:59 +0000)]
ethtool: move EXPORT_SYMBOL(ethtool_op_set_tx_csum) to correct place
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 17 Feb 2011 08:53:17 +0000 (08:53 +0000)]
enic: Clean up: Remove a not needed #ifdef
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Danny Guo <dannguo@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 17 Feb 2011 08:53:12 +0000 (08:53 +0000)]
enic: Bug fix: Reset driver count of registered unicast addresses to zero during device reset
During a device reset, clear the counter for the no. of unicast addresses registered.
Also, rename the routines that update unicast and multicast address lists.
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Danny Guo <dannguo@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Tue, 15 Feb 2011 12:51:10 +0000 (12:51 +0000)]
tg3: Restrict phy ioctl access
If management firmware is present and the device is down, the firmware
will assume control of the phy. If a phy access were allowed from the
host, it will collide with firmware phy accesses, resulting in
unpredictable behavior. This patch fixes the problem by disallowing phy
accesses during the problematic condition.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Tue, 15 Feb 2011 02:08:39 +0000 (02:08 +0000)]
drivers/net: Call netif_carrier_off at the end of the probe
Without calling of netif_carrier_off at the end of the probe the operstate
is unknown when the device is initially opened. By default the carrier is
on so when the device is opened and netif_carrier_on is called the link
watch event is not fired and operstate remains zero (unknown).
This patch fixes this behavior in forcedeth and r8169.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roland Dreier [Thu, 17 Feb 2011 22:04:59 +0000 (14:04 -0800)]
Merge branches 'nes' and 'qib' into for-next
Mike Marciniszyn [Wed, 16 Feb 2011 15:48:25 +0000 (15:48 +0000)]
IB/qib: Prevent double completions after a timeout or RNR error
There is a double completion associated with error handling for RC QPs.
The sequence is:
- The do_rc_ack() routine fields an RNR nack and there are 0
rnr_retries configured on the QP.
- qib_error_qp() stops the pending timer
- qib_rc_send_complete() is called from sdma_complete()
- qib_rc_send_complete() starts the timer because the msb of the psn
just completed says an ack is needed.
- a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
- rc_timeout() calls qib_restart_rc()
- qib_restart_rc() calls qib_send_complete() with a
IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
past
The fix avoids starting the timer since another packet will never
arrive.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Ben Hutchings [Wed, 5 Jan 2011 00:50:41 +0000 (00:50 +0000)]
sfc: Implement hardware acceleration of RFS
Use the existing filter management functions to insert TCP/IPv4 and
UDP/IPv4 4-tuple filters for Receive Flow Steering.
For each channel, track how many RFS filters are being added during
processing of received packets and scan the corresponding number of
table entries for filters that may be reclaimed. Do this in batches
to reduce lock overhead.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Tom Herbert [Wed, 16 Feb 2011 10:27:02 +0000 (10:27 +0000)]
bnx2x: Support for managing RX indirection table
Support fetching and retrieving RX indirection table via ethtool.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Thu, 17 Feb 2011 20:14:15 +0000 (21:14 +0100)]
net/fec: remove unused driver data
Apart from not being used the first argument isn't even a struct
platform_device *.
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Joerg Marx [Thu, 17 Feb 2011 15:23:40 +0000 (16:23 +0100)]
netfilter: ip6t_LOG: fix a flaw in printing the MAC
The flaw was in skipping the second byte in MAC header due to increasing
the pointer AND indexed access starting at '1'.
Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Florian Westphal [Thu, 17 Feb 2011 10:32:38 +0000 (11:32 +0100)]
netfilter: tproxy: do not assign timewait sockets to skb->sk
Assigning a socket in timewait state to skb->sk can trigger
kernel oops, e.g. in nfnetlink_log, which does:
if (skb->sk) {
read_lock_bh(&skb->sk->sk_callback_lock);
if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...
in the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
is invalid.
Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
or xt_TPROXY must not assign a timewait socket to skb->sk.
This does the latter.
If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.
The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
listener socket.
Cc: Balazs Scheidler <bazsi@balabit.hu>
Cc: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Ian Campbell [Thu, 17 Feb 2011 10:31:20 +0000 (10:31 +0000)]
xen: suspend and resume system devices when running PVHVM
Otherwise we fail to properly suspend/resume all of the emulated devices.
Something between 2.6.38-rc2 and rc3 appears to have exposed this
issue, but it's always been wrong not to do this.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Amir Hanania [Tue, 15 Feb 2011 09:11:31 +0000 (09:11 +0000)]
ixgbe: work around for DDP last buffer size
A HW limitation was recently discovered where the last buffer in a DDP offload
cannot be a full buffer size in length. Fix the issue with a work around by
adding another buffer with size = 1.
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andy Gospodarek [Thu, 17 Feb 2011 09:13:13 +0000 (01:13 -0800)]
ixgbe: fix panic due to uninitialised pointer
Systems containing an
82599EB and running a backported driver from
upstream were panicing on boot. It turns out hw->mac.ops.setup_sfp is
only set for 82599, so one should check to be sure that pointer is set
before continuing in ixgbe_sfp_config_module_task. I verified by
inspection that the upstream driver has the same issue and also added a
check before the call in ixgbe_sfp_link_config.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Wed, 2 Feb 2011 10:19:50 +0000 (10:19 +0000)]
e1000e: flush all writebacks before unload
The driver was not flushing all writebacks before unloading, possibly
causing memory to be written by the hardware after the driver had
reinitialized the rings.
This adds missing functionality to flush any pending writebacks and is
called in all spots where descriptors should be completed before the driver
begins processing.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>