Eliad Peller [Sun, 16 Oct 2011 08:57:31 +0000 (10:57 +0200)]
mac80211: call set_wmm_default only for valid vifs
mac80211 calls ieee80211_set_wmm_default (which in turn
calls drv_conf_tx()) for every new interface, including
"internal" ones (e.g. monitor interface, which the low-level
driver doesn't know about).
Limit this call only to valid interfaces.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Fri, 14 Oct 2011 19:54:48 +0000 (12:54 -0700)]
iwlagn: use 6 Mbps rate for no-CCK scans
When userspace requested that a scan not be
done with CCK rates, use 6 Mbps. This is used
for example for P2P scanning.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Don Fry [Fri, 14 Oct 2011 19:54:46 +0000 (12:54 -0700)]
iwlagn: simplify iwl_alloc_all
The iwl_alloc_all routine is only called once. Delete the argument
and print an error in the calling routine if needed.
Signed-off-by: Don Fry <donald.h.fry@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Wey-Yi Guy [Fri, 14 Oct 2011 19:54:45 +0000 (12:54 -0700)]
iwlwifi: HW rev for 105 and 135 series
Set the HW rev. for both 105 and 135 series
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Don Fry [Fri, 14 Oct 2011 19:54:44 +0000 (12:54 -0700)]
iwlagn: remove unnecessary type for tracing operations
The device tracing routines only use the priv pointer as an opaque
value. Change from a typed iwl_priv pointer to a null pointer and
eliminate the need to include iwl_priv.h. CMD_ASYNC is defined in
iwl_shared.h which is the only reason it is included.
Signed-off-by: Don Fry <donald.h.fry@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Fri, 14 Oct 2011 19:54:43 +0000 (12:54 -0700)]
iwlagn: update wowlan API
The WoWLAN API changed due to netdetect and
we now have a more generic "D3 configuration"
command that enables the sysassert & rfkill
wakeup triggers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Richard Cochran [Fri, 21 Oct 2011 00:49:17 +0000 (00:49 +0000)]
dp83640: free packet queues on remove
If the PHY should disappear (for example, on an USB Ethernet MAC), then
the driver would leak any undelivered time stamp packets. This commit
fixes the issue by calling the appropriate functions to free any packets
left in the transmit and receive queues.
The driver first appeared in v3.0.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Fri, 21 Oct 2011 00:49:16 +0000 (00:49 +0000)]
dp83640: use proper function to free transmit time stamping packets
The previous commit enforces a new rule for handling the cloned packets
for transmit time stamping. These packets must not be freed using any other
function than skb_complete_tx_timestamp. This commit fixes the one and only
driver using this API.
The driver first appeared in v3.0.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Hofmeister [Mon, 24 Oct 2011 23:13:15 +0000 (19:13 -0400)]
ipv6: Do not use routes from locally generated RAs
When hybrid mode is enabled (accept_ra == 2), the kernel also sees RAs
generated locally. This is useful since it allows the kernel to auto-configure
its own interface addresses.
However, if 'accept_ra_defrtr' and/or 'accept_ra_rtr_pref' are set and the
locally generated RAs announce the default route and/or other route information,
the kernel happily inserts bogus routes with its own address as gateway.
With this patch, adding routes from an RA will be skiped when the RAs source
address matches any local address, just as if 'accept_ra_defrtr' and
'accept_ra_rtr_pref' were set to 0.
Signed-off-by: Andreas Hofmeister <andi@collax.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 24 Oct 2011 07:53:03 +0000 (07:53 +0000)]
|PATCH net-next] tg3: add tx_dropped counter
If a frame cant be transmitted, it is silently discarded.
Add a counter to report these errors to user.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:03 +0000 (02:45 +0000)]
be2net: don't create multiple RX/TX rings in multi channel mode
When the HW is in multi-channel mode based on the skew/IPL, there are
4 functions per port and so not enough resources to create multiple
RX/TX rings for each function.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:02 +0000 (02:45 +0000)]
be2net: don't create multiple TXQs in BE2
Multiple TXQ support is partially broken in BE2. It is fully
supported BE3 onwards and in Lancer.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:01 +0000 (02:45 +0000)]
be2net: refactor VF setup/teardown code into be_vf_setup/clear()
Currently the code for VF setup/teardown done by a PF (if_create,
mac_add_config, link_status_query etc) is scattered; this patch
refactors this code into be_vf_setup() and be_vf_clear(). The
if_create/if_destroy/mac_addr_query cmds are now called after the MCCQ
is created; so these cmds are now modified to use the MCCQ instead of
MBOX.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:00 +0000 (02:45 +0000)]
be2net: add vlan/rx-mode/flow-control config to be_setup()
When a card is reset due to EEH error recovery or due to a suspend,
rx-mode config (promisc/mc) is not being sent to the FW. be_setup() is
called in these flows and is the best place for such config/re-config
cmds. Hence include rx-mode, vlan and flow-control config in
be_setup().
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 23 Oct 2011 17:59:41 +0000 (17:59 +0000)]
net_sched: cls_flow: use skb_header_pointer()
Dan Siemon would like to add tunnelling support to cls_flow
This preliminary patch introduces use of skb_header_pointer() to help
this task, while avoiding skb head reallocation because of deep packet
inspection.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Wed, 19 Oct 2011 15:34:09 +0000 (15:34 +0000)]
ipv4: avoid useless call of the function check_peer_pmtu
In func ipv4_dst_check,check_peer_pmtu should be called only when peer is updated.
So,if the peer is not updated in ip_rt_frag_needed,we can not inc __rt_peer_genid.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Oct 2011 22:18:09 +0000 (18:18 -0400)]
Merge branch 'master' of ra./linux/kernel/git/davem/net
Flavio Leitner [Mon, 24 Oct 2011 08:15:10 +0000 (08:15 +0000)]
TCP: remove TCP_DEBUG
It was enabled by default and the messages guarded
by the define are useful.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk Eibach [Tue, 18 Oct 2011 03:04:11 +0000 (03:04 +0000)]
net: Fix driver name for mdio-gpio.c
Since commit
"
7488876... dt/net: Eliminate users of of_platform_{,un}register_driver"
there are two platform drivers named "mdio-gpio" registered.
I renamed the of variant to "mdio-ofgpio".
Signed-off-by: Dirk Eibach <eibach@gdsys.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Oct 2011 07:12:49 +0000 (03:12 -0400)]
Merge git://git./linux/kernel/git/jkirsher/net-next
Eric Dumazet [Mon, 24 Oct 2011 07:06:21 +0000 (03:06 -0400)]
ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
There is a long standing bug in linux tcp stack, about ACK messages sent
on behalf of TIME_WAIT sockets.
In the IP header of the ACK message, we choose to reflect TOS field of
incoming message, and this might break some setups.
Example of things that were broken :
- Routing using TOS as a selector
- Firewalls
- Trafic classification / shaping
We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.
Notes :
- We still reflect incoming TOS in RST messages.
- We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
- A patch is needed for IPv6
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 21 Oct 2011 06:24:20 +0000 (06:24 +0000)]
rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
Renato Westphal noticed that since commit
a2835763e130c343ace5320c20d33c281e7097b7
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.
Fix this by adding the missing manual notification in dev_change_net_namespaces.
Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.
Cc: stable@kernel.org
Tested-by: Renato Westphal <renatowestphal@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan, Zheng [Sat, 22 Oct 2011 21:58:20 +0000 (21:58 +0000)]
ipv4: fix ipsec forward performance regression
There is bug in commit
5e2b61f(ipv4: Remove flowi from struct rtable).
It makes xfrm4_fill_dst() modify wrong data structure.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clemens Buchacher [Sat, 22 Oct 2011 02:56:20 +0000 (02:56 +0000)]
jme: fix irq storm after suspend/resume
If the device is down during suspend/resume, interrupts are enabled
without a registered interrupt handler, causing a storm of
unhandled interrupts until the IRQ is disabled because "nobody
cared".
Instead, check that the device is up before touching it in the
suspend/resume code.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=39112
Helped-by: Adrian Chadd <adrian@freebsd.org>
Helped-by: Mohammed Shafi <shafi.wireless@gmail.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Mon, 24 Oct 2011 06:56:38 +0000 (02:56 -0400)]
route: fix ICMP redirect validation
The commit
f39925dbde7788cfb96419c0f092b086aa325c0f
(ipv4: Cache learned redirect information in inetpeer.)
removed some ICMP packet validations which are required by
RFC 1122, section 3.2.2.2:
...
A Redirect message SHOULD be silently discarded if the new
gateway address it specifies is not on the same connected
(sub-) net through which the Redirect arrived [INTRO:2,
Appendix A], or if the source of the Redirect is not the
current first-hop gateway for the specified destination (see
Section 3.3.1).
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Fri, 21 Oct 2011 00:49:15 +0000 (00:49 +0000)]
net: hold sock reference while processing tx timestamps
The pair of functions,
* skb_clone_tx_timestamp()
* skb_complete_tx_timestamp()
were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.
As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.
These functions first appeared in v2.6.36.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 24 Oct 2011 06:46:04 +0000 (02:46 -0400)]
tcp: md5: add more const attributes
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Wed, 19 Oct 2011 08:10:59 +0000 (08:10 +0000)]
Add ethtool -g support to virtio_net
Add support for reporting ring sizes via ethtool -g to the virtio_net
driver.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 24 Oct 2011 05:52:35 +0000 (01:52 -0400)]
tcp: md5: dont write skb head in tcp_md5_hash_header()
tcp_md5_hash_header() writes into skb header a temporary zero value,
this might confuse other users of this area.
Since tcphdr is small (20 bytes), copy it in a temporary variable and
make the change in the copy.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 21 Oct 2011 22:43:07 +0000 (22:43 +0000)]
bonding: Add a forgetten sysfs_attr_init on class_attr_bonding_masters
When I made class_attr_bonding_matters per network namespace and dynamically
allocated I overlooked the need for calling sysfs_attr_init. Oops.
This fixes the following lockdep splat:
[ 5.749651] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[ 5.749655] bonding: MII link monitoring set to 100 ms
[ 5.749676] BUG: key
f49a831c not in .data!
[ 5.749677] ------------[ cut here ]------------
[ 5.749752] WARNING: at kernel/lockdep.c:2897 lockdep_init_map+0x1c3/0x460()
[ 5.749809] Hardware name: ProLiant BL460c G1
[ 5.749862] Modules linked in: bonding(+)
[ 5.749978] Pid: 3177, comm: modprobe Not tainted
3.1.0-rc9-02177-gf2d1a4e-dirty #1157
[ 5.750066] Call Trace:
[ 5.750120] [<
c1352c2f>] ? printk+0x18/0x21
[ 5.750176] [<
c103112d>] warn_slowpath_common+0x6d/0xa0
[ 5.750231] [<
c1060133>] ? lockdep_init_map+0x1c3/0x460
[ 5.750287] [<
c1060133>] ? lockdep_init_map+0x1c3/0x460
[ 5.750342] [<
c103117d>] warn_slowpath_null+0x1d/0x20
[ 5.750398] [<
c1060133>] lockdep_init_map+0x1c3/0x460
[ 5.750453] [<
c1355ddd>] ? _raw_spin_unlock+0x1d/0x20
[ 5.750510] [<
c11255c8>] ? sysfs_new_dirent+0x68/0x110
[ 5.750565] [<
c1124d4b>] sysfs_add_file_mode+0x8b/0xe0
[ 5.750621] [<
c1124db3>] sysfs_add_file+0x13/0x20
[ 5.750675] [<
c1124e7c>] sysfs_create_file+0x1c/0x20
[ 5.750737] [<
c1208f09>] class_create_file+0x19/0x20
[ 5.750794] [<
c12c186f>] netdev_class_create_file+0xf/0x20
[ 5.750853] [<
f85deaf4>] bond_create_sysfs+0x44/0x90 [bonding]
[ 5.750911] [<
f8410947>] ? bond_create_proc_dir+0x1e/0x3e [bonding]
[ 5.750970] [<
f841007e>] bond_net_init+0x7e/0x87 [bonding]
[ 5.751026] [<
f8410000>] ? 0xf840ffff
[ 5.751080] [<
c12abc7a>] ops_init.clone.4+0xba/0x100
[ 5.751135] [<
c12abdb2>] ? register_pernet_subsys+0x12/0x30
[ 5.751191] [<
c12abd03>] register_pernet_operations.clone.3+0x43/0x80
[ 5.751249] [<
c12abdb9>] register_pernet_subsys+0x19/0x30
[ 5.751306] [<
f84108b9>] bonding_init+0x832/0x8a2 [bonding]
[ 5.751363] [<
c10011f0>] do_one_initcall+0x30/0x160
[ 5.751420] [<
f8410087>] ? bond_net_init+0x87/0x87 [bonding]
[ 5.751477] [<
c106d5cf>] sys_init_module+0xef/0x1890
[ 5.751533] [<
c1356490>] sysenter_do_call+0x12/0x36
[ 5.751588] ---[ end trace
89f492d83a7f5006 ]---
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 22 Oct 2011 07:29:53 +0000 (03:29 -0400)]
tg3: fix tigon3_dma_hwbug_workaround()
Ari got kernel panics using tg3 NIC, and bisected to
2669069aacc9 "tg3:
enable transmit time stamping."
This is because tigon3_dma_hwbug_workaround() might alloc a new skb and
free the original. We panic when skb_tx_timestamp() is called on freed
skb.
Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 22 Oct 2011 05:25:23 +0000 (01:25 -0400)]
inet: add rfc 3168 extract in front of INET_ECN_encapsulate()
INET_ECN_encapsulate() is better understood if we can read the official
statement.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Sat, 22 Oct 2011 04:07:47 +0000 (00:07 -0400)]
net: use INET_ECN_MASK instead of hardcoded 3
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carolyn Wyborny [Fri, 14 Oct 2011 00:13:49 +0000 (00:13 +0000)]
igb: VFTA Table Fix for i350 devices
Due to a hardware problem, writes to the VFTA register can
theoretically fail. Although the likelihood of this is very low.
This patch adds a shadow vfta in the adapter struct for reading
and adds new write functions for these devices to work around the problem.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Thu, 13 Oct 2011 17:29:59 +0000 (17:29 +0000)]
igb: Move DMA Coalescing init code to separate function.
This patch moves the DMA Coalescing feature initialization code from
igb_reset to a new function and replaces it with a call to the new
function.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Thu, 13 Oct 2011 17:28:39 +0000 (17:28 +0000)]
igb: Fix for Alt MAC Address feature on 82580 and later devices
In 82580 and later devices, the alternate MAC address feature is
completely handled by the option ROM and software does not handle
it anymore. This patch changes the check_alt_mac_addr function to
exit immediately if device is 82580 or later.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Williams, Mitch A [Tue, 18 Oct 2011 06:39:43 +0000 (06:39 +0000)]
igbvf: Bump version number
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Williams, Mitch A [Tue, 18 Oct 2011 06:39:37 +0000 (06:39 +0000)]
igbvf: Update module identification strings
Update adapter identification strings to properly indicate i350 VF devices
in the VF driver. Change the driver ID string to remove 82576-specific
wording. Update copyright date.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eric Dumazet [Fri, 21 Oct 2011 09:22:42 +0000 (05:22 -0400)]
tcp: add const qualifiers where possible
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.
For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mihai Maruseac [Thu, 20 Oct 2011 20:45:10 +0000 (20:45 +0000)]
dev: use name hash for dev_seq_ops
Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.
Tests revealed the following results for ifconfig > /dev/null
* 1000 interfaces:
* 0.114s without patch
* 0.089s with patch
* 3000 interfaces:
* 0.489s without patch
* 0.110s with patch
* 5000 interfaces:
* 1.363s without patch
* 0.250s with patch
* 128000 interfaces (other setup):
* ~100s without patch
* ~30s with patch
Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 20 Oct 2011 04:29:24 +0000 (04:29 +0000)]
macvtap: Fix the minor device number allocation
On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices. I have systems running
automated tests that that hit this condition in just a few days.
Use a linux idr allocator to track which mavtap minor numbers
are available and and to track the association between macvtap
minor numbers and macvtap network devices.
Remove the unnecessary unneccessary check to see if the network
device we have found is indeed a macvtap device. With macvtap
specific data structures it is impossible to find any other
kind of networking device.
Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers. It doesn't solve the
original problem but there is no penalty for a larger minor
device range.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 20 Oct 2011 04:28:46 +0000 (04:28 +0000)]
macvtap: Rewrite macvtap_newlink so the error handling works.
Place macvlan_common_newlink at the end of macvtap_newlink because
failing in newlink after registering your network device is not
supported.
Move device_create into a netdevice creation notifier. The network device
notifier is the only hook that is called after the network device has been
registered with the device layer and before register_network_device returns
success.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 20 Oct 2011 04:27:24 +0000 (04:27 +0000)]
macvtap: Don't leak unreceived packets when we delete a macvtap device.
To avoid leaking packets in the receive queue. Add a socket destructor
that will run whenever destroy a macvtap socket.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 20 Oct 2011 04:26:39 +0000 (04:26 +0000)]
macvtap: Fix macvtap_open races in the zero copy enable code.
To see if it is appropriate to enable the macvtap zero copy feature
don't test the lowerdev network device flags. Instead test the
macvtap network device flags which are a direct copy of the lowerdev
flags. This is important because nothing holds a reference to lowerdev
and on a very bad day we lowerdev could be a pointer to stale memory.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 20 Oct 2011 04:26:01 +0000 (04:26 +0000)]
macvtap: Close a race between macvtap_open and macvtap_dellink.
There is a small window in macvtap_open between looking up a
networking device and calling macvtap_set_queue in which
macvtap_del_queues called from macvtap_dellink. After
calling macvtap_del_queues it is totally incorrect to
allow macvtap_set_queue to proceed so prevent success by
reporting that all of the available queues are in use.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Oct 2011 23:14:46 +0000 (23:14 +0000)]
virtio_net: fix truesize underestimation
We must account in skb->truesize, the size of the fragments, not the
used part of them.
Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: virtualization@lists.linux-foundation.org
CC: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Oct 2011 23:00:23 +0000 (23:00 +0000)]
bnx2x: fix skb truesize underestimation
bnx2x allocates a full page per fragment.
We must account in skb->truesize, the size of the fragment, not the used
part of it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:49 +0000 (23:01 +0000)]
net: add opaque struct around skb frag page
I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:48 +0000 (23:01 +0000)]
cxgbi: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: Karen Xie <kxie@chelsio.com>
Cc: linux-scsi@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:47 +0000 (23:01 +0000)]
cxgb4vf: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:46 +0000 (23:01 +0000)]
cxgb4: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Dimitris Michailidis <dm@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:45 +0000 (23:01 +0000)]
mlx4: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Thu, 20 Oct 2011 22:21:36 +0000 (18:21 -0400)]
net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENT
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.
- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
sockets already pretty much effectively allow you
to emulate socket transparency.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 21:45:43 +0000 (17:45 -0400)]
net: constify skbuff and Qdisc elements
Preliminary patch before tcp constification
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 21:44:03 +0000 (17:44 -0400)]
tcp: remove unused tcp_fin() parameters
tcp_fin() only needs socket pointer, we can remove skb and th params.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Oct 2011 21:40:43 +0000 (17:40 -0400)]
Merge branch 'batman-adv/maint' of git://git.open-mesh.org/linux-merge
Ricardo [Tue, 18 Oct 2011 21:35:25 +0000 (21:35 +0000)]
ll_temac: Add support for ethtool
This patch enables the ethtool interface. The implementation is done
using the libphy helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing Li [Tue, 18 Oct 2011 22:52:35 +0000 (22:52 +0000)]
igb: fix a compile warning
control these three function declarations and
definitions with same macro CONFIG_PCI_IOV
drivers/net/ethernet/intel/igb/igb_main.c:165:
warning: ‘igb_vf_configure’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:166:
warning: ‘igb_find_enabled_vfs’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:167:
warning: ‘igb_check_vf_assignment’ declared ‘static’ but never defined
Signed-off-by: RongQing Li <roy.qing.li@gmail.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 10:10:03 +0000 (10:10 +0000)]
myri10ge: fix truesize underestimation
skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jon Mason <mason@myri.com>
Acked-by: Jon Mason <mason@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 09:22:18 +0000 (09:22 +0000)]
igbvf: fix truesize underestimation
igbvf allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 21:00:21 +0000 (17:00 -0400)]
pktgen: remove ndelay() call
Daniel Turull reported inaccuracies in pktgen when using low packet
rates, because we call ndelay(val) with values bigger than 20000.
Instead of calling ndelay() for delays < 100us, we can instead loop
calling ktime_now() only.
Reported-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 20:53:56 +0000 (16:53 -0400)]
tcp: use TCP_DEFAULT_INIT_RCVWND in tcp_fixup_rcvbuf()
Since commit
356f039822b (TCP: increase default initial receive
window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments.
Change tcp_fixup_rcvbuf() to reflect this change, even if no real change
is expected, since sysctl_tcp_rmem[1] = 87380 and this value
is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720)
Note: Since commit
356f039822b limited default window to maximum of
10*1460 and 2*MSS, we use same heuristic in this patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 14 Oct 2011 04:57:46 +0000 (04:57 +0000)]
ip_gre: dont increase dev->needed_headroom on a live device
It seems ip_gre is able to change dev->needed_headroom on the fly.
Its is not legal unfortunately and triggers a BUG in raw_sendmsg()
skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev)
< another cpu change dev->needed_headromm (making it bigger)
...
skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));
We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE()
-> we crash later because skb head is exhausted.
Bug introduced in commit
243aad83 in 2.6.34 (ip_gre: include route
header_len in max_headroom calculation)
Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Timo Teräs <timo.teras@iki.fi>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 20 Oct 2011 19:16:28 +0000 (22:16 +0300)]
Merge git://git./linux/kernel/git/davem/sparc
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Add alignment flag to PCI expansion resources
sparc: Avoid calling sigprocmask()
sparc: Use set_current_blocked()
sparc32,leon: SRMMU MMU Table probe fix
Linus Torvalds [Thu, 20 Oct 2011 19:15:20 +0000 (22:15 +0300)]
Merge git://git./linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
fib_rules: fix unresolved_rules counting
r8169: fix wrong eee setting for rlt8111evl
r8169: fix driver shutdown WoL regression.
ehea: Change maintainer to me
pptp: pptp_rcv_core() misses pskb_may_pull() call
tproxy: copy transparent flag when creating a time wait
pptp: fix skb leak in pptp_xmit()
bonding: use local function pointer of bond->recv_probe in bond_handle_frame
smsc911x: Add support for SMSC LAN89218
tg3: negate USE_PHYLIB flag check
netconsole: enable netconsole can make net_device refcnt incorrent
bluetooth: Properly clone LSM attributes to newly created child connections
l2tp: fix a potential skb leak in l2tp_xmit_skb()
bridge: fix hang on removal of bridge via netlink
x25: Prevent skb overreads when checking call user data
x25: Handle undersized/fragmented skbs
x25: Validate incoming call user data lengths
udplite: fast-path computation of checksum coverage
IPVS netns shutdown/startup dead-lock
netfilter: nf_conntrack: fix event flooding in GRE protocol tracker
Ian Campbell [Thu, 20 Oct 2011 08:58:32 +0000 (04:58 -0400)]
mm: add a "struct page_frag" type containing a page, offset and length
A few network drivers currently use skb_frag_struct for this purpose but I have
patches which add additional fields and semantics there which these other uses
do not want.
A structure for reference sub-page regions seems like a generally useful thing
so do so instead of adding a network subsystem specific structure.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Oct 2011 18:49:52 +0000 (18:49 +0000)]
mlx4_en: fix skb truesize underestimation
skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Krishna Kumar [Wed, 19 Oct 2011 22:17:27 +0000 (22:17 +0000)]
virtio_net: Clean up set_skb_frag()
Remove manual initialization in set_skb_frag, and instead
use __skb_fill_page_desc() to do the same. Patch tested
on net-next.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hugh Dickins [Wed, 19 Oct 2011 19:50:35 +0000 (12:50 -0700)]
mm: fix race between mremap and removing migration entry
I don't usually pay much attention to the stale "? " addresses in
stack backtraces, but this lucky report from Pawel Sikora hints that
mremap's move_ptes() has inadequate locking against page migration.
3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
kernel BUG at include/linux/swapops.h:105!
RIP: 0010:[<
ffffffff81127b76>] [<
ffffffff81127b76>]
migration_entry_wait+0x156/0x160
[<
ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
[<
ffffffff810feee2>] ? __pte_alloc+0x42/0x120
[<
ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
[<
ffffffff81102a31>] handle_mm_fault+0x181/0x310
[<
ffffffff81106097>] ? vma_adjust+0x537/0x570
[<
ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[<
ffffffff81109a05>] ? do_mremap+0x2d5/0x570
[<
ffffffff81421d5f>] page_fault+0x1f/0x30
mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
and pagetable locks, were good enough before page migration (with its
requirement that every migration entry be found) came in, and enough
while migration always held mmap_sem; but not enough nowadays, when
there's memory hotremove and compaction.
The danger is that move_ptes() lets a migration entry dodge around
behind remove_migration_pte()'s back, so it's in the old location when
looking at the new, then in the new location when looking at the old.
Either mremap's move_ptes() must additionally take anon_vma lock(), or
migration's remove_migration_pte() must stop peeking for is_swap_entry()
before it takes pagetable lock.
Consensus chooses the latter: we prefer to add overhead to migration
than to mremapping, which gets used by JVMs and by exec stack setup.
Reported-and-tested-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Campbell [Tue, 18 Oct 2011 22:55:11 +0000 (22:55 +0000)]
net: do not take an additional reference in skb_frag_set_page
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.
In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:
6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API.
5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API.
0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API.
86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API.
4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API.
18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API.
b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API.
b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API.
804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs
ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
roy.qing.li@gmail.com [Mon, 17 Oct 2011 22:32:42 +0000 (22:32 +0000)]
neigh: fix rcu splat in neigh_update()
when use dst_get_neighbour to get neighbour, we need
rcu_read_lock to protect, since dst_get_neighbour uses
rcu_dereference.
The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com>
[ 105.612095]
[ 105.612096] ===================================================
[ 105.612100] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 105.612101] ---------------------------------------------------
[ 105.612103] include/net/dst.h:91 invoked rcu_dereference_check()
without protection!
[ 105.612105]
[ 105.612106] other info that might help us debug this:
[ 105.612106]
[ 105.612108]
[ 105.612108] rcu_scheduler_active = 1, debug_locks = 0
[ 105.612110] 1 lock held by dnsmasq/2618:
[ 105.612111] #0: (rtnl_mutex){+.+.+.}, at: [<
ffffffff815df8c7>]
rtnl_lock+0x17/0x20
[ 105.612120]
[ 105.612121] stack backtrace:
[ 105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41
[ 105.612125] Call Trace:
[ 105.612129] [<
ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0
[ 105.612132] [<
ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0
[ 105.612135] [<
ffffffff815da001>] ? neigh_lookup+0xe1/0x220
[ 105.612139] [<
ffffffff81639298>] arp_req_set+0xb8/0x230
[ 105.612142] [<
ffffffff8163a59f>] arp_ioctl+0x1bf/0x310
[ 105.612146] [<
ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60
[ 105.612150] [<
ffffffff8163fb75>] inet_ioctl+0x85/0x90
[ 105.612154] [<
ffffffff815b5520>] sock_do_ioctl+0x30/0x70
[ 105.612157] [<
ffffffff815b55d3>] sock_ioctl+0x73/0x280
[ 105.612162] [<
ffffffff811b7698>] do_vfs_ioctl+0x98/0x570
[ 105.612165] [<
ffffffff811a5c40>] ? fget_light+0x340/0x3a0
[ 105.612168] [<
ffffffff811b7bbf>] sys_ioctl+0x4f/0x80
[ 105.612172] [<
ffffffff816fdcab>] system_call_fastpath+0x16/0x1b
Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 17 Oct 2011 21:04:20 +0000 (21:04 +0000)]
filter: use unsigned int to silence static checker warning
This is just a cleanup.
My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
warn: check 'flen' for negative values
flen comes from the user. We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative. This is a bug, but it's not exploitable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grant Grundler [Mon, 17 Oct 2011 05:51:06 +0000 (05:51 +0000)]
NET: asix: fix ethtool -e for AX88178 USB dongle
"ethtool -e ethX" dumps EEPROM data. Patch sets EEPROM length for device.
Ethtool works alot better when the kernel believes the length is > 0.
From: Allan Chou <allan@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Wilson [Sun, 16 Oct 2011 05:21:57 +0000 (05:21 +0000)]
cleanup: remove unnecessary include.
This cleanup patch removes unnecessary include from net/ipv6/ip6_fib.c.
Signed-off-by: Kevin Wilson <wkevils@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Sat, 15 Oct 2011 09:26:56 +0000 (09:26 +0000)]
ipv4: compat_ioctl is local to af_inet.c, make it static
ipv4: compat_ioctl is local to af_inet.c, make it static
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 18 Oct 2011 01:39:55 +0000 (01:39 +0000)]
stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb (V2)
Problem using big mtu around 4096 bytes is you end allocating (4096
+NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes ->
8192 bytes : order-1 pages
It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD),
to have no more than one page per skb.
Also the patch changes the netdev_alloc_skb_ip_align() done in
init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations
allowing the driver to load even in case of memory pressure.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 18 Oct 2011 00:01:24 +0000 (00:01 +0000)]
stmmac: add CHAINED descriptor mode support (V4)
This patch enhances the STMMAC driver to support CHAINED mode of
descriptor.
STMMAC supports DMA descriptor to operate both in dual buffer(RING)
and linked-list(CHAINED) mode. In RING mode (default) each descriptor
points to two data buffer pointers whereas in CHAINED mode they point
to only one data buffer pointer.
In CHAINED mode each descriptor will have pointer to next descriptor in
the list, hence creating the explicit chaining in the descriptor itself,
whereas such explicit chaining is not possible in RING mode.
First version of this work has been done by Rayagond.
Then the patch has been reworked avoiding ifdef inside the C code.
A new header file has been added to define all the functions needed for
managing enhanced and normal descriptors.
In fact, these have to be specialized according to the ring/chain usage.
Two new C files have been also added to implement the helper routines
needed to manage: jumbo frames, chain and ring setup (i.e. desc3).
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 18 Oct 2011 00:01:23 +0000 (00:01 +0000)]
stmmac: allow mmc usage only if feature actually available (V4)
Enable the MMC support if it is actually available from the
HW capability register.
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rayagond Kokatanur [Tue, 18 Oct 2011 00:01:22 +0000 (00:01 +0000)]
stmmac: use predefined macros for HW cap register fields (V4)
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 18 Oct 2011 00:01:21 +0000 (00:01 +0000)]
stmmac: allow mtu bigger than 1500 in case of normal desc (V4)
This patch allows to set the mtu bigger than 1500
in case of normal descriptors.
This is helping some SPEAr customers.
Signed-off-by: Deepak SIKRI <deepak.sikri@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 18 Oct 2011 00:01:20 +0000 (00:01 +0000)]
stmmac: update the driver version and doc (V4)
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Tue, 18 Oct 2011 00:01:19 +0000 (00:01 +0000)]
stmmac: protect tx process with lock (V4)
This patch fixes a problem raised on Orly ARM SMP platform
where, in case of fragmented frames, the descriptors
in the TX ring resulted broken. This was due to a missing lock
protection in the tx process.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Srinivas Kandagatla [Tue, 18 Oct 2011 00:01:18 +0000 (00:01 +0000)]
stmmac: Stop advertising 1000Base capabilties for non GMII iface (V4).
This patch stops advertising 1000Base capablities if GMAC is either
configured for MII or RMII mode and on board there is a GPHY plugged on.
Without this patch if an GBit switch is connected on MII interface,
Ethernet stops working at all.
Discovered as part of
https://bugzilla.stlinux.com/show_bug.cgi?id=14148 triage
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 12 Oct 2011 22:02:43 +0000 (22:02 +0000)]
sysfs: Reject with a warning invalid uses of tagged directories.
sysfs is a core piece of ifrastructure that many people use and
few people have all of the rules in their head on how to use
it correctly. Add warnings for people using tagged directories
improperly to that any misuses can be caught and diagnosed quickly.
A single inexpensive test in sysfs_find_dirent is almost sufficient
to catch all possible misuses. An additional warning is needed
in sysfs_add_dirent so that we actually fail when attempting to
add an untagged dirent in a tagged directory.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 12 Oct 2011 22:01:34 +0000 (22:01 +0000)]
sysfs: Remove support for tagged directories with untagged members.
Now that /sys/class/net/bonding_masters is implemented as a tagged sysfs
file we can remove support for untagged files in tagged directories.
This change removes any ambiguity of what a NULL namespace value
means. A NULL namespace parameter after this patch means
that we are talking about an untagged sysfs dirent.
This makes the sysfs code much less prone to mistakes when during
maintenance.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 12 Oct 2011 21:56:25 +0000 (21:56 +0000)]
bonding: Use a per netns implementation of /sys/class/net/bonding_masters.
This fixes a network namespace misfeature that bonding_masters looked at
current instead of the remembering the context where in which
/sys/class/net/bonding_masters was opened in to see which network
namespace to act upon.
This removes the need for sysfs to handle tagged directories with
untagged members allowing for a conceptually simpler sysfs
implementation.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 12 Oct 2011 21:55:08 +0000 (21:55 +0000)]
class: Implement support for class attrs in tagged sysfs directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Wed, 12 Oct 2011 21:53:38 +0000 (21:53 +0000)]
sysfs: Implement support for tagged files in sysfs.
Looking up files in sysfs is hard to understand and analyize because we
currently allow placing untagged files in tagged directories. In the
implementation of that we have two subtly different meanings of NULL.
NULL meaning there is no tag on a directory entry and NULL meaning
we don't care which namespace the lookup is performed for. This
multiple uses of NULL have resulted in subtle bugs (since fixed)
in the code.
Currently it is only the bonding driver that needs to have an untagged
file in a tagged directory.
To untagle this mess I am adding support for tagged files to sysfs.
Modifying the bonding driver to implement bonding_masters as a tagged
file. Registering bonding_masters once for each network namespace.
Then I am removing support for untagged entries in tagged sysfs
directories.
Resulting in code that is much easier to reason about.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Thu, 13 Oct 2011 07:21:23 +0000 (07:21 +0000)]
bonding: fix wrong port enabling in 802.3ad
The port shouldn't be enabled unless its current MUX
state is DISTRIBUTING which is correctly handled by
ad_mux_machine(), otherwise the packet sent can be
lost because the other end may not be ready.
The issue happens on every port initialization, but
as the ports are expected to move quickly to DISTRIBUTING,
it doesn't cause much problem. However, it does cause
constant packet loss if the other peer has the port
configured to stay in STANDBY (i.e. SYNC set to OFF).
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kjetil Oftedal [Wed, 19 Oct 2011 23:20:50 +0000 (16:20 -0700)]
sparc: Add alignment flag to PCI expansion resources
Currently no type of alignment is specified for PCI expansion roms while
parsing the openfirmware tree. This causes calls to pci_map_rom() to fail.
IORESOURCE_SIZEALIGN is the default alignment used for rom resouces in
pci/probe.c, and has been verified to work with various cards on a ultra 10.
Signed-off-By: Kjetil Oftedal <oftedal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan, Zheng [Mon, 17 Oct 2011 15:20:28 +0000 (15:20 +0000)]
fib_rules: fix unresolved_rules counting
we should decrease ops->unresolved_rules when deleting a unresolved rule.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 13 Oct 2011 20:14:37 +0000 (20:14 +0000)]
r8169: fix wrong eee setting for rlt8111evl
Correct the wrong parameter for setting EEE for RTL8111E-VL.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
françois romieu [Fri, 14 Oct 2011 00:57:45 +0000 (00:57 +0000)]
r8169: fix driver shutdown WoL regression.
Due to commit
92fc43b4159b518f5baae57301f26d770b0834c9 ("r8169: modify the
flow of the hw reset."), rtl8169_hw_reset stomps during driver shutdown on
RxConfig bits which are needed for WOL on some versions of the hardware.
As these bits were formerly set from the r81{0x, 68}_pll_power_down methods,
factor them out for use in the driver shutdown (rtl_shutdown) handler.
I favored __rtl8169_get_wol() -hardware state indication- over
RTL_FEATURE_WOL as the latter has become a good candidate for removal.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes <hayeswang@realtek.com>
Tested-by: Marc Ballarin <ballarin.marc@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Wed, 19 Oct 2011 21:00:35 +0000 (17:00 -0400)]
net: validate HWTSTAMP ioctl parameters
This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 13 Oct 2011 22:25:23 +0000 (22:25 +0000)]
net: Move rcu_barrier from rollback_registered_many to netdev_run_todo.
This patch moves the rcu_barrier from rollback_registered_many
(inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
This allows us to gain the full benefit of sychronize_net calling
synchronize_rcu_expedited when the rtnl_lock is held.
The rcu_barrier in rollback_registered_many was originally a synchronize_net
but was promoted to be a rcu_barrier() when it was found that people were
unnecessarily hitting the 250ms wait in netdev_wait_allrefs(). Changing
the rcu_barrier back to a synchronize_net is therefore safe.
Since we only care about waiting for the rcu callbacks before we get
to netdev_wait_allrefs() it is also safe to move the wait into
netdev_run_todo.
This was tested by creating and destroying 1000 tap devices and observing
/proc/lock_stat. /proc/lock_stat reports this change reduces the hold
times of the rtnl_lock by a factor of 10. There was no observable
difference in the amount of time it takes to destroy a network device.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 13 Oct 2011 18:24:42 +0000 (18:24 +0000)]
tcp: use TCP_INIT_CWND in tcp_fixup_sndbuf()
Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change
tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in
initial sk_sndbuf
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thadeu Lima de Souza Cascardo [Thu, 13 Oct 2011 09:56:19 +0000 (09:56 +0000)]
ehea: Change maintainer to me
Breno Leitao has passed the maintainership to me.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Breno Leitao <leitao@linux.vnet.ibm.com>
Acked-by: Breno Leitão <leitao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Thu, 13 Oct 2011 04:33:55 +0000 (04:33 +0000)]
phylib: Modify Vitesse RGMII skew settings
The Vitesse driver was using the RGMII_ID interface type to determine if
skew was necessary. However, we want to move away from using that
interface type, as it's really a property of the board's PHY connection.
However, some boards depend on it, so we want to support it, while
allowing new boards to use the more flexible "fixups" approach. To do
this, we extract the code which adds skew into its own function, and
call that function when RGMII_ID has been selected.
Another side-effect of this change is that if your PHY has skew set
already, it doesn't clear it. This way, the fixup code can modify the
register without config_init then clearing it.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Thu, 13 Oct 2011 04:33:54 +0000 (04:33 +0000)]
net: Allow skb_recycle_check to be done in stages
skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility. We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 19 Oct 2011 13:44:11 +0000 (06:44 -0700)]
Merge branch 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus
* 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus:
[media] videodev: fix a NULL pointer dereference in v4l2_device_release()