GitHub/exynos8895/android_kernel_samsung_universal8895.git
10 years agoxen-netback: Disable NAPI after disabling interrupts
Zoltan Kiss [Tue, 28 Oct 2014 15:29:30 +0000 (15:29 +0000)]
xen-netback: Disable NAPI after disabling interrupts

Otherwise the interrupt handler still calls napi_complete. Although it
won't schedule NAPI again as either NAPI_STATE_DISABLE or
NAPI_STATE_SCHED is set, the call is unnecessary, and it makes more
sense to do it this way.

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: ipv6: Add a sysctl to make optimistic addresses useful candidates
Erik Kline [Tue, 28 Oct 2014 09:11:14 +0000 (18:11 +0900)]
net: ipv6: Add a sysctl to make optimistic addresses useful candidates

Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes.  Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
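
The ranking described above can be illustrated with a small user-space sketch. It is not the kernel's source address selection code: the function name saddr_rank, the SK_ flag macros, and the use_optimistic parameter are hypothetical, the flag values are assumed to match linux/if_addr.h, and all other selection rules are collapsed into a single rank.

```c
#include <assert.h>

/* Flag values assumed to mirror linux/if_addr.h; verify against your headers. */
#define SK_F_OPTIMISTIC 0x04u
#define SK_F_DEPRECATED 0x20u
#define SK_F_TENTATIVE  0x40u

/*
 * Hypothetical rank of a source address candidate (higher is better).
 * use_optimistic stands in for the new sysctl (0 = off, 1 = on).
 */
static int saddr_rank(unsigned int flags, int use_optimistic)
{
    assert(use_optimistic == 0 || use_optimistic == 1);

    /* Still undergoing DAD and not optimistic: not usable as a source. */
    if ((flags & SK_F_TENTATIVE) && !(flags & SK_F_OPTIMISTIC))
        return -1;
    if (flags & SK_F_DEPRECATED)
        return 0;
    /* Optimistic: ranked with deprecated unless the sysctl is enabled. */
    if (flags & SK_F_OPTIMISTIC)
        return use_optimistic ? 1 : 0;
    /* Preferred addresses still take precedence. */
    return 2;
}
```

With the sysctl modelled as enabled, an optimistic address outranks a deprecated one but still loses to a preferred one.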

This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).

The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s)), but not in the multiple distinct
networks case.

For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try to use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.

Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMISTIC
flag appropriately set).  A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.

Also: add an entry in ip-sysctl.txt for optimistic_dad.

Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'r8152-next'
David S. Miller [Wed, 29 Oct 2014 19:09:16 +0000 (15:09 -0400)]
Merge branch 'r8152-next'

Hayes Wang says:

====================
r8152: support nway_reset

Fix the CHECK from checkpatch.pl and support nway_reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agor8152: support nway_reset of ethtool
hayeswang [Tue, 28 Oct 2014 06:05:52 +0000 (14:05 +0800)]
r8152: support nway_reset of ethtool

Support the nway_reset() function for ethtool.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agor8152: rename tx_underun
hayeswang [Tue, 28 Oct 2014 06:05:51 +0000 (14:05 +0800)]
r8152: rename tx_underun

Replace tx_underun with tx_underrun for checkpatch.pl.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotcp: allow for bigger reordering level
Eric Dumazet [Tue, 28 Oct 2014 04:45:24 +0000 (21:45 -0700)]
tcp: allow for bigger reordering level

While testing an upcoming patch from Yaogong (converting the out of order
queue into an RB tree), I hit the max reordering level of the Linux TCP stack.

Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.

Allow a new max limit of 300, and add a sysctl so that admins can set
even bigger (or lower) values if needed.
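
As a rough illustration of the limit, the sketch below clamps a measured reordering value to a configurable maximum. The names clamp_reordering, OLD_MAX_REORDERING and DEFAULT_MAX_REORDERING are hypothetical; only the values 127 and 300 come from the text above, and the real stack tracks reordering in more places than a single clamp.

```c
#include <assert.h>

#define OLD_MAX_REORDERING     127u  /* previous hard limit */
#define DEFAULT_MAX_REORDERING 300u  /* new default, tunable by the admin */

/* Cap the observed reordering metric at the configured maximum. */
static unsigned int clamp_reordering(unsigned int metric,
                                     unsigned int max_reordering)
{
    assert(max_reordering > 0);
    return metric < max_reordering ? metric : max_reordering;
}
```

A path that reorders by 200 packets would be capped at 127 under the old limit but is represented in full under the new default.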

[1] Aggregation of links, per packet load balancing, fabrics not doing
 deep packet inspections, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: ethernet: realtek: atp: checkpatch errors and warnings corrected
Roberto Medina [Mon, 27 Oct 2014 23:51:56 +0000 (00:51 +0100)]
net: ethernet: realtek: atp: checkpatch errors and warnings corrected

Several warnings and errors of coding style rules corrected.
Compile tested.

Signed-off-by: Roberto Medina <robertoxmed@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: skb_segment() should preserve backpressure
Toshiaki Makita [Mon, 27 Oct 2014 17:30:51 +0000 (10:30 -0700)]
net: skb_segment() should preserve backpressure

This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w().

TCP uses its own destructor (tcp_wfree) and needs a more complex scheme,
as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
order packets").

This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.

Here are performance test results (macvlan on vlan):

- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.23      0.00     25.26      0.08      0.00     74.43

- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.18      0.00      8.38      0.02      0.00     91.43

[edumazet] Rewrote patch and changelog.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agogianfar: Implement PAUSE frame generation support
Matei Pavaluca [Mon, 27 Oct 2014 08:42:44 +0000 (10:42 +0200)]
gianfar: Implement PAUSE frame generation support

The hardware can automatically generate pause frames when the number
of free buffers drops under a certain threshold, but in order to do this,
the address of the last free buffer needs to be written to a specific
register for each RX queue.

This has to be done in 'gfar_clean_rx_ring', which is called for each
RX queue. To avoid the performance cost of a register write for each
incoming packet, this operation is done only when PAUSE frame
transmission is enabled.

Whenever the link is readjusted, this capability is turned on or off.

Signed-off-by: Matei Pavaluca <matei.pavaluca@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoFix the way the local advertising flow options are determined
Pavaluca Matei-B46610 [Mon, 27 Oct 2014 08:42:43 +0000 (10:42 +0200)]
Fix the way the local advertising flow options are determined

Local flow control options needed in order to resolve the negotiation
are incorrectly calculated.

Previously 'mii_advertise_flowctrl' was called to determine the local advertising
options, but these were determined based on FLOW_CTRL_RX/TX flags which are
never set through ethtool.
The patch simply translates from ethtool flow options to mii flow options.
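
The translation can be sketched in user space as follows. This is an illustration rather than the driver's code: the function name sketch_flow_adv_to_mii and the SK_ macros are hypothetical, and their values are assumed to mirror the Pause/Asym_Pause advertising bits in linux/ethtool.h and the pause bits in linux/mii.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror ADVERTISED_Pause / ADVERTISED_Asym_Pause (ethtool). */
#define SK_ADVERTISED_PAUSE      (1u << 13)
#define SK_ADVERTISED_ASYM_PAUSE (1u << 14)
/* Assumed to mirror ADVERTISE_PAUSE_CAP / ADVERTISE_PAUSE_ASYM (MII). */
#define SK_MII_PAUSE_CAP  0x0400u
#define SK_MII_PAUSE_ASYM 0x0800u

/* Map ethtool pause advertising bits to MII advertisement register bits. */
static uint16_t sketch_flow_adv_to_mii(uint32_t advertising)
{
    uint16_t mii = 0;

    if (advertising & SK_ADVERTISED_PAUSE)
        mii |= SK_MII_PAUSE_CAP;
    if (advertising & SK_ADVERTISED_ASYM_PAUSE)
        mii |= SK_MII_PAUSE_ASYM;

    /* Only the two pause bits may ever be set in the result. */
    assert((mii & ~(SK_MII_PAUSE_CAP | SK_MII_PAUSE_ASYM)) == 0);
    return mii;
}
```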

Signed-off-by: Pavaluca Matei <matei.pavaluca@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoAdd flow control support flags to gianfar's capabilities
Pavaluca Matei-B46610 [Mon, 27 Oct 2014 08:42:42 +0000 (10:42 +0200)]
Add flow control support flags to gianfar's capabilities

The phy device supports 802.3x flow control, but the specific flags are not set
in the phy initialisation code. Flow control flags need to be added to the
supported capabilities of the phydev by the driver.

This is needed in order for ethtool to work (the 'ethtool -A' code checks
for these flags).

Signed-off-by: Pavaluca Matei <matei.pavaluca@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: Export lockdep_ovsl_is_held to modules.
David S. Miller [Tue, 28 Oct 2014 21:27:23 +0000 (17:27 -0400)]
openvswitch: Export lockdep_ovsl_is_held to modules.

ERROR: "lockdep_ovsl_is_held" [net/openvswitch/vport-gre.ko] undefined!

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'mellanox-next'
David S. Miller [Tue, 28 Oct 2014 21:18:07 +0000 (17:18 -0400)]
Merge branch 'mellanox-next'

Amir Vadai says:

====================
Mellanox ethernet driver update Oct-27-2014

This patchset introduces some small bug fixes, support for get/set of
vlan offload and get/set/capabilities of the link.

The first 7 patches, by Saeed, add support for setting/getting link speed and
getting cable capabilities.
The next 2 patches, also by Saeed, enable the user to turn rx/tx vlan offloading
on and off.
Jenni fixed a bug in the error flow during device initialization.
Ido and Jack fixed some code duplication and errors discovered by a static checker.
The last patch, by me, is a fix to make ethtool report the actual rings used by
the indirection QP.

Patches were applied and tested against commit 61ed53d ("Merge tag 'ntb-3.18'
of git://github.com/jonmason/ntb")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Report actual number of rings in indirection table
Amir Vadai [Mon, 27 Oct 2014 09:37:47 +0000 (11:37 +0200)]
net/mlx4_en: Report actual number of rings in indirection table

Hardware requires the number of rings in the indirection table to be a power
of 2. When the number of channels is set to a value that is not a power of 2,
the indirection table uses only the closest power-of-2 number of rings.
Report this number in 'ethtool -x' and not the total number of rx rings.
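
A minimal sketch of the reported value, assuming "closest power of 2" means rounding down (the table cannot reference more rings than exist). The helper names rounddown_pow2 and indir_table_rings are hypothetical and are not taken from the driver.

```c
#include <assert.h>

/* Largest power of two that is <= n (n must be at least 1). */
static unsigned int rounddown_pow2(unsigned int n)
{
    unsigned int p = 1;

    assert(n >= 1);
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Hypothetical helper: rings actually referenced by the indirection table. */
static unsigned int indir_table_rings(unsigned int rx_rings)
{
    return rounddown_pow2(rx_rings);
}
```

Under this assumption, a configuration with 6 rx rings would report 4 rings in 'ethtool -x', while 8 rings would report 8.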

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Move spinlocks and work initalizations to beginning of init_netdev
Eugenia Emantayev [Mon, 27 Oct 2014 09:37:46 +0000 (11:37 +0200)]
net/mlx4_en: Move spinlocks and work initalizations to beginning of init_netdev

Upon failure, destroy_netdev is called, and spinlocks/works must be
initialized before calling it. Otherwise a kernel panic may occur.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Call napi_synchronize on stop_port
Ido Shamay [Mon, 27 Oct 2014 09:37:45 +0000 (11:37 +0200)]
net/mlx4_en: Call napi_synchronize on stop_port

This is instead of calling the actual implementation of
napi_synchronize, for better encapsulation.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Cleanups suggested by clang static checker
Jack Morgenstein [Mon, 27 Oct 2014 09:37:44 +0000 (11:37 +0200)]
net/mlx4_en: Cleanups suggested by clang static checker

clang flagged the following. All are actually cosmetic cleanups, not really bugs:

drivers/net/ethernet/mellanox/mlx4/en_main.c:233:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;
                ^     ~~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_main.c:293:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;

drivers/net/ethernet/mellanox/mlx4/en_netdev.c:648:16: warning: Assigned value is garbage or undefined
        entry->reg_id = reg_id;
                      ^ ~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:659:2: warning: Function call argument is an uninitialized value
        mlx4_en_uc_steer_release(priv, priv->dev->dev_addr, *qpn, reg_id);
(NOTE: reg_id is only used in the device-managed flow steering path, in which it is always initialized.
 This is not a bug. Cleanup here is therefore cosmetic only).

drivers/net/ethernet/mellanox/mlx4/en_rx.c:122:3: warning: Value stored to 'frag_info' is never read
                frag_info = &priv->frag_info[i];
                ^           ~~~~~~~~~~~~~~~~~~~

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Add ethtool support for [rx|tx]vlan offload set to OFF/ON
Saeed Mahameed [Mon, 27 Oct 2014 09:37:43 +0000 (11:37 +0200)]
net/mlx4_en: Add ethtool support for [rx|tx]vlan offload set to OFF/ON

Move mlx4_en_reset_config to en_netdev.c as it now serves a more general purpose.
Add support for turning OFF/ON the rx/tx vlan offload.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Add support for setting rxvlan offload OFF/ON
Saeed Mahameed [Mon, 27 Oct 2014 09:37:42 +0000 (11:37 +0200)]
net/mlx4_en: Add support for setting rxvlan offload OFF/ON

Rename mlx4_en_timestamp_config to mlx4_en_reset_config and extend it to support
choosing RX vlan offload configuration.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Use PTYS register to set ethtool settings (Speed)
Saeed Mahameed [Mon, 27 Oct 2014 09:37:41 +0000 (11:37 +0200)]
net/mlx4_en: Use PTYS register to set ethtool settings (Speed)

Added support for setting speed or advertised link modes via ethtool:
ethtool -s <ifname> [speed <speed>] [advertise <link modes>]

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Use PTYS register to query ethtool settings
Saeed Mahameed [Mon, 27 Oct 2014 09:37:40 +0000 (11:37 +0200)]
net/mlx4_en: Use PTYS register to query ethtool settings

- If dev cap MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL is ON, query PTYS register to fill ethtool settings,
else use default values.
- Use autoneg port cap and dev backplane autoneg cap to report autoneg interface capabilities.
- Fix typo in mlx4_en_port_state struct field (transciver to transceiver).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting support
Saeed Mahameed [Mon, 27 Oct 2014 09:37:39 +0000 (11:37 +0200)]
ethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting support

Added 100M, 20G and 56G ethtool speed reporting support.
Update mlx4_en_test_speed self test with the new speeds.

Defined new link speeds in include/uapi/linux/ethtool.h:
+#define SPEED_20000 20000
+#define SPEED_40000 40000
+#define SPEED_56000 56000

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_core: Add ethernet backplane autoneg device capability
Saeed Mahameed [Mon, 27 Oct 2014 09:37:38 +0000 (11:37 +0200)]
net/mlx4_core: Add ethernet backplane autoneg device capability

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap
Saeed Mahameed [Mon, 27 Oct 2014 09:37:37 +0000 (11:37 +0200)]
net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap

Add the ACCESS_REG mlx4 command and use it to implement the query method for
PTYS (Port Type and Speed Register).
Query and store the eth_prot_ctrl dev cap.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support
Saeed Mahameed [Mon, 27 Oct 2014 09:37:36 +0000 (11:37 +0200)]
ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support

Added get_module_info/get_module_eeprom ethtool support for cable info reading.

Added new cable types enum in include/uapi/linux/ethtool.h for ethtool use.
+#define ETH_MODULE_SFF_8636            0x3
+#define ETH_MODULE_SFF_8636_LEN        256
+#define ETH_MODULE_SFF_8436            0x4
+#define ETH_MODULE_SFF_8436_LEN        256

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_core: Introduce mlx4_get_module_info for cable module info reading
Saeed Mahameed [Mon, 27 Oct 2014 09:37:35 +0000 (11:37 +0200)]
net/mlx4_core: Introduce mlx4_get_module_info for cable module info reading

Added new MAD_IFC command to read cable module info with attribute id (0xFF60).
Update include/linux/mlx4/device.h with function declaration (mlx4_get_module_info)
and the needed defines/enums for future use.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agodatapath: Rename last_action() as nla_is_last() and move to netlink.h
Simon Horman [Mon, 27 Oct 2014 07:12:16 +0000 (16:12 +0900)]
datapath: Rename last_action() as nla_is_last() and move to netlink.h

The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.

It was pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of whether it is used in
more than one .c file. Thus, I would like it considered independently of
the work on an odp select group action.
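
For readers unfamiliar with the helper, the idea is that an attribute is the last one in a stream when its length equals the number of bytes remaining. The sketch below shows this in user space; struct attr_hdr and attr_is_last are hypothetical stand-ins for the kernel's struct nlattr and nla_is_last(), and the exact kernel implementation is assumed rather than quoted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a netlink attribute header. */
struct attr_hdr {
    uint16_t len;   /* length of this attribute, header included */
    uint16_t type;
};

/* True when this attribute consumes exactly the remaining bytes. */
static int attr_is_last(const struct attr_hdr *a, int rem)
{
    assert(a != NULL);
    return a->len == rem;
}
```

A caller walking a list of actions can use such a check to treat the final action differently, for example to avoid cloning a packet that no later action will need.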

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: pxa168_eth: Fix providing of phy_interface mode on platform_data
Sebastian Hesselbarth [Sat, 25 Oct 2014 10:08:59 +0000 (12:08 +0200)]
net: pxa168_eth: Fix providing of phy_interface mode on platform_data

Do not add phy include to the board file but platform_data include
instead.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: phy: Adding SGMII support for Marvell 88ee1145 driver
Viet Nga Dao [Fri, 24 Oct 2014 02:41:53 +0000 (19:41 -0700)]
net: phy: Adding SGMII support for Marvell 88ee1145 driver

Add code to the m88e1145_config_init function to allow the driver to
support SGMII mode.

Signed-off-by: Viet Nga Dao <vndao@altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoovs: Turn vports with dependencies into separate modules
Thomas Graf [Wed, 22 Oct 2014 15:29:06 +0000 (17:29 +0200)]
ovs: Turn vports with dependencies into separate modules

The internal and netdev vport remain part of openvswitch.ko. Encap
vports including vxlan, gre, and geneve can be built as separate
modules and are loaded on demand. Modules can be unloaded after use.
Datapath ports keep a reference to the vport module during their
lifetime.

This allows removing the error-prone maintenance of the global list
vport_ops_list.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'unnecessary_resource_check'
David S. Miller [Mon, 27 Oct 2014 23:16:14 +0000 (19:16 -0400)]
Merge branch 'unnecessary_resource_check'

Varka Bhadram says:

====================
cleanup on resource check

This series removes the duplicate sanity check on the resource returned
by platform_get_resource(). The resource is already checked by
devm_ioremap_resource().

changes since v2:
- Merge #1 and #2 patches into single patch
- remove the comment

changes since v1:
- remove NULL dereference on resource_size()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoethernet: samsung: sxgbe: remove unnecessary check
Varka Bhadram [Fri, 24 Oct 2014 02:12:10 +0000 (07:42 +0530)]
ethernet: samsung: sxgbe: remove unnecessary check

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.
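
The pattern can be mocked in user space as shown below. None of this is kernel code: mock_get_resource, mock_ioremap_resource and mock_probe are hypothetical stand-ins, and the mock reports errors as a negative errno in an int, whereas the real helper returns an error-encoded pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct res {
    unsigned long start;
    unsigned long size;
};

/* Mock lookup step: returns NULL when the resource is missing. */
static struct res *mock_get_resource(struct res *table, int present)
{
    return present ? table : NULL;
}

/* Mock mapping step: validates its argument itself. Returns 0 or -errno. */
static int mock_ioremap_resource(const struct res *r, unsigned long *base)
{
    assert(base != NULL);
    if (!r || r->size == 0)
        return -EINVAL;
    *base = r->start;
    return 0;
}

/* Probe path after the cleanup: no separate NULL check on the lookup. */
static int mock_probe(struct res *table, int present, unsigned long *base)
{
    struct res *r = mock_get_resource(table, present);

    /* The mapping helper rejects NULL, so the caller does not need to. */
    return mock_ioremap_resource(r, base);
}
```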

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoethernet: renesas: remove unnecessary check
Varka Bhadram [Fri, 24 Oct 2014 02:12:09 +0000 (07:42 +0530)]
ethernet: renesas: remove unnecessary check

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoethernet: marvell: remove unnecessary check
Varka Bhadram [Fri, 24 Oct 2014 02:12:08 +0000 (07:42 +0530)]
ethernet: marvell: remove unnecessary check

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoethernet: apm: xgene: remove unnecessary check
Varka Bhadram [Fri, 24 Oct 2014 02:12:07 +0000 (07:42 +0530)]
ethernet: apm: xgene: remove unnecessary check

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoethernet: wiznet: remove unnecessary check
Varka Bhadram [Fri, 24 Oct 2014 02:12:06 +0000 (07:42 +0530)]
ethernet: wiznet: remove unnecessary check

devm_ioremap_resource checks platform_get_resource() return value.
We can remove the duplicate check here.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobridge: Add support for IEEE 802.11 Proxy ARP
Kyeyoon Park [Thu, 23 Oct 2014 21:49:17 +0000 (14:49 -0700)]
bridge: Add support for IEEE 802.11 Proxy ARP

This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
the AP devices to keep track of the hardware-address-to-IP-address
mapping of the mobile devices within the WLAN network.

The AP will learn this mapping via observing DHCP, ARP, and NS/NA
frames. When a request for such information is made (i.e. ARP request,
Neighbor Solicitation), the AP will respond on behalf of the
associated mobile device. In the process of doing so, the AP will drop
the multicast request frame that was intended to go out to the wireless
medium.

It was recommended at the LKS workshop to do this implementation in
the bridge layer. vxlan.c is already doing something very similar.
The DHCP snooping code will be added to the userspace application
(hostapd) per the recommendation.

This RFC commit is only for IPv4. A similar approach in the bridge
layer will be taken for IPv6 as well.

Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipx: remove __inline__ in c file on static
Fabian Frederick [Mon, 27 Oct 2014 20:12:08 +0000 (21:12 +0100)]
ipx: remove __inline__ in c file on static

Let compiler decide what to do with static void __ipxitf_put()

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipx: remove unnecessary casting on ntohl
Fabian Frederick [Mon, 27 Oct 2014 19:55:08 +0000 (20:55 +0100)]
ipx: remove unnecessary casting on ntohl

use %08X instead of %08lX and remove casting.
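
A user-space illustration of the same formatting, assuming a platform where uint32_t is unsigned int (as on Linux). The function name format_net32 is hypothetical; ntohl() returns a 32-bit value, so %08X needs neither the l length modifier nor a cast.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a network-order 32-bit value as 8 upper-case hex digits.
 * Returns the number of characters written (excluding the NUL).
 */
static int format_net32(uint32_t net, char out[9])
{
    assert(out != NULL);
    return snprintf(out, 9, "%08X", ntohl(net));
}
```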

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipx: move extern sysctl_ipx_pprop_broadcasting to header file
Fabian Frederick [Mon, 27 Oct 2014 19:00:41 +0000 (20:00 +0100)]
ipx: move extern sysctl_ipx_pprop_broadcasting to header file

include ipx.h from sysctl_net_ipx.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: include linux/uaccess.h instead of asm/uaccess.h
Fabian Frederick [Mon, 27 Oct 2014 18:12:58 +0000 (19:12 +0100)]
ipv6: include linux/uaccess.h instead of asm/uaccess.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: replace min/casting by min_t
Fabian Frederick [Mon, 27 Oct 2014 18:11:56 +0000 (19:11 +0100)]
ipv6: replace min/casting by min_t

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv4: remove set but unused variable sha
Fabian Frederick [Mon, 27 Oct 2014 18:03:22 +0000 (19:03 +0100)]
ipv4: remove set but unused variable sha

unsigned char *sha (source) was already in original git version
 but was never used.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 's390-next'
David S. Miller [Mon, 27 Oct 2014 02:21:45 +0000 (22:21 -0400)]
Merge branch 's390-next'

Frank Blaschka says:

====================
s390: network patches for net-next

looks like there was a problem with my previous posting. Hope this time
it will work. Sorry for any inconvenience. The patches are mostly
cleanups and small enhancements for net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoctcm: replace sscanf by kstrto function
Thomas Richter [Wed, 22 Oct 2014 10:18:07 +0000 (12:18 +0200)]
ctcm: replace sscanf by kstrto function

Since a single integer value is read from the supplied buffer
use the kstrto functions instead of sscanf.
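
The difference in strictness can be sketched in user space. The kstrto family is kernel-only, so parse_int_strict below is a hypothetical stand-in built on strtol(); it assumes the kernel helpers reject trailing garbage and tolerate one trailing newline, which is what makes them a better fit than sscanf for single-value sysfs input.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a buffer that must contain exactly one decimal integer,
 * optionally followed by a single newline. Returns 0 or -errno.
 */
static int parse_int_strict(const char *s, int *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);

    /* strtol() would skip leading whitespace; reject it explicitly. */
    if (isspace((unsigned char)*s))
        return -EINVAL;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return -EINVAL;         /* no digits at all */
    if (*end == '\n')
        end++;                  /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;         /* trailing garbage such as "42abc" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -ERANGE;

    *out = (int)v;
    return 0;
}
```

By contrast, sscanf with "%d" would accept "42abc" and silently ignore the trailing characters.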

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agolcs: replace sscanf by kstrto function
Thomas Richter [Wed, 22 Oct 2014 10:18:06 +0000 (12:18 +0200)]
lcs: replace sscanf by kstrto function

Since a single integer value is read from the supplied buffer
use the kstrto functions instead of sscanf.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoqeth: s390 ethernet device driver dependency
Thomas Richter [Wed, 22 Oct 2014 10:18:05 +0000 (12:18 +0200)]
qeth: s390 ethernet device driver dependency

Compile the s390 10GB ethernet device driver only when
ETHERNET has been defined in the kernel configuration file.
Right now the qeth device driver is always built regardless
of which network connectivity is active.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoqeth: make local functions static in qeth_l3 module
Thomas Richter [Wed, 22 Oct 2014 10:18:04 +0000 (12:18 +0200)]
qeth: make local functions static in qeth_l3 module

This patch makes 4 local functions static and removes
the prototypes from the header file.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoqeth: fix some trace formating issues
Thomas Richter [Wed, 22 Oct 2014 10:18:03 +0000 (12:18 +0200)]
qeth: fix some trace formating issues

This patch fixes trace formatting issues using the
QETH_CARD_TEXT_ macro. The total size of each trace entry
is 8 bytes. Some of the sprintf formats exceed these 8
bytes (for example using abcd:%d and the converted value
needs more than 3 bytes). The solution is to shorten the
text prepending the value or use a different format (%x).
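
The size constraint can be checked with a small user-space sketch. The helper names trace_fits_dec and trace_fits_hex and the TRACE_ENTRY_LEN macro are hypothetical; only the 8-byte entry size and the "abcd:%d" example come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TRACE_ENTRY_LEN 8   /* entry size stated in the commit message */

/* Returns 1 if "<prefix><value>" in decimal fits in one entry, else 0. */
static int trace_fits_dec(const char *prefix, int value)
{
    char buf[TRACE_ENTRY_LEN + 1];
    int needed;

    assert(prefix != NULL);
    /* snprintf() returns the length the full string would need. */
    needed = snprintf(buf, sizeof(buf), "%s%d", prefix, value);
    return needed >= 0 && needed <= TRACE_ENTRY_LEN;
}

/* Same check with the value formatted as hex, which is usually shorter. */
static int trace_fits_hex(const char *prefix, unsigned int value)
{
    char buf[TRACE_ENTRY_LEN + 1];
    int needed;

    assert(prefix != NULL);
    needed = snprintf(buf, sizeof(buf), "%s%x", prefix, value);
    return needed >= 0 && needed <= TRACE_ENTRY_LEN;
}
```

With the prefix "abcd:", the value 1234 overflows in decimal but fits as hex (4d2), and a shorter prefix such as "ab:" leaves room for the decimal form.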

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoqeth: qeth_core_main make local functions static
Thomas Richter [Wed, 22 Oct 2014 10:18:02 +0000 (12:18 +0200)]
qeth: qeth_core_main make local functions static

This patch makes some global functions static and removes
the prototypes from the header file.
Also, function qeth_query_card_info is no longer exported: there is
no external user for it, and it should never have been exported
in the first place.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoxen-netfront: always keep the Rx ring full of requests
David Vrabel [Wed, 22 Oct 2014 10:17:06 +0000 (11:17 +0100)]
xen-netfront: always keep the Rx ring full of requests

A full Rx ring only requires 1 MiB of memory.  This is too little
memory to make it worthwhile to dynamically scale the number of Rx
requests in the ring based on traffic rates, because:

a) Even the full 1 MiB is a tiny fraction of a typical modern Linux
   VM (for example, the AWS micro instance still has 1 GiB of memory).

b) Netfront would have used up to 1 MiB already even with moderate
   data rates (there was no adjustment of target based on memory
   pressure).

c) Small VMs are going to typically have one VCPU and hence only one
   queue.
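
The memory figures can be checked with a short calculation. The sketch assumes a ring of 256 slots backed by one 4 KiB page each; those two numbers are assumptions and not stated above, and the function names are hypothetical.

```c
#include <assert.h>

/* Memory needed to back every slot of a full Rx ring. */
static unsigned long long rx_ring_bytes(unsigned int slots,
                                        unsigned int page_size)
{
    return (unsigned long long)slots * page_size;
}

/* Share of VM memory used by the ring, in parts per million. */
static unsigned long long ring_share_ppm(unsigned long long ring_bytes,
                                         unsigned long long vm_bytes)
{
    assert(vm_bytes > 0);
    return ring_bytes * 1000000ULL / vm_bytes;
}
```

Under these assumptions the full ring is exactly 1 MiB, which is just under 0.1% of a 1 GiB guest.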

Keeping the ring full of Rx requests handles bursty traffic better
than trying to converge on an optimal number of requests to keep
filled.

On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest
improved from 5.1 Gbit/s to 5.6 Gbit/s.  Gains with more bursty
traffic are expected to be higher.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'sunvnet-napi'
David S. Miller [Sat, 25 Oct 2014 20:20:20 +0000 (16:20 -0400)]
Merge branch 'sunvnet-napi'

Sowmini Varadhan says:

====================
sunvnet: NAPIfy sunvnet

This patchset converts the sunvnet driver to use the NAPI framework.
Changes since v4 to Patch1:
  vnet_event accumulates LDC_EVENT_* bits into rx_event.
  vnet_event_napi() unrolls send_events() logic to process all rx_event bits.
Changes since v5:
  Patch 1: use net_device.h definition for NAPI_POLL_WEIGHT.
  Drop sparclinux changes (patch3) per David Miller feedback

Patch 1 in the series addresses the packet-receive path: all
the vnet_event() processing is moved into NAPI context.
This patch is dependent on the sparc-next commit:
  "sparc64: Add vio_set_intr() to enable/disable Rx interrupts"
  (sparc commit id ca605b7dd740c8909408d67911d8ddd272c2b320)

Patch 2 uses RCU to fix race conditions between vnet_port_remove and
paths that access/modify port-related state, such as vnet_start_xmit.

Patch 3 leverages the NAPIfied Rx path,
dropping superfluous usage of irqsave/irqrestore on the vio.lock
where possible.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosunvnet: Remove irqsave/irqrestore on vio.lock
Sowmini Varadhan [Sat, 25 Oct 2014 19:12:31 +0000 (15:12 -0400)]
sunvnet: Remove irqsave/irqrestore on vio.lock

After the NAPIfication of sunvnet, we no longer need to
synchronize by doing irqsave/restore on vio.lock in the
I/O fastpath.

NAPI ->poll() is non-reentrant, so all RX processing occurs
strictly in a serialized environment. TX reclaim is done in NAPI
context, so the netif_tx_lock can be used to serialize
critical sections between Tx and Rx paths.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosunvnet: Use RCU to synchronize port usage with vnet_port_remove()
Sowmini Varadhan [Sat, 25 Oct 2014 19:12:20 +0000 (15:12 -0400)]
sunvnet: Use RCU to synchronize port usage with vnet_port_remove()

A vnet_port_remove could be triggered as a result of an ldm-unbind
operation by the peer, module unload, or other changes to the
inter-vnet-link configuration.  When this is concurrent with
vnet_start_xmit(), there are several race sequences possible,
such as

thread 1                                    thread 2
vnet_start_xmit
-> tx_port_find
   spin_lock_irqsave(&vp->lock..)
   ret = __tx_port_find(..)
   spin_unlock_irqrestore(&vp->lock..)
                                           vio_remove -> ..
                                               ->vnet_port_remove
                                           spin_lock_irqsave(&vp->lock..)
                                           cleanup
                                           spin_unlock_irqrestore(&vp->lock..)
                                           kfree(port)
/* attempt to use ret will bomb */

This patch adds RCU locking for port access so that vnet_port_remove
will correctly clean up port-related state.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosunvnet: NAPIfy sunvnet
Sowmini Varadhan [Sat, 25 Oct 2014 19:12:12 +0000 (15:12 -0400)]
sunvnet: NAPIfy sunvnet

Move Rx packet processing to the NAPI poll callback.
Disable the VIO interrupt and unconditionally go into NAPI
context from vnet_event.

Note that we want to minimize the number of LDC
STOP/START messages sent. Specifically, do not send a STOP
message if vnet_walk_rx does not read all the available descriptors
because of the NAPI budget limitation. Instead, note the end index
as part of port state, and resume from this index when the
next poll callback is triggered.
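
The resume-from-index behaviour described above can be illustrated with a small user-space sketch. It is only a model of the budget logic, not the sunvnet code: `struct rx_ring`, `poll_ring()` and the `send_stop` flag are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a receive ring; not the sunvnet data structures. */
struct rx_ring {
	size_t avail_end;	/* one past the last descriptor the peer filled */
	size_t resume_idx;	/* where the next poll starts reading */
};

/*
 * Process at most `budget` descriptors starting at ring->resume_idx and
 * return how many were processed. *send_stop is set only when every
 * available descriptor was consumed, so running out of budget never
 * triggers a STOP; the saved index lets the next poll continue instead.
 */
static int poll_ring(struct rx_ring *ring, int budget, bool *send_stop)
{
	int done = 0;

	while (done < budget && ring->resume_idx < ring->avail_end) {
		ring->resume_idx++;	/* "receive" one descriptor */
		done++;
	}

	*send_stop = (ring->resume_idx == ring->avail_end);
	return done;
}
```

With ten descriptors available and a budget of four, three successive calls process 4, 4 and 2 descriptors, and only the last call asks for a STOP.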

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Fri, 24 Oct 2014 20:41:02 +0000 (16:41 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-10-23

This series contains updates to i40e and i40evf.

Jesse modifies the i40e driver to only notify the firmware on link up/down
and qualified module events.  Also simplified the job of managing link
state by using the admin queue receive event for link events as a signal
to tell the driver to update link state.

Jeff (me) cleans up the inconsistent use of tabs for indentation in the admin
queue command header file.

Neerav converts the use of udelay() to usleep_range().

Anjali fixes a bug where receive would stop after some stress by adding
a sleep and restart as well as moving the setting of flow control because
it should be done at a PF level and not a VSI level.

Mitch adds code to handle link events when updating the PF switch, which
allows link information to be properly provided to VFs in all cases.

Catherine adds driver support for 10GBaseT and bumps driver version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: llc: include linux/errno.h instead of asm/errno.h
Fabian Frederick [Wed, 22 Oct 2014 19:06:26 +0000 (21:06 +0200)]
net: llc: include linux/errno.h instead of asm/errno.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agolapb: move EXPORT_SYMBOL after functions.
Fabian Frederick [Wed, 22 Oct 2014 19:01:41 +0000 (21:01 +0200)]
lapb: move EXPORT_SYMBOL after functions.

See Documentation/CodingStyle Chapter 6

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'berlin_ethernet'
David S. Miller [Fri, 24 Oct 2014 19:49:25 +0000 (15:49 -0400)]
Merge branch 'berlin_ethernet'

Sebastian Hesselbarth says:

====================
Marvell PXA168 libphy handling and Berlin Ethernet

This patch series deals with removing an IP feature that can be found
on all currently supported Marvell Ethernet IP (pxa168_eth, mv643xx_eth,
mvneta). The MAC IP can automatically perform PHY auto-negotiation
without software interaction.

However, this feature (a) fundamentally clashes with the way libphy works
and (b) is unable to deal with quirky PHYs that require special treatment.
In this series, pxa168_eth driver is rewritten to completely disable that
feature and properly deal with libphy provided PHYs.

As usual, a branch on top of v3.18-rc1 can be found at

git://git.infradead.org/users/hesselba/linux-berlin.git devel/bg2-bg2cd-eth-v2

Patches 1-5 should go through David's net tree, I'll pick up the DT patches
6-9.

There have been some changes,
compared to the RFT
- added phy-connection-type property to BG2Q PHY DT node
- bail out from pxa168_eth_adjust_link when there is no change in
  PHY parameters. Also, add a call to phy_print_status.
compared to v1
- move phy-connection-type to ethernet node instead of PHY node

Patch 1 adds support for Marvell 88E3016 FastEthernet PHY that is also
integrated in Marvell Berlin BG2/BG2CD SoCs.

Patch 2 allows to pass phy_interface_t on pxa168_eth platform_data that
is only used by mach-mmp/gplug. From the board setup, I guessed gplug's
PHY is connected via RMII. The patch still isn't even compile tested.

Patches 3-5 prepare proper libphy handling and finally remove all in-driver
PHY mangling related to the feature explained above.

Patches 6-9 add corresponding ethernet DT nodes to BG2, BG2CD, add a
phy-connection-type property to BG2Q and enable ethernet on BG2-based Sony
NSZ-GS7. I have tested all this on GS7 successfully with ip=dhcp on 100M FD.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: pxa168_eth: Remove in-driver PHY mangling
Sebastian Hesselbarth [Wed, 22 Oct 2014 18:26:48 +0000 (20:26 +0200)]
net: pxa168_eth: Remove in-driver PHY mangling

Now that libphy PHYs are used properly, remove the in-driver PHY
mangling.

Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: pxa168_eth: Remove HW auto-negotiaion
Sebastian Hesselbarth [Wed, 22 Oct 2014 18:26:47 +0000 (20:26 +0200)]
net: pxa168_eth: Remove HW auto-negotiaion

Marvell Ethernet IP supports PHY negotiation driven by HW. This
fundamentally clashes with libphy (software) driven negotiation and
also cannot cope with quirky PHYs. Therefore, always disable any HW
negotiation features and properly use libphy's phy_device.

Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: pxa168_eth: Prepare proper libphy handling
Sebastian Hesselbarth [Wed, 22 Oct 2014 18:26:46 +0000 (20:26 +0200)]
net: pxa168_eth: Prepare proper libphy handling

Current libphy handling in pxa168_eth lacks proper phy_connect. Prepare
to fix this by first moving phy properties from platform_data to private
driver data.

Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: pxa168_eth: Provide phy_interface mode on platform_data
Sebastian Hesselbarth [Wed, 22 Oct 2014 18:26:45 +0000 (20:26 +0200)]
net: pxa168_eth: Provide phy_interface mode on platform_data

The PXA168 Ethernet IP supports MII and RMII connections to its PHY.
Currently, pxa168 platform_data does not provide a way to pass that
and there is one user of pxa168 platform_data (mach-mmp/gplug).
Given the pinctrl settings of gplug it uses RMII, so add and pass
a corresponding phy_interface_t.

Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agophy: marvell: Add support for 88E3016 FastEthernet PHY
Sebastian Hesselbarth [Wed, 22 Oct 2014 18:26:44 +0000 (20:26 +0200)]
phy: marvell: Add support for 88E3016 FastEthernet PHY

Marvell 88E3016 is a FastEthernet PHY that also can be found in Marvell
Berlin SoCs as integrated PHY.

Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonatsemi/macsonic: Remove superfluous interrupt disable/restore
Geert Uytterhoeven [Tue, 21 Oct 2014 17:53:57 +0000 (19:53 +0200)]
natsemi/macsonic: Remove superfluous interrupt disable/restore

As of commit e4dc601bf99ccd1c ("m68k: Disable/restore interrupts in
hwreg_present()/hwreg_write()"), this is no longer needed.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agocirrus/mac89x0: Remove superfluous interrupt disable/restore
Geert Uytterhoeven [Tue, 21 Oct 2014 17:53:11 +0000 (19:53 +0200)]
cirrus/mac89x0: Remove superfluous interrupt disable/restore

As of commit e4dc601bf99ccd1c ("m68k: Disable/restore interrupts in
hwreg_present()/hwreg_write()"), this is no longer needed.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: typhoon: Remove redundant casts
Rasmus Villemoes [Tue, 21 Oct 2014 14:51:43 +0000 (16:51 +0200)]
net: typhoon: Remove redundant casts

Both image_data and typhoon_fw->data are const u8*, so the cast to u8*
is unnecessary and confusing.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoRemoved unused function sctp_addr_is_valid()
Sébastien Barré [Tue, 21 Oct 2014 13:26:15 +0000 (15:26 +0200)]
Removed unused function sctp_addr_is_valid()

sctp_addr_is_valid() only appeared in its definition.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'ipv6_route'
David S. Miller [Fri, 24 Oct 2014 04:14:52 +0000 (00:14 -0400)]
Merge branch 'ipv6_route'

Martin KaFai Lau says:

====================
ipv6: Reduce the number of fib6_lookup() calls from ip6_pol_route()

This patch set is trying to reduce the number of fib6_lookup()
calls from ip6_pol_route().

I have adapted davem's udpflood and kbench_mod tests
(https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to
support IPv6 and here is the result:

Before:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done

real    0m34.190s
user    0m3.047s
sys     0m31.108s

real    0m34.635s
user    0m3.125s
sys     0m31.475s

real    0m34.517s
user    0m3.034s
sys     0m31.449s

[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[  660.160976] ip6_route_kbench: ip6_route_output tdiff: 933
[  660.207261] ip6_route_kbench: ip6_route_output tdiff: 988
[  660.253492] ip6_route_kbench: ip6_route_output tdiff: 896
[  660.298862] ip6_route_kbench: ip6_route_output tdiff: 898

After:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done

real    0m32.695s
user    0m2.925s
sys     0m29.737s

real    0m32.636s
user    0m3.007s
sys     0m29.596s

real    0m32.797s
user    0m2.866s
sys     0m29.898s

[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[  881.220793] ip6_route_kbench: ip6_route_output tdiff: 684
[  881.253477] ip6_route_kbench: ip6_route_output tdiff: 640
[  881.286867] ip6_route_kbench: ip6_route_output tdiff: 630
[  881.320749] ip6_route_kbench: ip6_route_output tdiff: 653

/****************************** udpflood.c ******************************/
/* An adaptation of Eric Dumazet's and David Miller's
 * udpflood tool that adds IPv6 support.
 */

#define _GNU_SOURCE /* for s6_addr32 on glibc */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

typedef uint32_t u32;

static int debug = 0;

/* Allow -fstrict-aliasing */
typedef union sa_u {
	struct sockaddr_storage a46;
	struct sockaddr_in a4;
	struct sockaddr_in6 a6;
} sa_u;

static int usage(void)
{
	/* Options listed here match the getopt() string in main() */
	printf("usage: udpflood [ -d ] [ -l count ] [ -s message_size ] [ -p port ] [ -c num_ip_addrs ] IP_ADDRESS\n");
	return -1;
}

/* Return the last 32 bits of the address in host byte order */
static u32 get_last32h(const sa_u *sa)
{
	if (sa->a46.ss_family == PF_INET)
		return ntohl(sa->a4.sin_addr.s_addr);
	else
		return ntohl(sa->a6.sin6_addr.s6_addr32[3]);
}

/* Overwrite the last 32 bits of the address (host byte order input) */
static void set_last32h(sa_u *sa, u32 last32h)
{
	if (sa->a46.ss_family == PF_INET)
		sa->a4.sin_addr.s_addr = htonl(last32h);
	else
		sa->a6.sin6_addr.s6_addr32[3] = htonl(last32h);
}

static void print_saddr(const sa_u *sa, const char *msg)
{
	char buf[64];

	if (!debug)
		return;

	switch (sa->a46.ss_family) {
	case PF_INET:
		inet_ntop(PF_INET, &(sa->a4.sin_addr.s_addr), buf,
			  sizeof(buf));
		break;
	case PF_INET6:
		inet_ntop(PF_INET6, &(sa->a6.sin6_addr), buf, sizeof(buf));
		break;
	default:
		/* buf would be uninitialized; nothing sensible to print */
		return;
	}

	printf("%s: %s\n", msg, buf);
}

static int send_packets(const sa_u *sa, size_t num_addrs, int count, int msg_sz)
{
	char *msg = malloc(msg_sz);
	sa_u saddr;
	u32 start_addr32h, end_addr32h, cur_addr32h;
	int fd, i, err;

	if (!msg)
		return -ENOMEM;

	memset(msg, 0, msg_sz);

	memcpy(&saddr, sa, sizeof(saddr));
	cur_addr32h = start_addr32h = get_last32h(&saddr);
	end_addr32h = start_addr32h + num_addrs;

	fd = socket(saddr.a46.ss_family, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		err = fd;
		goto out_nofd;
	}

	/* connect to avoid the kernel spending time in figuring
	 * out the source address (i.e pin the src address)
	 */
	err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
	if (err < 0) {
		perror("connect");
		goto out;
	}

	print_saddr(&saddr, "start_addr");
	for (i = 0; i < count; i++) {
		print_saddr(&saddr, "sendto");
		err = sendto(fd, msg, msg_sz, 0, (struct sockaddr *)&saddr,
			     sizeof(saddr));
		if (err < 0) {
			perror("sendto");
			goto out;
		}

		/* Cycle the destination through num_addrs addresses */
		if (++cur_addr32h >= end_addr32h)
			cur_addr32h = start_addr32h;
		set_last32h(&saddr, cur_addr32h);
	}

	err = 0;
out:
	close(fd);
out_nofd:
	free(msg);
	return err;
}

int main(int argc, char **argv, char **envp)
{
	int port, msg_sz, count, num_addrs, ret;

	sa_u start_addr;

	port = 6000;
	msg_sz = 32;
	count = 10000000;
	num_addrs = 1;

	while ((ret = getopt(argc, argv, "dl:s:p:c:")) >= 0) {
		switch (ret) {
		case 'l':
			sscanf(optarg, "%d", &count);
			break;
		case 's':
			sscanf(optarg, "%d", &msg_sz);
			break;
		case 'p':
			sscanf(optarg, "%d", &port);
			break;
		case 'c':
			sscanf(optarg, "%d", &num_addrs);
			break;
		case 'd':
			debug = 1;
			break;
		case '?':
			return usage();
		}
	}

	if (num_addrs < 1)
		return usage();

	if (!argv[optind])
		return usage();

	/* Zero the union so unused fields (e.g. sin6_scope_id) are defined */
	memset(&start_addr, 0, sizeof(start_addr));

	/* sin_port and sin6_port share the same offset in the union */
	start_addr.a4.sin_port = htons(port);
	if (inet_pton(PF_INET, argv[optind], &start_addr.a4.sin_addr) == 1)
		start_addr.a46.ss_family = PF_INET;
	else if (inet_pton(PF_INET6, argv[optind], &start_addr.a6.sin6_addr.s6_addr) == 1)
		start_addr.a46.ss_family = PF_INET6;
	else
		return usage();

	return send_packets(&start_addr, num_addrs, count, msg_sz);
}

/****************** ip6_route_kbench_mod.c ******************/

/* We can't just use "get_cycles()" as on some platforms, such
 * as sparc64, that gives system cycles rather than cpu clock
 * cycles.
 */

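/* NOTE: the architecture guards below are an assumption; only one
 * get_tick() definition can be compiled in.
 */
#if defined(CONFIG_SPARC64)
static inline unsigned long long get_tick(void)
{
	unsigned long long t;

	__asm__ __volatile__("rd %%tick, %0" : "=r" (t));
	return t;
}
#elif defined(CONFIG_X86)
static inline unsigned long long get_tick(void)
{
	unsigned long long t;

	rdtscll(t);

	return t;
}
#else
static inline unsigned long long get_tick(void)
{
	return get_cycles();
}
#endif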

static int flow_oif = DEFAULT_OIF;
static int flow_iif = DEFAULT_IIF;
static u32 flow_mark = DEFAULT_MARK;
static struct in6_addr flow_dst_ip_addr;
static struct in6_addr flow_src_ip_addr;
static int flow_tos = DEFAULT_TOS;

static char dst_string[64];
static char src_string[64];

module_param_string(dst, dst_string, sizeof(dst_string), 0);
module_param_string(src, src_string, sizeof(src_string), 0);

static int __init flow_setup(void)
{
if (dst_string[0] &&
    !in6_pton(dst_string, -1, &flow_dst_ip_addr.s6_addr[0], -1, NULL)) {
pr_info("cannot parse \"%s\"\n", dst_string);
return -1;
}

if (src_string[0] &&
    !in6_pton(src_string, -1, &flow_src_ip_addr.s6_addr[0], -1, NULL)) {
pr_info("cannot parse \"%s\"\n", src_string);
return -1;
}

return 0;
}

module_param_named(oif, flow_oif, int, 0);
module_param_named(iif, flow_iif, int, 0);
module_param_named(mark, flow_mark, uint, 0);
module_param_named(tos, flow_tos, int, 0);

static int warmup_count = DEFAULT_WARMUP_COUNT;
module_param_named(count, warmup_count, int, 0);

static void flow_init(struct flowi6 *fl6)
{
memset(fl6, 0, sizeof(*fl6));
fl6->flowi6_proto = IPPROTO_ICMPV6;
fl6->flowi6_oif = flow_oif;
fl6->flowi6_iif = flow_iif;
fl6->flowi6_mark = flow_mark;
fl6->flowi6_tos = flow_tos;
fl6->daddr = flow_dst_ip_addr;
fl6->saddr = flow_src_ip_addr;
}

static struct sk_buff * fake_skb_get(void)
{
struct ipv6hdr *hdr;
struct sk_buff *skb;

skb = alloc_skb(4096, GFP_KERNEL);
if (!skb) {
pr_info("Cannot alloc SKB for test\n");
return NULL;
}
skb->dev = __dev_get_by_index(&init_net, flow_iif);
if (skb->dev == NULL) {
pr_info("Input device (%d) does not exist\n", flow_iif);
goto err;
}

skb_reset_mac_header(skb);
skb_reset_network_header(skb);
skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr));
hdr = ipv6_hdr(skb);

hdr->priority = 0;
hdr->version = 6;
memset(hdr->flow_lbl, 0, sizeof(hdr->flow_lbl));
hdr->payload_len = htons(sizeof(struct icmp6hdr));
hdr->nexthdr = IPPROTO_ICMPV6;
hdr->saddr = flow_src_ip_addr;
hdr->daddr = flow_dst_ip_addr;
skb->protocol = htons(ETH_P_IPV6);
skb->mark = flow_mark;

return skb;
err:
kfree_skb(skb);
return NULL;
}

static void do_full_output_lookup_bench(void)
{
unsigned long long t1, t2, tdiff;
struct rt6_info *rt;
struct flowi6 fl6;
int i;

rt = NULL;

for (i = 0; i < warmup_count; i++) {
flow_init(&fl6);

rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
if (IS_ERR(rt))
break;
ip6_rt_put(rt);
}
if (IS_ERR(rt)) {
pr_info("ip6_route_output: err=%ld\n", PTR_ERR(rt));
return;
}

flow_init(&fl6);

t1 = get_tick();
rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
t2 = get_tick();
if (!IS_ERR(rt))
ip6_rt_put(rt);

tdiff = t2 - t1;
pr_info("ip6_route_output tdiff: %llu\n", tdiff);
}

static void do_full_input_lookup_bench(void)
{
unsigned long long t1, t2, tdiff;
struct sk_buff *skb;
struct rt6_info *rt;
int err, i;

skb = fake_skb_get();
if (skb == NULL)
goto out_free;

err = 0;
local_bh_disable();
for (i = 0; i < warmup_count; i++) {
ip6_route_input(skb);
rt = (struct rt6_info *)skb_dst(skb);
err = (!rt || rt == init_net.ipv6.ip6_null_entry);
skb_dst_drop(skb);
if (err)
break;
}
local_bh_enable();

if (err) {
pr_info("Input route lookup fails\n");
goto out_free;
}

local_bh_disable();
t1 = get_tick();
ip6_route_input(skb);
t2 = get_tick();
local_bh_enable();

rt = (struct rt6_info *)skb_dst(skb);
err = (!rt || rt == init_net.ipv6.ip6_null_entry);
skb_dst_drop(skb);
if (err) {
pr_info("Input route lookup fails\n");
goto out_free;
}

tdiff = t2 - t1;
pr_info("ip6_route_input tdiff: %llu\n", tdiff);

out_free:
kfree_skb(skb);
}

static void do_full_lookup_bench(void)
{
if (!flow_iif)
do_full_output_lookup_bench();
else
do_full_input_lookup_bench();
}

static void do_bench(void)
{
do_full_lookup_bench();
do_full_lookup_bench();
do_full_lookup_bench();
do_full_lookup_bench();
}

static int __init kbench_init(void)
{
if (flow_setup())
return -EINVAL;

pr_info("flow [IIF(%d),OIF(%d),MARK(0x%08x),D("IP6_FMT"),"
"S("IP6_FMT"),TOS(0x%02x)]\n",
flow_iif, flow_oif, flow_mark,
IP6_PRT(flow_dst_ip_addr),
IP6_PRT(flow_src_ip_addr),
flow_tos);

if (!cpu_has_tsc) {
pr_err("X86 TSC is required, but is unavailable.\n");
return -EINVAL;
}

pr_info("sizeof(struct rt6_info)==%zu\n", sizeof(struct rt6_info));

do_bench();

return -ENODEV;
}

static void __exit kbench_exit(void)
{
}

module_init(kbench_init);
module_exit(kbench_exit);
MODULE_LICENSE("GPL");
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn
Martin KaFai Lau [Mon, 20 Oct 2014 20:42:45 +0000 (13:42 -0700)]
ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn

This patch saves the fn before doing rt6_backtrack.
Hence, without redo-ing the fib6_lookup(), saved_fn can be used
to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.

Some minor changes I think make sense to review as a single patch:
* Remove the 'out:' goto label.
* Remove the 'reachable' variable. Only use the 'strict' variable instead.

After this patch, "failing ip6_ins_rt()" should be the only case that
requires a redo of fib6_lookup().

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case
Martin KaFai Lau [Mon, 20 Oct 2014 20:42:44 +0000 (13:42 -0700)]
ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case

When there is an RTF_CACHE hit, there is no need to redo fib6_lookup()
with reachable=0.

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: Remove BACKTRACK macro
Martin KaFai Lau [Mon, 20 Oct 2014 20:42:43 +0000 (13:42 -0700)]
ipv6: Remove BACKTRACK macro

It is the prep work to reduce the number of calls to fib6_lookup().

The BACKTRACK macro could be hard-to-read and error-prone due to
its side effects (mainly goto).

This patch is to:
1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following
   return values:
   * If it is backtrack-able, returns next fn for retry.
   * If it reaches the root, returns NULL.
2. The caller needs to decide if a backtrack is needed (by testing
   rt == net->ipv6.ip6_null_entry).
3. Rename the goto labels in ip6_pol_route() to make the next few
   patches easier to read.
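
The contract described in item 1 can be sketched in user-space C. This is a simplified illustration, not the kernel's fib6_backtrack(): `struct node`, its `has_route` field and `backtrack()` are hypothetical stand-ins, and subtree handling is left out entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified tree node; struct fib6_node has more fields. */
struct node {
	struct node *parent;	/* NULL for the root */
	int has_route;		/* stands in for "this node carries route info" */
};

/*
 * Walk up from fn and return the nearest ancestor that carries route
 * information, so the caller can retry its selection there. Returns
 * NULL once the root has been passed and no candidate is left.
 */
static struct node *backtrack(struct node *fn)
{
	for (fn = fn->parent; fn; fn = fn->parent) {
		if (fn->has_route)
			return fn;
	}
	return NULL;
}
```

Unlike a macro with an embedded goto, the function has no hidden control flow: the caller tests its own result, calls `backtrack()` only when a retry is needed, and stops when it gets NULL.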

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Remove trailing whitespace in tcp.h icmp.c syncookies.c
Kenjiro Nakayama [Mon, 20 Oct 2014 09:15:50 +0000 (18:15 +0900)]
net: Remove trailing whitespace in tcp.h icmp.c syncookies.c

Remove trailing whitespace in tcp.h icmp.c syncookies.c

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoi40e: Bump version
Catherine Sullivan [Sat, 13 Sep 2014 07:40:48 +0000 (07:40 +0000)]
i40e: Bump version

Bump i40e version to 1.0.21.

Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-By: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Moving variable declaration out of the loops
Akeem G Abodunrin [Fri, 17 Oct 2014 03:14:39 +0000 (03:14 +0000)]
i40e: Moving variable declaration out of the loops

Move the three variables out of the loop, so they are declared only once.

Change-ID: I436913777c7da3c16dc0031b59e3ffa61de74718
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Add 10GBaseT support
Mitch Williams [Sat, 13 Sep 2014 07:40:47 +0000 (07:40 +0000)]
i40e: Add 10GBaseT support

Add driver support for 10GBaseT device.

Change-ID: I4be6ed847ac0bddd220b9878a95c523b32038174
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: process link events when setting up switch
Mitch Williams [Sat, 13 Sep 2014 07:40:46 +0000 (07:40 +0000)]
i40e: process link events when setting up switch

Add code to handle link events when updating the PF switch. This
allows link information to be properly provided to VFs in all cases.

Change-ID: If314c95f3d39259ef4c40a4a3b823381e28fb24f
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Fix a bug where Rx would stop after some time
Anjali Singhai Jain [Sat, 13 Sep 2014 07:40:45 +0000 (07:40 +0000)]
i40e: Fix a bug where Rx would stop after some time

Move the setting of flow control because this should be done at a PF level, not
a VSI level. Also add a sleep and restart to fix a bug where Rx would stop
after some stress.

Change-ID: I9a93d8c2ff27c39339eb00bc4ec1225e43900be0
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e/i40evf: Use usleep_range() instead of udelay()
Neerav Parikh [Sat, 13 Sep 2014 07:40:44 +0000 (07:40 +0000)]
i40e/i40evf: Use usleep_range() instead of udelay()

As per the Documentation/timers/timers-howto.txt it is preferred to use
usleep_range() instead of udelay() if the delay value is > 10us in
non-atomic contexts.
So, replace all instances of udelay() in the driver that delay for
10 microseconds or more with usleep_range().

Change-ID: Iaa2ab499a4c26f6005e5d86cc421407ef9de16c7
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e/i40evf: Fix whitespace indentation
Jeff Kirsher [Sat, 13 Sep 2014 07:40:43 +0000 (07:40 +0000)]
i40e/i40evf: Fix whitespace indentation

This is one small step in making the indentation more consistent.  If
we truly want to align values, then use tabs rather than spaces.

Change-ID: I12368bc77a52f296d1843fdcb67201a7d7cd4749
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
10 years agoi40e: enable LSE poke and simplify link state
Jesse Brandeburg [Sat, 13 Sep 2014 07:40:42 +0000 (07:40 +0000)]
i40e: enable LSE poke and simplify link state

The driver can do a simpler job of managing link state by simply
using the admin queue receive event for link events as a doorbell
that tells the driver to update link state.

Additionally, add a workaround that will help make sure the link state in the
hardware is consistent with the link state the driver is reporting
by refreshing the link state every service task interval.

Change-ID: Ib95b5b7b8cc016e97d8009f6363c9f9eed301444
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: mask phy events
Jesse Brandeburg [Sat, 13 Sep 2014 07:40:41 +0000 (07:40 +0000)]
i40e: mask phy events

Tell the firmware what kind of link related events the driver is
interested in.  In this case, just link up/down and qualified module
events are the ones the driver really cares about.

Change-ID: If132c812c340c8e1927c2caf6d55185296b66201
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agohyperv: Fix the total_data_buflen in send path
Haiyang Zhang [Wed, 22 Oct 2014 20:47:18 +0000 (13:47 -0700)]
hyperv: Fix the total_data_buflen in send path

total_data_buflen is used by netvsc_send() to decide if a packet can be put
into send buffer. It should also include the size of RNDIS message before the
Ethernet frame. Otherwise, a message with total size bigger than send_section_size
may be copied into the send buffer, and cause data corruption.
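
The size check can be sketched as follows. This is a hypothetical helper, not the netvsc code: the function name, its parameters and the choice of `<=` for the comparison are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a message can be copied into one
 * send-buffer section. The size that matters is the RNDIS message
 * header plus the Ethernet frame, not the frame alone.
 */
static bool fits_in_send_section(uint32_t rndis_hdr_len, uint32_t frame_len,
				 uint32_t send_section_size)
{
	/* 64-bit sum so the addition itself cannot wrap */
	uint64_t total_data_buflen = (uint64_t)rndis_hdr_len + frame_len;

	return total_data_buflen <= send_section_size;
}
```

With an illustrative section size of 6144 bytes, a 6100-byte frame looks like it fits when considered alone, but does not once a 100-byte header is counted.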

[Request to include this patch to the Stable branches]

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'amd-xgbe'
David S. Miller [Wed, 22 Oct 2014 21:50:39 +0000 (17:50 -0400)]
Merge branch 'amd-xgbe'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver fixes 2014-10-22

The following series of patches includes fixes to the driver.

- Properly handle feature changes via ethtool by using correctly sized
  variables
- Perform proper napi packet counting and budget checking

This patch series is based on net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoamd-xgbe: Fix napi Rx budget accounting
Lendacky, Thomas [Wed, 22 Oct 2014 16:26:17 +0000 (11:26 -0500)]
amd-xgbe: Fix napi Rx budget accounting

Currently the amd-xgbe driver increments the packets processed counter
each time a descriptor is processed.  Since a packet can be represented
by more than one descriptor, incrementing the counter in this way is not
appropriate.  Also, since multiple descriptors cause the budget check
to be short circuited, sometimes the returned value from the poll
function would be larger than the budget value resulting in a WARN_ONCE
being triggered.

Update the polling logic to properly account for the number of packets
processed and exit when the budget value is reached.
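
A minimal user-space sketch of per-packet accounting, assuming each descriptor carries a flag marking the end of a packet. `struct desc` and `rx_poll()` are hypothetical names, not the amd-xgbe driver's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor: `last` marks the final descriptor of a packet. */
struct desc {
	bool last;
};

/*
 * Consume descriptors from ring[*pos..n) and count completed packets,
 * not descriptors. The loop exits as soon as `budget` packets are done,
 * so the return value can never exceed the budget.
 */
static int rx_poll(const struct desc *ring, size_t n, size_t *pos, int budget)
{
	int packets = 0;

	while (packets < budget && *pos < n) {
		bool last = ring[*pos].last;

		(*pos)++;
		if (last)
			packets++;
	}

	return packets;
}
```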

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoamd-xgbe: Properly handle feature changes via ethtool
Lendacky, Thomas [Wed, 22 Oct 2014 16:26:11 +0000 (11:26 -0500)]
amd-xgbe: Properly handle feature changes via ethtool

The ndo_set_features callback function was improperly using an unsigned
int to save the current feature value for features such as NETIF_F_RXCSUM.
Since that feature is in the upper 32 bits of a 64 bit variable the
result was always 0 making it not possible to actually turn off the
hardware RX checksum support.  Change the unsigned int type to the
netdev_features_t type in order to properly capture the current value
and perform the proper operation.
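
The truncation is easy to reproduce in user space. In this sketch `features_t` stands in for netdev_features_t, and the bit position chosen for `F_RXCSUM` is a hypothetical value above bit 31, not the kernel's definition. It assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

typedef uint64_t features_t;		/* stands in for netdev_features_t */

/* Hypothetical bit position in the upper 32 bits */
#define F_RXCSUM ((features_t)1 << 34)

/* Buggy pattern: the masked value is stored in a 32-bit variable */
static int rxcsum_seen_truncated(features_t features)
{
	unsigned int rxcsum = features & F_RXCSUM;	/* upper bits lost */

	return rxcsum != 0;
}

/* Fixed pattern: keep the full 64-bit width */
static int rxcsum_seen_full(features_t features)
{
	features_t rxcsum = features & F_RXCSUM;

	return rxcsum != 0;
}
```

With the flag set, the truncated variant still reports it as clear, which is why the saved state never matched the requested one.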

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: fec: ptp: fix NULL pointer dereference if ptp_clock is not set
Philipp Zabel [Wed, 22 Oct 2014 14:34:35 +0000 (16:34 +0200)]
net: fec: ptp: fix NULL pointer dereference if ptp_clock is not set

Since commit 278d24047891 (net: fec: ptp: Enable PPS output based on ptp clock)
fec_enet_interrupt calls fec_ptp_check_pps_event unconditionally, which calls
into ptp_clock_event. If fep->ptp_clock is NULL, ptp_clock_event tries to
dereference the NULL pointer.
Since on i.MX53 fep->bufdesc_ex is not set, fec_ptp_init is never called,
and fep->ptp_clock is NULL, which reliably causes a kernel panic.

This patch adds a check for fep->ptp_clock == NULL in fec_enet_interrupt.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: fix saving TX flow hash in sock for outgoing connections
Sathya Perla [Wed, 22 Oct 2014 16:12:01 +0000 (21:42 +0530)]
net: fix saving TX flow hash in sock for outgoing connections

The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record the flow hash (sk_txhash) in the socket structure. sk_txhash is
used to set skb->hash, which is used to spread flows across multiple TXQs.

However, the above routines are invoked before the source port of the
connection is created. Because of this, all outgoing connections that differ
only in the source port get hashed into the same TXQ.

This patch fixes this problem for IPv4/6 by invoking the above routines
after the source port is available for the socket.
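
The effect of hashing too early can be shown with a toy 4-tuple hash.
This is only a sketch: struct flow and toy_flow_hash() are hypothetical,
and the mixing function is not the kernel's flow hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection 4-tuple; field names are for illustration only. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Toy FNV-style mixing over the four fields (not the kernel's hash). */
static uint32_t toy_flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;

    h = (h ^ f->saddr) * 16777619u;
    h = (h ^ f->daddr) * 16777619u;
    h = (h ^ f->sport) * 16777619u;
    h = (h ^ f->dport) * 16777619u;
    return h;
}
```

If the hash is taken while sport is still 0 for every connection, two
connections to the same destination produce the same value; taking it
after the source port is assigned lets the port contribute to the
result.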

Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoxfrm6: fix a potential use after free in xfrm6_policy.c
Li RongQing [Wed, 22 Oct 2014 09:09:53 +0000 (17:09 +0800)]
xfrm6: fix a potential use after free in xfrm6_policy.c

pskb_may_pull() may change skb->data and make the nh and exthdr pointers
obsolete, so recompute nh and exthdr after calling it.
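
The same hazard exists in any code that keeps pointers into a buffer
that may be reallocated. The userspace sketch below uses a hypothetical
struct buf and buf_may_pull() built on realloc() as a stand-in for an
skb and pskb_may_pull(); the pattern is to keep an offset and derive the
pointer again after the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer standing in for an skb's linear data area. */
struct buf {
    unsigned char *data;
    size_t len;
};

/*
 * Make sure at least 'need' bytes are available.  Like pskb_may_pull(),
 * this may move b->data.  Returns 0 on success, -1 on allocation failure.
 */
static int buf_may_pull(struct buf *b, size_t need)
{
    unsigned char *p;

    if (need <= b->len)
        return 0;
    p = realloc(b->data, need);
    if (!p)
        return -1;
    memset(p + b->len, 0, need - b->len);   /* zero the newly added tail */
    b->data = p;
    b->len = need;
    return 0;
}

/* Safe pattern: keep an offset and recompute the pointer after the pull. */
static unsigned char *hdr_after_pull(struct buf *b, size_t hdr_off,
                                     size_t need)
{
    if (buf_may_pull(b, need) < 0)
        return NULL;
    return b->data + hdr_off;   /* derived from the current b->data */
}
```

A pointer computed from b->data before buf_may_pull() may refer to freed
memory afterwards, which is the use after free the commit title refers
to.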

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: fs_enet: set back promiscuity mode after restart
LEROY Christophe [Wed, 22 Oct 2014 07:05:47 +0000 (09:05 +0200)]
net: fs_enet: set back promiscuity mode after restart

After an interface restart (e.g. after link disconnection/reconnection), the
bridge function doesn't work anymore. This is due to the promiscuous mode
being cleared by the restart.

The mac-fcc already includes code to set the promiscuous mode back during the restart.
This patch adds the same handling to mac-fec and mac-scc.

Tested with bridge function on MPC885 with FEC.

Reported-by: Germain Montoies <germain.montoies@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: tso: fix unaligned access to crafted TCP header in helper API
Karl Beldan [Tue, 21 Oct 2014 14:06:05 +0000 (16:06 +0200)]
net: tso: fix unaligned access to crafted TCP header in helper API

The crafted header start address is from a driver-supplied buffer, which
one can reasonably expect to be aligned on a 4-byte boundary.
However, at the moment the TSO helper API is only used by ethernet drivers,
and the TCP header will then be aligned only to a 2-byte boundary from the
header start address.
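
A minimal sketch of one alignment-independent way to write a 32-bit
field at such an address, in portable userspace C. The helper
store_be32_unaligned(), the enum constants and set_tcp_seq() are
hypothetical names, and the offsets assume an untagged Ethernet header
and an IPv4 header without options.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 32-bit value in big-endian order one byte
 * at a time, so the destination needs no particular alignment.
 */
static void store_be32_unaligned(uint8_t *dst, uint32_t val)
{
    dst[0] = (uint8_t)(val >> 24);
    dst[1] = (uint8_t)(val >> 16);
    dst[2] = (uint8_t)(val >> 8);
    dst[3] = (uint8_t)val;
}

enum {
    ETH_HDR_LEN = 14,   /* Ethernet header without VLAN tag */
    IP_HDR_LEN  = 20,   /* IPv4 header without options */
    TCP_SEQ_OFF = 4,    /* sequence number offset inside the TCP header */
};

/* Write the TCP sequence number into a frame that starts at hdr. */
static void set_tcp_seq(uint8_t *hdr, uint32_t seq)
{
    /* 14 + 20 + 4 = 38: a multiple of 2 but not of 4 */
    store_be32_unaligned(hdr + ETH_HDR_LEN + IP_HDR_LEN + TCP_SEQ_OFF, seq);
}
```

Byte-wise stores avoid a 32-bit store to an address that is only 2-byte
aligned, which some architectures trap on or handle slowly.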

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosfc: remove incorrect EFX_BUG_ON_PARANOID check
Jon Cooper [Tue, 21 Oct 2014 13:50:29 +0000 (14:50 +0100)]
sfc: remove incorrect EFX_BUG_ON_PARANOID check

write_count and insert_count can wrap around, making the > check invalid.
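
To illustrate why a direct comparison of free-running counters breaks on
wraparound, here is a small userspace sketch. Both function names are
hypothetical. The commit itself removes the check; the subtraction shown
here is only the general wrap-safe way to measure the distance between
two such counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Direct comparison: gives the wrong answer once insert_count has wrapped. */
static bool naive_order_ok(unsigned int write_count, unsigned int insert_count)
{
    return write_count <= insert_count;
}

/*
 * Wrap-safe distance: unsigned subtraction is performed modulo 2^N, so the
 * result is correct even if insert_count has wrapped and write_count has not.
 */
static unsigned int pending(unsigned int write_count, unsigned int insert_count)
{
    return insert_count - write_count;
}
```

With write_count two below the maximum and insert_count already wrapped
to 3, the naive check fails although only five entries are pending.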

Fixes: 70b33fb0ddec827cbbd14cdc664fc27b2ef4a6b6 ("sfc: add support for
 skb->xmit_more").

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: sched: initialize bstats syncp
Sabrina Dubroca [Tue, 21 Oct 2014 09:23:30 +0000 (11:23 +0200)]
net: sched: initialize bstats syncp

Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.

Fixes: 22e0f8b9322c "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobpf: fix bug in eBPF verifier
Alexei Starovoitov [Mon, 20 Oct 2014 21:54:57 +0000 (14:54 -0700)]
bpf: fix bug in eBPF verifier

When comparing verifier states for equivalence, the comparison
was missing a check for an uninitialized register.
Make sure it does so and add a testcase.

Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonetlink: Re-add locking to netlink_lookup() and seq walker
Thomas Graf [Tue, 21 Oct 2014 20:05:38 +0000 (22:05 +0200)]
netlink: Re-add locking to netlink_lookup() and seq walker

The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: fix lockdep warning when intra-node messages are delivered
Ying Xue [Mon, 20 Oct 2014 06:46:35 +0000 (14:46 +0800)]
tipc: fix lockdep warning when intra-node messages are delivered

When running the tipcTC & tipcTS test suite, the following lockdep unsafe
locking scenario is reported:

[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:
[ 1109.998762]  Possible unsafe locking scenario:
[ 1109.998762]
[ 1109.998762]        CPU0
[ 1109.998762]        ----
[ 1109.998762]   lock(slock-AF_TIPC);
[ 1109.998762]   <Interrupt>
[ 1109.998762]     lock(slock-AF_TIPC);
[ 1109.998762]
[ 1109.998762]  *** DEADLOCK ***
[ 1109.998762]
[ 1109.998762] 2 locks held by swapper/7/0:
[ 1109.998762]  #0:  (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]
[ 1109.998762] stack backtrace:
[ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
[ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1109.998762]  ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
[ 1109.998762]  ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
[ 1109.998762]  ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
[ 1109.998762] Call Trace:
[ 1109.998762]  <IRQ>  [<ffffffff81a209eb>] dump_stack+0x4e/0x68
[ 1109.998762]  [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202
[ 1109.998762]  [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50
[ 1109.998762]  [<ffffffff810a406c>] mark_lock+0x28c/0x2f0
[ 1109.998762]  [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
[ 1109.998762]  [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80
[ 1109.998762]  [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10
[ 1109.998762]  [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0
[ 1109.998762]  [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc]
[ 1109.998762]  [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
[ 1109.998762]  [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]  [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70
[ 1109.998762]  [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0
[ 1109.998762]  [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70
[ 1109.998762]  [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0
[ 1109.998762]  [<ffffffff81785518>] napi_gro_receive+0xd8/0x240
[ 1109.998762]  [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530
[ 1109.998762]  [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffff817842b1>] net_rx_action+0x141/0x310
[ 1109.998762]  [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150
[ 1109.998762]  [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0
[ 1109.998762]  [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]  [<ffffffff81a30d07>] do_IRQ+0x67/0x110
[ 1109.998762]  [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f
[ 1109.998762]  <EOI>  [<ffffffff8100d2b7>] ? default_idle+0x37/0x250
[ 1109.998762]  [<ffffffff8100d2b5>] ? default_idle+0x35/0x250
[ 1109.998762]  [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20
[ 1109.998762]  [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0
[ 1109.998762]  [<ffffffff81034c78>] start_secondary+0x188/0x1f0

When intra-node messages are delivered from one process to another
process, tipc_link_xmit() doesn't disable BH before it directly calls
tipc_sk_rcv() in process context to forward messages to the destination
socket. Meanwhile, if messages delivered by a remote node arrive at the
node and their destination is the same socket, tipc_sk_rcv() running in
process context might be preempted by tipc_sk_rcv() running in BH
context. As a result, the latter cannot obtain the socket lock because
it is held by the former, while the former has no chance to run because
the latter owns the CPU, so a deadlock happens. To avoid it, BH should
always be disabled in tipc_sk_rcv().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: fix a potential deadlock
Ying Xue [Mon, 20 Oct 2014 06:44:25 +0000 (14:44 +0800)]
tipc: fix a potential deadlock

Lockdep detected the following possible unsafe locking scenario:

           CPU0                          CPU1
T0:  tipc_named_rcv()                tipc_rcv()
T1:  [grab nametbl write lock]*      [grab node lock]*
T2:  tipc_update_nametbl()           tipc_node_link_up()
T3:  tipc_nodesub_subscribe()        tipc_nametbl_publish()
T4:  [grab node lock]*               [grab nametbl write lock]*

The two paths above take the nametbl write lock and the node lock in
opposite order, which may result in a deadlock. If we move the updating
of the name table after a link state change out of the node lock, the
reverse lock ordering is eliminated and, with it, the deadlock risk.
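
The idea of moving the name table update out of the node lock can be
modelled in a few lines of single-threaded C. This is only a sketch
under stated assumptions: struct toy_lock merely records whether a lock
is held, and every name below is hypothetical rather than taken from the
TIPC sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records whether it is held (single-threaded model). */
struct toy_lock {
    bool held;
};

static struct toy_lock node_lock, nametbl_lock;
static bool order_violation;    /* set if nametbl_lock is taken under node_lock */
static int published;

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = false; }

/* Stand-in for a name table update that needs the nametbl write lock. */
static void nametbl_publish(void)
{
    if (node_lock.held)
        order_violation = true;
    toy_acquire(&nametbl_lock);
    published++;
    toy_release(&nametbl_lock);
}

/* Record the follow-up work under node_lock, perform it after unlocking. */
static void link_up_deferred(void)
{
    bool need_publish;

    toy_acquire(&node_lock);
    need_publish = true;        /* link state changed; remember the follow-up */
    toy_release(&node_lock);

    if (need_publish)
        nametbl_publish();      /* runs with node_lock already released */
}
```

Because nametbl_publish() is only reached after node_lock is released,
this path never holds both locks at once and so cannot form the second
half of the inverted ordering.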

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'enic'
David S. Miller [Tue, 21 Oct 2014 19:24:30 +0000 (15:24 -0400)]
Merge branch 'enic'

Govindarajulu Varadarajan says:

====================
enic: Bug fixes

This series fixes the following problem.

Please apply this to net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoenic: Do not call napi_disable when preemption is disabled.
Govindarajulu Varadarajan [Sun, 19 Oct 2014 08:50:28 +0000 (14:20 +0530)]
enic: Do not call napi_disable when preemption is disabled.

In enic_stop, we disable preemption using local_bh_disable(). We disable
preemption to wait for busy_poll to finish.

napi_disable should not be called here as it might sleep.

Move the napi_disable() call outside of the local_bh_disable() section.

BUG: sleeping function called from invalid context at include/linux/netdevice.h:477
in_atomic(): 1, irqs_disabled(): 0, pid: 443, name: ifconfig
INFO: lockdep is turned off.
Preemption disabled at:[<ffffffffa029c5c4>] enic_rfs_flw_tbl_free+0x34/0xd0 [enic]

CPU: 31 PID: 443 Comm: ifconfig Not tainted 3.17.0-netnext-05504-g59f35b8 #268
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 ffff8800dac10000 ffff88020b8dfcb8 ffffffff8148a57c 0000000000000000
 ffff88020b8dfcd0 ffffffff8107e253 ffff8800dac12a40 ffff88020b8dfd10
 ffffffffa029305b ffff88020b8dfd48 ffff8800dac10000 ffff88020b8dfd48
Call Trace:
 [<ffffffff8148a57c>] dump_stack+0x4e/0x7a
 [<ffffffff8107e253>] __might_sleep+0x123/0x1a0
 [<ffffffffa029305b>] enic_stop+0xdb/0x4d0 [enic]
 [<ffffffff8138ed7d>] __dev_close_many+0x9d/0xf0
 [<ffffffff8138ef81>] __dev_close+0x31/0x50
 [<ffffffff813974a8>] __dev_change_flags+0x98/0x160
 [<ffffffff81397594>] dev_change_flags+0x24/0x60
 [<ffffffff814085fd>] devinet_ioctl+0x63d/0x710
 [<ffffffff81139c16>] ? might_fault+0x56/0xc0
 [<ffffffff81409ef5>] inet_ioctl+0x65/0x90
 [<ffffffff813768e0>] sock_do_ioctl+0x20/0x50
 [<ffffffff81376ebb>] sock_ioctl+0x20b/0x2e0
 [<ffffffff81197250>] do_vfs_ioctl+0x2e0/0x500
 [<ffffffff81492619>] ? sysret_check+0x22/0x5d
 [<ffffffff81285f23>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff8109fe19>] ? trace_hardirqs_on_caller+0x119/0x270
 [<ffffffff811974ac>] SyS_ioctl+0x3c/0x80
 [<ffffffff814925ed>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoenic: fix possible deadlock in enic_stop/ enic_rfs_flw_tbl_free
Govindarajulu Varadarajan [Sun, 19 Oct 2014 08:50:27 +0000 (14:20 +0530)]
enic: fix possible deadlock in enic_stop/ enic_rfs_flw_tbl_free

The following warning is shown when spinlock debug is enabled.

This occurs when the enic_flow_may_expire timer function is running and
enic_stop is called on the same CPU.

Fix this by using spin_lock_bh().

=================================
[ INFO: inconsistent lock state ]
3.17.0-netnext-05504-g59f35b8 #268 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
ifconfig/443 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&enic->rfs_h.lock)->rlock){+.?...}, at:
enic_rfs_flw_tbl_free+0x34/0xd0 [enic]
{IN-SOFTIRQ-W} state was registered at:
  [<ffffffff810a25af>] __lock_acquire+0x83f/0x21c0
  [<ffffffff810a45f2>] lock_acquire+0xa2/0xd0
  [<ffffffff814913fc>] _raw_spin_lock+0x3c/0x80
  [<ffffffffa029c3d5>] enic_flow_may_expire+0x25/0x130[enic]
  [<ffffffff810bcd07>] call_timer_fn+0x77/0x100
  [<ffffffff810bd8e3>] run_timer_softirq+0x1e3/0x270
  [<ffffffff8105f9ae>] __do_softirq+0x14e/0x280
  [<ffffffff8105fdae>] irq_exit+0x8e/0xb0
  [<ffffffff8103da0f>] smp_apic_timer_interrupt+0x3f/0x50
  [<ffffffff81493742>] apic_timer_interrupt+0x72/0x80
  [<ffffffff81018143>] default_idle+0x13/0x20
  [<ffffffff81018a6a>] arch_cpu_idle+0xa/0x10
  [<ffffffff81097676>] cpu_startup_entry+0x2c6/0x330
  [<ffffffff8103b7ad>] start_secondary+0x21d/0x290
irq event stamp: 2997
hardirqs last  enabled at (2997): [<ffffffff81491865>] _raw_spin_unlock_irqrestore+0x65/0x90
hardirqs last disabled at (2996): [<ffffffff814915e6>] _raw_spin_lock_irqsave+0x26/0x90
softirqs last  enabled at (2968): [<ffffffff813b57a3>] dev_deactivate_many+0x213/0x260
softirqs last disabled at (2966): [<ffffffff813b5783>] dev_deactivate_many+0x1f3/0x260

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&enic->rfs_h.lock)->rlock);
  <Interrupt>
    lock(&(&enic->rfs_h.lock)->rlock);

 *** DEADLOCK ***

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>