Jiri Benc [Wed, 19 Oct 2016 09:26:36 +0000 (11:26 +0200)]
openvswitch: remove unused functions
ovs_vport_deferred_free is not used anywhere. It's the only caller of
free_vport_rcu thus this one can be removed, too.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Tue, 18 Oct 2016 17:51:19 +0000 (19:51 +0200)]
bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers
A BPF program is required to check the return register of a
map_elem_lookup() call before accessing memory. The verifier keeps
track of this by converting the type of the result register from
PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
jump ensures safety. This check is currently exclusively performed
for the result register 0.
In the event the compiler reorders instructions, BPF_MOV64_REG
instructions may be moved before the conditional jump which causes
them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
verifier objects when the register is accessed:
0: (b7) r1 = 10
1: (7b) *(u64 *)(r10 -8) = r1
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x59c00000
6: (85) call 1
7: (bf) r4 = r0
8: (15) if r0 == 0x0 goto pc+1
R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
9: (7a) *(u64 *)(r4 +0) = 0
R4 invalid mem access 'map_value_or_null'
This commit extends the verifier to keep track of all identical
PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
assigning them an ID and then marking them all when the conditional
jump is observed.
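The ID-based tracking described above can be sketched in plain C. This is a simplified userland model and not the verifier's code: the names reg_state, mark_lookup_result, mov_reg and mark_non_null are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical, simplified register state; not the kernel's structure. */
enum reg_type {
	NOT_INIT,
	PTR_TO_MAP_VALUE_OR_NULL,
	PTR_TO_MAP_VALUE
};

struct reg_state {
	enum reg_type type;
	uint32_t id;	/* non-zero: shared by all copies of one lookup result */
};

#define NUM_REGS 11

/* Models "r0 = lookup(...)": tag the result register with a fresh ID. */
static void mark_lookup_result(struct reg_state *regs, uint32_t *next_id)
{
	regs[0].type = PTR_TO_MAP_VALUE_OR_NULL;
	regs[0].id = ++(*next_id);
}

/* Models "rD = rS": a plain copy carries the type and the ID along. */
static void mov_reg(struct reg_state *regs, int dst, int src)
{
	regs[dst] = regs[src];
}

/*
 * Models the non-NULL branch of "if rN == 0": every register that shares
 * rN's ID is upgraded, not only the register that was tested.
 */
static void mark_non_null(struct reg_state *regs, int checked)
{
	uint32_t id = regs[checked].id;
	int i;

	if (regs[checked].type != PTR_TO_MAP_VALUE_OR_NULL)
		return;
	for (i = 0; i < NUM_REGS; i++) {
		if (regs[i].type == PTR_TO_MAP_VALUE_OR_NULL && regs[i].id == id) {
			regs[i].type = PTR_TO_MAP_VALUE;
			regs[i].id = 0;
		}
	}
}
```

In this model, replaying the trace above (lookup into r0, copy to r4, then the check on r0) leaves both r0 and r4 as PTR_TO_MAP_VALUE, so the store through r4 would be accepted.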
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Wed, 19 Oct 2016 09:24:57 +0000 (11:24 +0200)]
net: fs_enet: Use net_device_stats from struct net_device
Instead of using a private copy of struct net_device_stats in struct
fs_enet_private, use stats from struct net_device. Also remove the now
unnecessary .ndo_get_stats function.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 18 Oct 2016 15:54:50 +0000 (15:54 +0000)]
qed: Remove useless set memory to zero use memset()
The memory returned by kzalloc() has already been set to zero, so
remove the useless memset(0).
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 18 Oct 2016 15:53:37 +0000 (15:53 +0000)]
net: dsa: mv88e6xxx: fix non static symbol warning
Fixes the following sparse warning:
drivers/net/dsa/mv88e6xxx/chip.c:2866:5: warning:
symbol 'mv88e6xxx_g1_set_switch_mac' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Tue, 18 Oct 2016 03:41:48 +0000 (11:41 +0800)]
r8152: add new products of Lenovo
Add the following four products of Lenovo and sort the order of the list.
VID PID
0x17ef 0x3062
0x17ef 0x3069
0x17ef 0x720c
0x17ef 0x7214
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Tue, 18 Oct 2016 00:57:50 +0000 (08:57 +0800)]
net: vlan: Use sizeof instead of literal number
Use sizeof on the variable instead of a literal number to improve readability.
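A minimal illustration of the pattern, assuming a made-up structure (demo_hdr and demo_reflect are hypothetical and not taken from the vlan code):

```c
#include <string.h>

/* Hypothetical structure, for illustration only. */
struct demo_hdr {
	unsigned char dest[6];
	unsigned char source[6];
};

/* Copy the source address into the destination field. */
static void demo_reflect(struct demo_hdr *hdr)
{
	/* Before: memcpy(hdr->dest, hdr->source, 6); */
	memcpy(hdr->dest, hdr->source, sizeof(hdr->dest));
}
```

With sizeof, the copy length follows the field if its size ever changes, and the reader does not have to guess what the literal stands for.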
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 18:14:22 +0000 (14:14 -0400)]
Merge branch 'smc91x-dt'
Robert Jarzmik says:
====================
support smc91x on mainstone and devicetree
This series brings support for the mainstone board to device-tree
based builds, matching what is already in place for legacy
mainstone.
The bulk of the mainstone "specific" behavior is that a u16 write
doesn't work on an address of the form 4*n + 2, while it works on 4*n.
The legacy workaround was in SMC_outw(), with calls to
machine_is_mainstone(). These calls don't work with a pxa27x-dt
machine type, which is used when a generic device-tree pxa27x machine
is used to boot the mainstone board.
Therefore, this series enables the smc91c111 adapter of the mainstone
board to work on a device-tree build, exactly as it's been working for
years with the legacy arch/arm/mach-pxa/mainstone.c definition.
To sum up, this extends an existing mechanism to device-tree based
pxa platforms.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Mon, 17 Oct 2016 19:45:32 +0000 (21:45 +0200)]
net: smsc91x: add u16 workaround for pxa platforms
Add a workaround for mainstone, idp and stargate2 boards, for u16 writes
which must be aligned on 32-bit addresses.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Mon, 17 Oct 2016 19:45:31 +0000 (21:45 +0200)]
net: smc91x: take into account half-word workaround
For device-tree builds, platforms such as mainstone, idp and stargate2
must have their u16 writes all aligned on 32 bit boundaries. This is
already enabled in platform data builds, and this patch adds it to
device-tree builds.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Mon, 17 Oct 2016 19:45:30 +0000 (21:45 +0200)]
net: smc91x: isolate u16 writes alignment workaround
u16 writes have special handling on 3 PXA platforms, where the
hardware wiring forces these writes to be u32 aligned.
This patch isolates this handling for PXA platforms as before, but
enables this "workaround" to be set up dynamically, which will be the
case in device-tree build types.
This patch was tested on 2 PXA platforms: mainstone, which relies on
the workaround, and lubbock, which doesn't.
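The idea of a run-time selectable workaround can be sketched as follows. This is a userland model under stated assumptions, not the driver's code: demo_bus and demo_outw are hypothetical names, the register window is modeled as an array of 32-bit words, and the half-word at byte offset 4*n + 2 is assumed to sit in the upper 16 bits of word n.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a register window plus a record of the last write. */
struct demo_bus {
	uint32_t regs[4];
	unsigned int last_addr;		/* byte offset of the last bus write */
	unsigned int last_width;	/* width of the last bus write, in bits */
};

/*
 * Write a 16-bit value at "offset" (assumed to be a multiple of 2).
 * When the workaround is enabled and the offset has the form 4*n + 2,
 * the write is turned into a 32-bit read-modify-write at 4*n.
 */
static void demo_outw(struct demo_bus *bus, unsigned int offset,
		      uint16_t val, bool align4_workaround)
{
	uint32_t *word = &bus->regs[offset / 4];

	if (align4_workaround && (offset & 2)) {
		/* Keep the low half-word, replace the high one. */
		*word = (*word & 0x0000ffffu) | ((uint32_t)val << 16);
		bus->last_addr = offset & ~3u;
		bus->last_width = 32;
	} else {
		unsigned int shift = (offset & 2) ? 16 : 0;

		/* Plain half-word store, modeled on the same array. */
		*word = (*word & ~((uint32_t)0xffff << shift)) |
			((uint32_t)val << shift);
		bus->last_addr = offset;
		bus->last_width = 16;
	}
}
```

Passing the flag as a parameter, rather than deciding at compile time, is what lets the same code serve boards that need the workaround and boards that do not.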
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Mon, 17 Oct 2016 19:45:29 +0000 (21:45 +0200)]
ARM: pxa: enhance smc91x platform data
Instead of having the smc91x driver rely on machine_is_*() calls,
provide this data through platform data, i.e. for idp, mainstone and
stargate.
This way, the driver no longer needs machine_is_*() calls, which
would not work with a device-tree build.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Tue, 18 Oct 2016 16:47:45 +0000 (17:47 +0100)]
ethernet/sfc: use core min/max MTU checking
Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 15:56:31 +0000 (11:56 -0400)]
Merge branch 'phy-led-triggers'
Zach Brown says:
====================
Add support for led triggers on phy link state change
Fix skge driver that declared enum constants that conflicted with enum
constants in linux/leds.h
Create function that encapsulates actions taken during the adjust phy link step
of phy state changes.
Create function that provides list of speeds currently supported by the phy.
Add support for led triggers on phy link state changes by adding
a config option. When set the config option will create a set of led triggers
for each phy device. Users can use the led triggers to represent link state
changes on the phy.
v2:
* New patch that creates phy_adjust_link function to encapsulate actions taken
when adjusting phy link during phy state changes
* led trigger speed strings changed to match existing phy speed strings
* New function that maps speeds to led triggers
* Replace magic constants with definitions when declaring trigger name
buffer and number of triggers.
v3:
* Changed LED_ON to LED_REG_ON in skge driver to avoid possible future
conflict and improve consistency.
* Dropped rtl8712 patch that was accepted separately.
v4:
* tweaked commit message
v5:
* Changed commit message to explain relationship between the new triggers and
leds driven by phys.
* Added new patch that creates phy_supported_speeds function.
* Moved phy_leds_triggers_register and phy_leds_triggers_unregister to
phy_attach and phy_detach respectively. This change is so the
phydev->supported field will be filled by the time the triggers are
registered.
* Changed hardcoded list of triggers to dynamic list determined by speeds
returned by phy_supported_speeds.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Mon, 17 Oct 2016 15:49:55 +0000 (10:49 -0500)]
net: phy: leds: add support for led triggers on phy link state change
Create an option CONFIG_LED_TRIGGER_PHY (default n), which will create a
set of led triggers for each instantiated PHY device. There is one LED
trigger per link-speed, per-phy.
The triggers are registered during phy_attach and unregistered during
phy_detach.
This allows for a user to configure their system to allow a set of LEDs
not controlled by the phy to represent link state changes on the phy.
LEDS controlled by the phy are unaffected.
For example, we have a board where some of the leds in the
RJ45 socket are controlled by the phy, but others are not. Using the
triggers provided by this patch the leds not controlled by the phy can
be configured to show the current speed of the ethernet connection. The
leds controlled by the phy are unaffected.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Mon, 17 Oct 2016 15:49:54 +0000 (10:49 -0500)]
net: phy: Create phy_supported_speeds function which lists speeds currently supported by a phydevice
phy_supported_speeds provides a means to get a list of all the speeds a
phy device currently supports.
Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Mon, 17 Oct 2016 15:49:53 +0000 (10:49 -0500)]
net: phy: Encapsulate actions performed during link state changes into function phy_adjust_link
During phy state machine state transitions some set of actions should
occur whenever the link state changes. These actions should be
encapsulated into a single function.
This patch adds the phy_adjust_link function, which is called whenever
phydev->adjust_link would have been called before. Actions that should
occur whenever the phy link is adjusted can now be added to the
phy_adjust_link function.
Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Mon, 17 Oct 2016 15:49:52 +0000 (10:49 -0500)]
skge: Rename LED_OFF and LED_ON in marvel skge driver to avoid conflicts with leds namespace
Adding led support for phy causes namespace conflicts for some
phy drivers.
The Marvell skge driver declared an enum for representing the states of
Link LED Register. The enum contained constant LED_OFF which conflicted
with the declaration found in linux/leds.h.
LED_OFF changed to LED_REG_OFF
Also changed LED_ON to LED_REG_ON to avoid possible future conflict and
for consistency.
Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 15:45:01 +0000 (11:45 -0400)]
Merge branch 'netdev-adjacency'
David Ahern says:
====================
net: Fix netdev adjacency tracking
The netdev adjacency tracking is failing to create proper dependencies
for some topologies. For example this topology
        +--------+
        |  myvrf |
        +--------+
          |    |
          |  +---------+
          |  | macvlan |
          |  +---------+
          |    |
      +----------+
      |  bridge  |
      +----------+
          |
      +--------+
      |  bond1 |
      +--------+
          |
      +--------+
      |  eth3  |
      +--------+
hits 1 of 2 problems depending on the order of enslavement. The base set of
commands for both cases:
ip link add bond1 type bond
ip link set bond1 up
ip link set eth3 down
ip link set eth3 master bond1
ip link set eth3 up
ip link add bridge type bridge
ip link set bridge up
ip link add macvlan link bridge type macvlan
ip link set macvlan up
ip link add myvrf type vrf table 1234
ip link set myvrf up
ip link set bridge master myvrf
Case 1: enslave macvlan to the vrf before enslaving the bond to the bridge:
ip link set macvlan master myvrf
ip link set bond1 master bridge
Attempts to delete the VRF:
ip link delete myvrf
trigger the BUG in __netdev_adjacent_dev_remove:
[ 587.405260] tried to remove device eth3 from myvrf
[ 587.407269] ------------[ cut here ]------------
[ 587.408918] kernel BUG at /home/dsa/kernel.git/net/core/dev.c:5661!
[ 587.411113] invalid opcode: 0000 [#1] SMP
[ 587.412454] Modules linked in: macvlan bridge stp llc bonding vrf
[ 587.414765] CPU: 0 PID: 726 Comm: ip Not tainted 4.8.0+ #109
[ 587.416766] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 587.420241] task: ffff88013ab6eec0 task.stack: ffffc90000628000
[ 587.422163] RIP: 0010:[<ffffffff813cef03>] [<ffffffff813cef03>] __netdev_adjacent_dev_remove+0x40/0x12c
...
[ 587.446053] Call Trace:
[ 587.446424] [<ffffffff813d1542>] __netdev_adjacent_dev_unlink+0x20/0x3c
[ 587.447390] [<ffffffff813d16a3>] netdev_upper_dev_unlink+0xfa/0x15e
[ 587.448297] [<ffffffffa00003a3>] vrf_del_slave+0x13/0x2a [vrf]
[ 587.449153] [<ffffffffa00004a4>] vrf_dev_uninit+0xea/0x114 [vrf]
[ 587.450036] [<ffffffff813d19b0>] rollback_registered_many+0x22b/0x2da
[ 587.450974] [<ffffffff813d1aac>] unregister_netdevice_many+0x17/0x48
[ 587.451903] [<ffffffff813de444>] rtnl_delete_link+0x3c/0x43
[ 587.452719] [<ffffffff813dedcd>] rtnl_dellink+0x180/0x194
When the BUG is converted to a WARN_ON it shows 4 missing adjacencies:
eth3 - myvrf, myvrf - eth3, bond1 - myvrf and myvrf - bond1
All of those are because the __netdev_upper_dev_link function does not
properly link macvlan lower devices to myvrf when it is enslaved.
The second case just flips the ordering of the enslavements:
ip link set bond1 master bridge
ip link set macvlan master myvrf
Then run:
ip link delete bond1
ip link delete myvrf
The vrf delete command hangs because myvrf has a reference that has not
been released. In this case the removal code does not account for 2 paths
between eth3 and myvrf - one from bridge to vrf and the other through the
macvlan.
Rather than try to maintain a linked list of all upper and lower devices
per netdevice, only track the direct neighbors. The remaining stack can
be determined by recursively walking the neighbors.
The existing netdev_for_each_all_upper_dev_rcu,
netdev_for_each_all_lower_dev and netdev_for_each_all_lower_dev_rcu macros
are replaced with APIs that walk the upper and lower device lists. The
new APIs take a callback function and a data arg that is passed to the
callback for each device in the list. Drivers using the old macros are
converted in separate patches to make it easier on reviewers. It is an
API conversion only; no functional change is intended.
v3
- address Stephen's comment to simplify logic and remove typecasts
v2
- fixed bond0 references in cover-letter
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:53 +0000 (19:15 -0700)]
net: dev: Improve debug statements for adjacency tracking
Adjacency code only has debugs for the insert case. Add debugs for
the remove path and make both consistently worded to make it easier
to follow the insert and removal with reference counts.
In addition, change the BUG to a WARN_ON. A missing adjacency at
removal time is not cause for a panic.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:52 +0000 (19:15 -0700)]
net: Add warning if any lower device is still in adjacency list
Lower list should be empty just like upper.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:51 +0000 (19:15 -0700)]
net: Remove all_adj_list and its references
Only direct adjacencies are maintained. All upper or lower devices can
be learned via the new walk API which recursively walks the adj_list for
upper devices or lower devices.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:50 +0000 (19:15 -0700)]
rocker: Flip to the new dev walk API
Convert rocker to the new dev walk API. This is just a code conversion;
no functional change is intended.
v2
- removed typecast of data
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:49 +0000 (19:15 -0700)]
mlxsw: Flip to the new dev walk API
Convert mlxsw users to new dev walk API. This is just a code conversion;
no functional change is intended.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:48 +0000 (19:15 -0700)]
ixgbe: Flip to the new dev walk API
Convert ixgbe users to new dev walk API. This is just a code conversion;
no functional change is intended.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:47 +0000 (19:15 -0700)]
IB/ipoib: Flip to new dev walk API
Convert ipoib_get_net_dev_match_addr to the new upper device walk API.
This is just a code conversion; no functional change is intended.
v2
- removed typecast of data
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:46 +0000 (19:15 -0700)]
IB/core: Flip to the new dev walk API
Convert rdma_is_upper_dev_rcu, handle_netdev_upper and
ipoib_get_net_dev_match_addr to the new upper device walk API.
This is just a code conversion; no functional change is intended.
v2
- removed typecast of data
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:45 +0000 (19:15 -0700)]
net: bonding: Flip to the new dev walk API
Convert alb_send_learning_packets and bond_has_this_ip to use the new
netdev_walk_all_upper_dev_rcu API. In both cases this is just a code
conversion; no functional change is intended.
v2
- removed typecast of data and simplified bond_upper_dev_walk
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:44 +0000 (19:15 -0700)]
net: Introduce new api for walking upper and lower devices
This patch introduces netdev_walk_all_upper_dev_rcu,
netdev_walk_all_lower_dev and netdev_walk_all_lower_dev_rcu. These
functions recursively walk the adj_list of devices to determine all upper
and lower devices.
The functions take a callback function that is invoked for each device
in the list. If the callback returns non-0, the walk is terminated and
the functions return that code back to callers.
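The callback contract described above can be sketched as a small userland model. The names demo_dev, demo_walk_all_upper and demo_match are hypothetical, the fixed-size neighbor array is an assumption made to keep the example short, and no RCU or locking is modeled.

```c
#include <stddef.h>

#define DEMO_MAX_ADJ 4

/* Hypothetical stand-in for a device that knows only its direct uppers. */
struct demo_dev {
	const char *name;
	struct demo_dev *upper[DEMO_MAX_ADJ];	/* NULL-terminated unless full */
};

typedef int (*demo_walk_fn)(struct demo_dev *dev, void *data);

/*
 * Depth-first walk over all upper devices of "dev". The callback runs on
 * each device found; the first non-zero return stops the walk and is
 * handed back to the caller.
 */
static int demo_walk_all_upper(struct demo_dev *dev, demo_walk_fn fn,
			       void *data)
{
	int i, ret;

	for (i = 0; i < DEMO_MAX_ADJ && dev->upper[i]; i++) {
		ret = fn(dev->upper[i], data);
		if (ret)
			return ret;
		ret = demo_walk_all_upper(dev->upper[i], fn, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: returns 1 when the visited device is the one sought. */
static int demo_match(struct demo_dev *dev, void *data)
{
	return dev == data;
}
```

In this sketch a device reachable through two paths is visited once per path, so a callback that accumulates state has to tolerate repeats.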
v3
- simplified netdev_has_upper_dev_all_rcu and __netdev_has_upper_dev and
removed typecast as suggested by Stephen
v2
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
version.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 18 Oct 2016 02:15:43 +0000 (19:15 -0700)]
net: Remove refnr arg when inserting link adjacencies
Commit 93409033ae65 ("net: Add netdev all_adj_list refcnt propagation to
fix panic") propagated the refnr to insert and remove functions tracking
the netdev adjacency graph. However, for the insert path the refnr can
only be 1. Accordingly, remove the refnr argument to make that clear.
i.e., the refnr arg in 93409033ae65 was only needed for the remove path.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 15:35:55 +0000 (11:35 -0400)]
Merge branch 'bpf-selftests'
Daniel Borkmann says:
====================
Move to BPF selftests
This set improves the test_verifier and test_maps suite and moves
it over to a new BPF selftest directory, so we can keep improving
it under kernel selftest umbrella. This also integrates a test
script for checking test_bpf.ko under various JIT options.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 17 Oct 2016 12:28:36 +0000 (14:28 +0200)]
bpf: add initial suite for selftests
Add a start of a test suite for kernel selftests. This moves test_verifier
and test_maps over to tools/testing/selftests/bpf/ along with various
code improvements and also adds a script for invoking test_bpf module.
The test suite can simply be run via selftest framework, e.g.:
# cd tools/testing/selftests/bpf/
# make
# make run_tests
Both test_verifier and test_maps were kind of misplaced in samples/bpf/
directory and we were looking into adding them to selftests for a while
now, so it can be picked up by kbuild bot et al and hopefully also get
more exposure and thus new test case additions.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 17 Oct 2016 12:28:35 +0000 (14:28 +0200)]
bpf: add various tests around spill/fill of regs
Add several spill/fill tests. Among others, one that performs xadd
on the spilled register, and one ldx/stx test where different types are
spilled from two branches and read out from a common path. The verifier
handles all of them correctly.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 15:34:22 +0000 (11:34 -0400)]
Merge branch 'ethernet-use-core-min-max-mtu'
Jarod Wilson says:
====================
ethernet: use core min/max MTU checking
Now that the network stack core min/max MTU checking infrastructure is in
place, time to start making drivers use it. We'll start with the easiest
ones, the ethernet drivers, split roughly by vendor, with a catch-all
patch at the end.
For the most part, every patch does the same essential thing: removes the
MTU range checking from the drivers' ndo_change_mtu function, puts those
ranges into the core net_device min_mtu and max_mtu fields, and where
possible, removes ndo_change_mtu functions entirely.
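The shape of a centralized range check can be sketched as below. This is a reduced userland model and not the kernel's implementation: demo_netdev and demo_set_mtu are hypothetical names, and treating a max_mtu of 0 as "no upper limit" is an assumption of this sketch.

```c
#include <errno.h>

/* Hypothetical, reduced device structure; only the MTU fields are modeled. */
struct demo_netdev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;	/* 0 is treated here as "no upper limit set" */
};

/* Range check done once in common code, so drivers need not repeat it. */
static int demo_set_mtu(struct demo_netdev *dev, int new_mtu)
{
	if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = (unsigned int)new_mtu;
	return 0;
}
```

A driver whose only job in ndo_change_mtu was this comparison can then publish its limits in the two fields and drop the function.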
These patches have all been built through the 0-day build infrastructure
provided by Intel, on top of net-next as of October 17.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:17 +0000 (15:54 -0400)]
ethernet: use core min/max MTU checking
et131x: min_mtu 64, max_mtu 9216
altera_tse: min_mtu 64, max_mtu 1500
amd8111e: min_mtu 60, max_mtu 9000
bnad: min_mtu 46, max_mtu 9000
macb: min_mtu 68, max_mtu 1500 or 10240 depending on hardware capability
xgmac: min_mtu 46, max_mtu 9000
cxgb2: min_mtu 68, max_mtu 9582 (pm3393) or 9600 (vsc7326)
enic: min_mtu 68, max_mtu 9000
gianfar: min_mtu 50, max_mtu 9586
hns_enet: min_mtu 68, max_mtu 9578 (v1) or 9706 (v2)
ksz884x: min_mtu 60, max_mtu 1894
myri10ge: min_mtu 68, max_mtu 9000
natsemi: min_mtu 64, max_mtu 2024
nfp: min_mtu 68, max_mtu hardware-specific
forcedeth: min_mtu 64, max_mtu 1500 or 9100, depending on hardware
pch_gbe: min_mtu 46, max_mtu 10300
pasemi_mac: min_mtu 64, max_mtu 9000
qcaspi: min_mtu 46, max_mtu 1500
- remove qcaspi_netdev_change_mtu as it is now redundant
rocker: min_mtu 68, max_mtu 9000
sxgbe: min_mtu 68, max_mtu 9000
stmmac: min_mtu 46, max_mtu depends on hardware
tehuti: min_mtu 60, max_mtu 16384
- driver had no max mtu checking, but product docs say 16k jumbo packets
are supported by the hardware
netcp: min_mtu 68, max_mtu 9486
- remove netcp_ndo_change_mtu as it is now redundant
via-velocity: min_mtu 64, max_mtu 9000
octeon: min_mtu 46, max_mtu 65370
CC: netdev@vger.kernel.org
CC: Mark Einon <mark.einon@gmail.com>
CC: Vince Bridgers <vbridger@opensource.altera.com>
CC: Rasesh Mody <rasesh.mody@qlogic.com>
CC: Nicolas Ferre <nicolas.ferre@atmel.com>
CC: Santosh Raspatur <santosh@chelsio.com>
CC: Hariprasad S <hariprasad@chelsio.com>
CC: Christian Benvenuti <benve@cisco.com>
CC: Sujith Sankar <ssujith@cisco.com>
CC: Govindarajulu Varadarajan <_govind@gmx.com>
CC: Neel Patel <neepatel@cisco.com>
CC: Claudiu Manoil <claudiu.manoil@freescale.com>
CC: Yisen Zhuang <yisen.zhuang@huawei.com>
CC: Salil Mehta <salil.mehta@huawei.com>
CC: Hyong-Youb Kim <hykim@myri.com>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: Olof Johansson <olof@lixom.net>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Byungho An <bh74.an@samsung.com>
CC: Girish K S <ks.giri@samsung.com>
CC: Vipul Pandya <vipul.pandya@samsung.com>
CC: Giuseppe Cavallaro <peppe.cavallaro@st.com>
CC: Alexandre Torgue <alexandre.torgue@st.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Wingman Kwok <w-kwok2@ti.com>
CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:16 +0000 (15:54 -0400)]
ethernet/toshiba: use core min/max MTU checking
gelic_net: min_mtu 64, max_mtu 1518
- remove gelic_net_change_mtu now that it is redundant
spidernet: min_mtu 64, max_mtu 2294
- remove spider_net_change_mtu now that it is redundant
CC: netdev@vger.kernel.org
CC: Geoff Levand <geoff@infradead.org>
CC: Ishizaki Kou <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:15 +0000 (15:54 -0400)]
ethernet/tile: use core min/max MTU checking
tilegx: min_mtu 68, max_mtu 1500 or 9000, depending on modparam
- remove tile_net_change_mtu now that it is fully redundant
tilepro: min_mtu 68, max_mtu 1500
- hardware supports jumbo packets up to 10226, but it's not implemented or
tested yet, according to code comments
CC: netdev@vger.kernel.org
CC: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:14 +0000 (15:54 -0400)]
ethernet/ibm: use core min/max MTU checking
ehea: min_mtu 68, max_mtu 9022
- remove ehea_change_mtu, it's now redundant
emac: min_mtu 46, max_mtu 1500 or whatever gets read from OF
CC: netdev@vger.kernel.org
CC: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:13 +0000 (15:54 -0400)]
ethernet/cavium: use core min/max MTU checking
liquidio: min_mtu 68, max_mtu 16000
thunder: min_mtu 64, max_mtu 9200
CC: netdev@vger.kernel.org
CC: Sunil Goutham <sgoutham@cavium.com>
CC: Robert Richter <rric@kernel.org>
CC: Derek Chickles <derek.chickles@caviumnetworks.com>
CC: Satanand Burla <satananda.burla@caviumnetworks.com>
CC: Felix Manlunas <felix.manlunas@caviumnetworks.com>
CC: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:12 +0000 (15:54 -0400)]
ethernet/neterion: use core min/max MTU checking
s2io: min_mtu 46, max_mtu 9600
vxge: min_mtu 68, max_mtu 9600
CC: netdev@vger.kernel.org
CC: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:11 +0000 (15:54 -0400)]
ethernet/dlink: use core min/max MTU checking
dl2k: min_mtu 68, max_mtu 1536 or 8000, depending on hardware
- Removed change_mtu, does nothing productive anymore
sundance: min_mtu 68, max_mtu 8191
CC: netdev@vger.kernel.org
CC: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:10 +0000 (15:54 -0400)]
ethernet/sun: use core min/max MTU checking
cassini: min_mtu 60, max_mtu 9000
niu: min_mtu 68, max_mtu 9216
sungem: min_mtu 68, max_mtu 1500 (comments say jumbo mode is broken)
sunvnet: min_mtu 68, max_mtu 65535
- removed sunvnet_change_mtu_common as it does nothing now
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:09 +0000 (15:54 -0400)]
ethernet/realtek: use core min/max MTU checking
8139cp: min_mtu 60, max_mtu 4096
8139too: min_mtu 68, max_mtu 1770
r8169: min_mtu 60, max_mtu depends on chipset, 1500 to 9k-ish
CC: netdev@vger.kernel.org
CC: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:08 +0000 (15:54 -0400)]
ethernet/qlogic: use core min/max MTU checking
qede: min_mtu 46, max_mtu 9600
- Put define for max in qede.h
qlcnic: min_mtu 68, max_mtu 9600
CC: netdev@vger.kernel.org
CC: Dept-GELinuxNICDev@qlogic.com
CC: Yuval Mintz <Yuval.Mintz@qlogic.com>
CC: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:07 +0000 (15:54 -0400)]
ethernet/mellanox: use core min/max MTU checking
mlx4: min_mtu 46, max_mtu depends on hardware
mlx5: min_mtu 68, max_mtu depends on hardware
CC: netdev@vger.kernel.org
CC: Tariq Toukan <tariqt@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:06 +0000 (15:54 -0400)]
ethernet/marvell: use core min/max MTU checking
mvneta: min_mtu 68, max_mtu 9676
- mtu validation routine mostly did range check, merge back into
mvneta_change_mtu for simplicity
mvpp2: min_mtu 68, max_mtu 9676
- mtu validation routine mostly did range check, merge back into
mvpp2_change_mtu for simplicity
pxa168_eth: min_mtu 68, max_mtu 9500
skge: min_mtu 60, max_mtu 9000
sky2: min_mtu 68, max_mtu 1500 or 9000, depending on hw
CC: netdev@vger.kernel.org
CC: Mirko Lindner <mlindner@marvell.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:05 +0000 (15:54 -0400)]
ethernet/intel: use core min/max MTU checking
e100: min_mtu 68, max_mtu 1500
- remove e100_change_mtu entirely, is identical to old eth_change_mtu,
and no longer serves a purpose. No need to set min_mtu or max_mtu
explicitly, as ether_setup() will already set them to 68 and 1500.
e1000: min_mtu 46, max_mtu 16110
e1000e: min_mtu 68, max_mtu varies based on adapter
fm10k: min_mtu 68, max_mtu 15342
- remove fm10k_change_mtu entirely, does nothing now
i40e: min_mtu 68, max_mtu 9706
i40evf: min_mtu 68, max_mtu 9706
igb: min_mtu 68, max_mtu 9216
- There are two different "max" frame sizes claimed and both checked in
the driver, the larger value wasn't relevant though, so I've set max_mtu
to the smaller of the two values here to retain identical behavior.
igbvf: min_mtu 68, max_mtu 9216
- Same issue as igb duplicated
ixgb: min_mtu 68, max_mtu 16114
- Also remove pointless old == new check, as that's done in dev_set_mtu
ixgbe: min_mtu 68, max_mtu 9710
ixgbevf: min_mtu 68, max_mtu dependent on hardware/firmware
- Some hw can only handle up to max_mtu 1504 on a vf, others 9710
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
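The range check that these conversions hand over to the core can be sketched in plain C as follows. This is a userspace illustration only, not the kernel's code: struct mtu_limits and check_mtu() are hypothetical names, and treating max_mtu == 0 as "no upper limit" is an assumption of the sketch. The 68/1500 pair used in the usage note is the default the message above attributes to ether_setup().

```c
#include <errno.h>

/* Hypothetical stand-in for the min_mtu/max_mtu pair a driver sets. */
struct mtu_limits {
	unsigned int min_mtu;
	unsigned int max_mtu;	/* 0 means "no upper limit" in this sketch (assumption) */
};

/* Return 0 if new_mtu lies within the limits, -EINVAL otherwise. */
static int check_mtu(const struct mtu_limits *lim, unsigned int new_mtu)
{
	if (new_mtu < lim->min_mtu)
		return -EINVAL;
	if (lim->max_mtu > 0 && new_mtu > lim->max_mtu)
		return -EINVAL;
	return 0;
}
```

With limits of { 68, 1500 }, check_mtu() accepts 1500 and rejects both 67 and 1501. A driver that only did this kind of range check in its own change_mtu callback no longer needs the callback at all.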
Jarod Wilson [Mon, 17 Oct 2016 19:54:04 +0000 (15:54 -0400)]
ethernet/broadcom: use core min/max MTU checking
tg3: min_mtu 60, max_mtu 9000/1500
bnxt: min_mtu 60, max_mtu 9000
bnx2x: min_mtu 46, max_mtu 9600
- Fix up ETH_OVREHEAD -> ETH_OVERHEAD while we're in here, remove
duplicated defines from bnx2x_link.c.
bnx2: min_mtu 46, max_mtu 9000
- Use more standard ETH_* defines while we're at it.
bcm63xx_enet: min_mtu 46, max_mtu 2028
- compute_hw_mtu was made largely pointless, and thus merged back into
bcm_enet_change_mtu.
b44: min_mtu 60, max_mtu 1500
CC: netdev@vger.kernel.org
CC: Michael Chan <michael.chan@broadcom.com>
CC: Sony Chacko <sony.chacko@qlogic.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Dept-HSGLinuxNICDev@qlogic.com
CC: Siva Reddy Kallam <siva.kallam@broadcom.com>
CC: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Mon, 17 Oct 2016 19:54:03 +0000 (15:54 -0400)]
ethernet/atheros: use core min/max MTU checking
atl2: min_mtu 40, max_mtu 1504
- Remove a few redundant defines that already have equivalents in
if_ether.h.
atl1: min_mtu 42, max_mtu 10218
atl1e: min_mtu 42, max_mtu 8170
atl1c: min_mtu 42, max_mtu 6122/1500
- GbE hardware gets a max_mtu of 6122, slower hardware gets 1500.
alx: min_mtu 34, max_mtu 9256
- Not so sure that minimum MTU number is really what was intended, but
that's what the math actually makes it out to be, due to max_frame
manipulations and comparison in alx_change_mtu, rather than just
comparing new_mtu. (I think 68 was the intended min_mtu value).
CC: netdev@vger.kernel.org
CC: Jay Cliburn <jcliburn@gmail.com>
CC: Chris Snook <chris.snook@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 14:42:17 +0000 (10:42 -0400)]
Merge branch 'dp83867-impedance-control'
Mugunthan V N says:
====================
add support for impedance control for TI dp83867 phy and fix 2nd ethernet on dra72 rev C evm
Add support for configurable impedance control for TI dp83867
phy via devicetree. More documentation in [1].
CPSW second ethernet is not working, fix it by enabling
impedance configuration on the phy.
Verified the patch on DRA72 Rev C evm, logs at [2]. Also pushed
a branch [3] for others to test.
Changes from v3:
* Fixup change log text and no code changes.
Changes from v2:
* Fixed a typo in dts and driver.
Changes from initial version:
* As per Sekhar's comment, instead of passing impedance values,
change to max and min impedance from DT
* Adopted phy_read_mmd_indirect() to current implementation.
* Corrected the phy delay timings to the optimal value.
[1] - http://www.ti.com/lit/ds/symlink/dp83867ir.pdf
[2] - http://pastebin.ubuntu.com/23343139/
[3] - git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git dp83867-v4
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Tue, 18 Oct 2016 11:20:20 +0000 (16:50 +0530)]
ARM: dts: dra72-evm-revc: fix correct phy delay
The current delay settings of the phy are not optimal;
fix them with the correct values.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Tue, 18 Oct 2016 11:20:19 +0000 (16:50 +0530)]
ARM: dts: dra72-evm-revc: add phy impedance settings
The default impedance settings of the phy are not optimal,
and because of this the second ethernet port is not working. Fix it
with the correct values, which make the second ethernet port work.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Tue, 18 Oct 2016 11:20:18 +0000 (16:50 +0530)]
net: phy: dp83867: add support for MAC impedance configuration
Add support for programmable MAC impedance configuration
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Tue, 18 Oct 2016 11:20:17 +0000 (16:50 +0530)]
net: phy: dp83867: Add documentation for optional impedance control
Add documentation of ti,min-output-impedance and ti,max-output-impedance
which can be used to correct MAC impedance mismatch using phy extended
registers.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Tue, 18 Oct 2016 06:44:17 +0000 (08:44 +0200)]
vlan: Remove unnecessary comparison of unsigned against 0
args.u.name_type is of type unsigned int and is always >= 0.
This fixes the following GCC warning:
net/8021q/vlan.c: In function ‘vlan_ioctl_handler’:
net/8021q/vlan.c:574:14: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
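Why the lower-bound test is redundant can be shown with a small standalone sketch. The helper name name_type_valid() and the bound NAME_TYPE_MAX are hypothetical placeholders chosen for illustration; the real limit is defined in the vlan headers.

```c
#include <stdbool.h>

/* Hypothetical upper bound, used only for this illustration. */
#define NAME_TYPE_MAX 4U

/*
 * For an unsigned value, "type >= 0" is always true, so the compiler
 * warns about it under -Wtype-limits. Only the upper bound carries
 * any information and needs to be checked.
 */
static bool name_type_valid(unsigned int type)
{
	return type < NAME_TYPE_MAX;
}
```

A negative value passed in by a caller wraps to a large unsigned number, so the upper-bound test alone still rejects it.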
Wei Yongjun [Mon, 17 Oct 2016 15:19:56 +0000 (15:19 +0000)]
fsl/fman: fix error return code in mac_probe()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 17 Oct 2016 17:02:22 +0000 (18:02 +0100)]
net: report right mtu value in error message
Check is for max_mtu but message reports min_mtu.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 17 Oct 2016 15:31:58 +0000 (15:31 +0000)]
net: ethernet: nb8800: fix error return code in nb8800_open()
Fix to return error code -ENODEV from the of_phy_connect() error
handling case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
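The class of bug fixed here, an error path that falls through to a shared exit label while the return variable still holds 0, can be sketched as follows. Everything in this block is hypothetical userspace code: fake_connect() stands in for a lookup that returns NULL on failure, and open_device() is not the driver's function.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical connect helper: returns NULL when no device is found. */
static void *fake_connect(int available)
{
	static int dev;

	return available ? &dev : NULL;
}

static int open_device(int available)
{
	int err = 0;
	void *phy;

	phy = fake_connect(available);
	if (!phy) {
		/* Without this assignment the error path would return 0. */
		err = -ENODEV;
		goto err_out;
	}
	return 0;

err_out:
	/* Resources acquired earlier would be released here. */
	return err;
}
```

The caller can then tell success from failure: open_device(1) yields 0, open_device(0) yields -ENODEV.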
Arnd Bergmann [Mon, 17 Oct 2016 14:30:57 +0000 (16:30 +0200)]
fjes: fix format string for trace output
phys_addr_t may be wider than a pointer and has to be printed
using the special %pap format string, as pointed out by
this new warning.
arch/x86/include/../../../drivers/net/fjes/fjes_trace.h: In function ‘trace_raw_output_fjes_hw_start_debug_req’:
arch/x86/include/../../../drivers/net/fjes/fjes_trace.h:212:563: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
Note that this has to pass the address by reference instead of
casting it to a different type.
Fixes: b6ba737d0b29 ("fjes: ethtool -w and -W support for fjes driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Oct 2016 15:18:09 +0000 (11:18 -0400)]
Merge branch 'mv88e6xxx-interrupt-support'
Andrew Lunn says:
====================
Interrupt support for mv88e6xxx
This patchset adds interrupt controller support to the MV88E6xxx. This
allows access to the interrupts the internal PHYs generate. These
interrupts can then be associated to a PHY device in the device tree
and used by the PHY lib, rather than polling.
Since interrupt handling needs to make MDIO bus accesses, threaded
interrupts are used. The phylib needs to request the PHY interrupt
using the threaded IRQ API. This in turn allows some simplification to
the code, in that the phylib interrupt handler can directly call
phy_change(), rather than use a work queue. The work queue is however
retained for the phy_mac_interrupt() call, which can be called in hard
interrupt context.
Since RFC v1:
Keep phy_mac_interrupt() callable in hard IRQ context.
The fix to trigger the phy state machine transitions on interrupts has
already been submitted, so is dropped from here.
Added back shared interrupts support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Oct 2016 17:56:53 +0000 (19:56 +0200)]
arm: vf610: zii devel b: Add support for switch interrupts
The Switches use GPIO lines to indicate interrupts from two of the
switches.
With these interrupts in place, we can make use of the interrupt
controllers within the switch to indicate when the internal PHYs
generate an interrupt. Use standard PHY properties to do this.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Oct 2016 17:56:52 +0000 (19:56 +0200)]
net: phy: Use phy name when requesting the interrupt
Using the fixed name "phy_interrupt" is not very informative in
/proc/interrupts when there are a lot of phys, e.g. a device with an
Ethernet switch. So when requesting the interrupt, use the name of the
phy.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Oct 2016 17:56:51 +0000 (19:56 +0200)]
net: phy: Threaded interrupts allow some simplification
The PHY interrupts are now handled in a threaded interrupt handler,
which can sleep. The work queue is no longer needed, phy_change() can
be called directly. phy_mac_interrupt() still needs to be safe to call
in interrupt context, so keep the work queue, and use a helper to call
phy_change().
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Oct 2016 17:56:50 +0000 (19:56 +0200)]
net: phy: Use threaded IRQ, to allow IRQ from sleeping devices
The interrupt lines from PHYs may be connected to I2C bus expanders, or
from switches on MDIO busses. Such interrupts are sourced from devices
which sleep, so use threaded interrupts. Threaded interrupts require
that the interrupt requester also uses the threaded API. Change the
phylib to use the threaded API, which is backwards compatible with
non-threaded IRQs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Oct 2016 17:56:49 +0000 (19:56 +0200)]
net: dsa: mv88e6xxx: Implement interrupt support.
The switch can have up to two interrupt controllers. One of these
contains the interrupts from the integrated PHYs, so is useful to
export. The Marvell PHY driver can then be used in interrupt mode,
rather than polling, speeding up PHY handling and reducing load on the
MDIO bus.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sat, 15 Oct 2016 18:53:22 +0000 (11:53 -0700)]
rds: Remove duplicate prefix from rds_conn_path_error use
rds_conn_path_error already prefixes "RDS:" to the output.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sat, 15 Oct 2016 18:53:21 +0000 (11:53 -0700)]
rds: Remove unused rds_conn_error
This macro's last use was removed in commit d769ef81d5b59
("RDS: Update rds_conn_shutdown to work with rds_conn_path")
so make the macro and the __rds_conn_error function definition
and declaration disappear.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Mon, 17 Oct 2016 07:22:04 +0000 (09:22 +0200)]
net: hip04: Remove superfluous ether_setup after alloc_etherdev
There is no need to call ether_setup after alloc_etherdev since it was
already called there.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 17 Oct 2016 04:25:35 +0000 (21:25 -0700)]
ila: Don't use dest cache when gateway is set
If the gateway is set on an ILA route we don't need to bother with using
the destination cache in the ILA route. Translation does not change the
routing in this case so we can stick with orig_output in the lwstate
output function.
Tested: Ran netperf with and without gateway for LWT route.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sat, 15 Oct 2016 15:40:30 +0000 (17:40 +0200)]
ipvlan: constify l3mdev_ops structure
This l3mdev_ops structure is only stored in the l3mdev_ops field of a
net_device structure. This field is declared const, so the l3mdev_ops
structure can be declared as const also. Additionally drop the
__read_mostly annotation.
The semantic patch that adds const is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct l3mdev_ops i@p = { ... };
@ok@
identifier r.i;
struct net_device *e;
position p;
@@
e->l3mdev_ops = &i@p;
@bad@
position p != {r.p,ok.p};
identifier r.i;
struct l3mdev_ops e;
@@
e@i@p
@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
struct l3mdev_ops i = { ... };
// </smpl>
The effect on the layout of the .o file is shown by the following output
of the size command, first before then after the transformation:
   text    data     bss     dec     hex filename
   7364     466      52    7882    1eca drivers/net/ipvlan/ipvlan_main.o
   7412     434      52    7898    1eda drivers/net/ipvlan/ipvlan_main.o
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
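The shape of code that this kind of constification applies to can be sketched in standalone C. All names below (l3_ops_sketch, dev_sketch, sketch_ops and the helpers) are hypothetical; the point is only that an ops table reached exclusively through a pointer-to-const field can itself be declared const, which typically lets the toolchain place it in read-only storage.

```c
/* Hypothetical ops table, modelled loosely on an l3mdev_ops-style struct. */
struct l3_ops_sketch {
	int (*get_index)(void);
};

/* The device only ever holds a pointer to const ops. */
struct dev_sketch {
	const struct l3_ops_sketch *ops;
};

static int sketch_index(void)
{
	return 7;
}

/* Declared const: nothing writes to the table after initialisation. */
static const struct l3_ops_sketch sketch_ops = {
	.get_index = sketch_index,
};

static void dev_attach(struct dev_sketch *dev)
{
	dev->ops = &sketch_ops;
}

static int dev_index(const struct dev_sketch *dev)
{
	return dev->ops->get_index();
}
```

After dev_attach(), dev_index() returns 7 through the const table. An accidental write such as sketch_ops.get_index = NULL would now be rejected at compile time.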
David S. Miller [Sat, 15 Oct 2016 21:33:42 +0000 (17:33 -0400)]
Merge branch 'ila-cached-route'
Tom Herbert says:
====================
ila: Cache a route in ILA lwt structure
Add a dst_cache to ila_lwt structure. This holds a cached route for the
translated address. In ila_output we now perform a route lookup after
translation and if possible (destination in original route is full 128
bits) we set the dst_cache. Subsequent calls to ila_output can then use
the cache to avoid the route lookup.
This eliminates the need to set the gateway on ILA routes as previously
was being done. Now we can do something like:
./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
csum-mode neutral-map dev eth0 ## No via needed!
Also, add destroy_state to lwt ops. We need this to destroy the
dst_cache.
- v2
- Fixed comparisons to fc_dst_len to make comparison against number
of bits in data structure not bytes.
- Move destroy_state under build_state (requested by Jiri)
- Other minor cleanup
Tested:
Running 200 TCP_RR streams:
Baseline, no ILA
1730716 tps
102/170/313 50/90/99% latencies
88.11 CPU utilization
Using ILA in both directions
1680428 tps
105/176/325 50/90/99% latencies
88.16 CPU utilization
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 14 Oct 2016 18:25:37 +0000 (11:25 -0700)]
ila: Cache a route to translated address
Add a dst_cache to ila_lwt structure. This holds a cached route for the
translated address. In ila_output we now perform a route lookup after
translation and if possible (destination in original route is full 128
bits) we set the dst_cache. Subsequent calls to ila_output can then use
the cache to avoid the route lookup.
This eliminates the need to set the gateway on ILA routes as previously
was being done. Now we can do something like:
./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
csum-mode neutral-map dev eth0 ## No via needed!
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 14 Oct 2016 18:25:36 +0000 (11:25 -0700)]
lwtunnel: Add destroy state operation
Users of lwt tunnels may set up some secondary state in build_state
function. Add a corresponding destroy_state function to allow users to
clean up state. This destroy state function is called from lwstate_free.
Also, we now free lwstate using kfree_rcu so users can assume the
structure is not freed before an RCU grace period has elapsed.
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
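The build_state/destroy_state pairing described above can be sketched in userspace C. This is an illustration under stated assumptions, not the lwtunnel code: every type and function name carries a _sketch suffix or is otherwise hypothetical, and plain free() stands in for the deferred kfree_rcu() free mentioned in the message.

```c
#include <stdlib.h>

struct lwt_state_sketch;

/* Hypothetical ops table; destroy_state is optional. */
struct lwt_ops_sketch {
	int (*build_state)(struct lwt_state_sketch *st);
	void (*destroy_state)(struct lwt_state_sketch *st);
};

struct lwt_state_sketch {
	const struct lwt_ops_sketch *ops;
	void *priv;	/* secondary state set up by build_state */
};

static int destroyed_count;	/* lets callers observe the teardown */

static int cache_build(struct lwt_state_sketch *st)
{
	st->priv = malloc(64);
	return st->priv ? 0 : -1;
}

static void cache_destroy(struct lwt_state_sketch *st)
{
	free(st->priv);
	st->priv = NULL;
	destroyed_count++;
}

static const struct lwt_ops_sketch cache_ops = {
	.build_state = cache_build,
	.destroy_state = cache_destroy,
};

static struct lwt_state_sketch *state_alloc(const struct lwt_ops_sketch *ops)
{
	struct lwt_state_sketch *st = calloc(1, sizeof(*st));

	if (!st)
		return NULL;
	st->ops = ops;
	if (ops->build_state && ops->build_state(st)) {
		free(st);
		return NULL;
	}
	return st;
}

static void state_free(struct lwt_state_sketch *st)
{
	/* Give the user a chance to clean up before the state goes away. */
	if (st->ops->destroy_state)
		st->ops->destroy_state(st);
	free(st);	/* the kernel defers this step; free() is a stand-in */
}
```

Calling state_alloc(&cache_ops) followed by state_free() runs cache_destroy() exactly once, so the secondary allocation made in cache_build() is not leaked.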
David S. Miller [Fri, 14 Oct 2016 16:04:58 +0000 (12:04 -0400)]
Merge branch 'fjes-next'
Taku Izumi says:
====================
FUJITSU Extended Socket driver version 1.2
This patchset updates FUJITSU Extended Socket network driver into version 1.2.
This includes the following enhancements:
- ethtool -d support
- ethtool -S enhancement
- ethtool -w/-W support
- Add some debugging feature (tracepoints etc)
v1 -> v2:
- Use u64 instead of phys_addr_t as TP_STRUCT__entry
- Use ethtool facility to achieve debug mode instead of using debugfs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 14 Oct 2016 11:28:15 +0000 (20:28 +0900)]
fjes: Update fjes driver version : 1.2
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 14 Oct 2016 11:28:07 +0000 (20:28 +0900)]
fjes: Add debugfs entry for EP status information in fjes driver
This patch adds debugfs entry to show EP status information.
You can get each EP's status information like the following:
# cat /sys/kernel/debug/fjes/fjes.0/status
EPID STATUS SAME_ZONE CONNECTED
ep0 shared Y Y
ep1 - - -
ep2 unshared N N
ep3 unshared N N
ep4 unshared N N
ep5 unshared N N
ep6 unshared N N
ep7 unshared N N
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 14 Oct 2016 11:27:45 +0000 (20:27 +0900)]
fjes: ethtool -w and -W support for fjes driver
This patch adds implementation of supporting
ethtool -w and -W for fjes driver.
You can enable and disable firmware debug mode by
using ethtool -W, and also retrieve firmware
activity information by using ethtool -w.
This is useful for debugging.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 14 Oct 2016 11:27:38 +0000 (20:27 +0900)]
fjes: Add tracepoints in fjes driver
This patch adds tracepoints in fjes driver.
This is useful for debugging purposes.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 14 Oct 2016 11:27:32 +0000 (20:27 +0900)]
fjes: Enhance ethtool -S for fjes driver
This patch enhances ethtool -S for fjes driver so that
EP related statistics can be retrieved.
The following statistics can be displayed via ethtool -S:
ep%d_com_regist_buf_exec
ep%d_com_unregist_buf_exec
ep%d_send_intr_rx
ep%d_send_intr_unshare
ep%d_send_intr_zoneupdate
ep%d_recv_intr_rx
ep%d_recv_intr_unshare
ep%d_recv_intr_stop
ep%d_recv_intr_zoneupdate
ep%d_tx_buffer_full
ep%d_tx_dropped_not_shared
ep%d_tx_dropped_ver_mismatch
ep%d_tx_dropped_buf_size_mismatch
ep%d_tx_dropped_vlanid_mismatch
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 14 Oct 2016 11:27:25 +0000 (20:27 +0900)]
fjes: ethtool -d support for fjes driver
This patch adds implementation of supporting
ethtool -d for fjes driver. By using ethtool -d,
you can get a register dump of the Extended Socket device.
# ethtool -d es0
Offset Values
------ ------
0x0000: 01 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0020: 02 00 00 80 02 00 00 80 64 a6 58 08 07 00 00 00
0x0030: 00 00 00 00 28 80 00 00 00 00 f9 e3 06 00 00 00
0x0040: 00 00 00 00 18 00 00 00 80 a4 58 08 07 00 00 00
0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0080: 00 00 00 00 00 00 e0 7f 00 00 01 00 00 00 01 00
0x0090: 00 00 00 00
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Oct 2016 15:59:59 +0000 (11:59 -0400)]
Merge branch 'qed-next'
Manish Chopra says:
====================
qed*: driver updates
There are several new additions in this series;
Most are connected to either Tx offloading or Rx classifications
[either fastpath changes or supporting configuration].
In addition, there's a single IOV enhancement.
Please consider applying this series to `net-next'.
V2->V3:
Fixes below kbuild warning
call to '__compiletime_assert_60' declared with
attribute error: Need native word sized stores/loads for atomicity.
V1->V2:
Added a fix for the race in ramrod handling
pointed by Eric Dumazet [patch 7].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Fri, 14 Oct 2016 09:19:23 +0000 (05:19 -0400)]
qed: Fix possible race when reading firmware return code.
While handling SPQ ramrod completion, there is a possible race
where driver might not read updated fw return code based on
ramrod completion done. This patch ensures that fw return code
is written first and then completion done flag is updated
using appropriate memory barriers.
Signed-off-by: Manish Chopra <manish.chopra@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
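The ordering requirement described above (publish the return code first, then the done flag, and read them in the opposite order) can be sketched with C11 atomics. This is a userspace stand-in, not the driver's code: the kernel has its own barrier primitives, and completion_sketch, complete_ramrod() and poll_ramrod() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical completion record shared between writer and reader. */
struct completion_sketch {
	int fw_return_code;
	atomic_bool done;
};

static void completion_init(struct completion_sketch *c)
{
	c->fw_return_code = 0;
	atomic_init(&c->done, false);
}

/* Writer: store the return code, then publish "done" with release order. */
static void complete_ramrod(struct completion_sketch *c, int code)
{
	c->fw_return_code = code;
	atomic_store_explicit(&c->done, true, memory_order_release);
}

/*
 * Reader: observe "done" with acquire order before touching the code.
 * Returns true and fills *code only once the completion is visible.
 */
static bool poll_ramrod(struct completion_sketch *c, int *code)
{
	if (!atomic_load_explicit(&c->done, memory_order_acquire))
		return false;
	*code = c->fw_return_code;
	return true;
}
```

The release store pairs with the acquire load, so a reader that sees done == true is guaranteed to also see the return code written before it.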
Yuval Mintz [Fri, 14 Oct 2016 09:19:22 +0000 (05:19 -0400)]
qed: Handle malicious VFs events
Malicious VFs might be caught in several different ways:
- Misusing their bar permission and being blocked by hardware.
- Misusing their fastpath logic and being blocked by firmware.
- Misusing their interaction with their PF via hw-channel,
and being blocked by PF driver.
On the first two items, firmware would indicate to driver that
the VF is to be considered malicious, but would sometimes still
allow the VF to communicate with the PF [depending on the exact
nature of the malicious activity done by the VF].
The current existing logic on the PF side lacks handling of such events,
and might allow the PF to perform some incorrect configuration on behalf
of a VF that was previously indicated as malicious.
The new scheme is simple -
Once the PF determines a VF is malicious it would:
a. Ignore any further requests on behalf of the VF-driver.
b. Prevent any configurations initiated by the hyperuser for
the malicious VF, as firmware isn't willing to serve such.
The malicious indication would be cleared upon the VF flr,
after which it would become usable once again.
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
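The scheme above amounts to a sticky per-VF flag that gates every request until an FLR clears it. A minimal sketch, assuming hypothetical names (vf_sketch and its helpers) and an arbitrary error code chosen only for illustration:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-VF bookkeeping kept by the PF. */
struct vf_sketch {
	bool malicious;
};

/* Called when firmware or the PF itself flags the VF. */
static void vf_mark_malicious(struct vf_sketch *vf)
{
	vf->malicious = true;
}

/* Function-level reset: the VF becomes usable again. */
static void vf_flr(struct vf_sketch *vf)
{
	vf->malicious = false;
}

/*
 * Gate for both VF-originated requests and configuration initiated on
 * the VF's behalf. -EACCES is an assumption of this sketch.
 */
static int vf_handle_request(const struct vf_sketch *vf)
{
	if (vf->malicious)
		return -EACCES;
	return 0;
}
```

A VF that has been marked is refused on every subsequent request, and only vf_flr() restores normal handling.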
Yuval Mintz [Fri, 14 Oct 2016 09:19:21 +0000 (05:19 -0400)]
qed: Allow chance for fast ramrod completions
Whenever a ramrod is being sent for some device configuration,
the driver is going to sleep at least 5ms between each iteration
of polling on the completion of the ramrod.
However, in almost every configuration scenario the firmware
would be able to comply and complete the ramrod in a matter of
several usecs. This is especially important in cases where there
might be a lot of sequential configurations applying to the hardware
[e.g., RoCE], in which case the existing scheme might cause some
visible user delays.
This patch changes the completion scheme - instead of immediately
starting to sleep for a 'long' period, allow the device to quickly
poll on the first iteration after a couple of usecs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
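The changed completion scheme can be modelled as a polling loop whose first wait is short and whose later waits are long. The sketch below simulates time instead of sleeping so that it stays self-contained. fake_fw, poll_delay_usecs() and wait_for_fw() are hypothetical names, the 2 usec fast delay is an assumed value for "a couple of usecs", and the 5 ms slow delay is the figure quoted in the message.

```c
#define FAST_POLL_USECS	2U	/* assumed value for "a couple of usecs" */
#define SLOW_POLL_USECS	5000U	/* 5 ms, as quoted in the commit message */

/* Simulated firmware: completes once enough time has been waited. */
struct fake_fw {
	unsigned int ready_after_usecs;
	unsigned int waited_usecs;
};

/* Short delay before the first poll, long delay before every later one. */
static unsigned int poll_delay_usecs(unsigned int iter)
{
	return iter == 0 ? FAST_POLL_USECS : SLOW_POLL_USECS;
}

/* Returns the number of polls needed, or -1 if max_iters was exhausted. */
static int wait_for_fw(struct fake_fw *fw, unsigned int max_iters)
{
	unsigned int i;

	for (i = 0; i < max_iters; i++) {
		/* A real driver would delay or sleep here; we only add up time. */
		fw->waited_usecs += poll_delay_usecs(i);
		if (fw->waited_usecs >= fw->ready_after_usecs)
			return (int)i + 1;
	}
	return -1;
}
```

Firmware that finishes within the first couple of microseconds is picked up on the first poll, so the caller never pays the 5 ms sleep in the common case.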
Yuval Mintz [Fri, 14 Oct 2016 09:19:20 +0000 (05:19 -0400)]
qed*: Allow unicast filtering
Apparently qede fails to set IFF_UNICAST_FLT, and as a result is not
actually performing unicast MAC filtering.
While we're at it - relax a hard-coded limitation that limits each
interface into using at most 15 unicast MAC addresses before turning
promiscuous. Instead utilize the HW resources to their limit.
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Fri, 14 Oct 2016 09:19:19 +0000 (05:19 -0400)]
qede: Prevent GSO on long Geneve headers
Due to hardware limitation, when transmitting a geneve-encapsulated
packet with more than 32 bytes worth of geneve options the hardware
would not be able to crack the packet and consider it a regular UDP
packet.
This implements the ndo_features_check() in qede in order to prevent
GSO on said transmitted packets.
Signed-off-by: Manish Chopra <manish.chopra@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
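The decision made by such a features check can be sketched as a pure function. This is not the qede implementation: the feature bits and the function name are hypothetical, and only the 32-byte limit comes from the message above. The sketch relies on the Geneve header carrying its option length in 4-byte units.

```c
#include <stdint.h>

#define FEAT_GSO_SKETCH		(1U << 0)	/* hypothetical feature bits */
#define FEAT_CSUM_SKETCH	(1U << 1)
#define MAX_GENEVE_OPT_BYTES	32U		/* limit quoted in the commit message */

/*
 * opt_len_words is the Geneve option length as carried in the header,
 * i.e. in multiples of 4 bytes. If the options exceed what the hardware
 * can parse, drop GSO from the feature mask and leave the rest alone.
 */
static uint32_t features_check_sketch(uint32_t features, uint8_t opt_len_words)
{
	unsigned int opt_bytes = (unsigned int)opt_len_words * 4U;

	if (opt_bytes > MAX_GENEVE_OPT_BYTES)
		features &= ~FEAT_GSO_SKETCH;
	return features;
}
```

With 8 option words (exactly 32 bytes) the mask is returned unchanged; with 9 words (36 bytes) the GSO bit is cleared, so the stack segments the packet in software before handing it to the driver.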
Manish Chopra [Fri, 14 Oct 2016 09:19:18 +0000 (05:19 -0400)]
qede: GSO support for tunnels with outer csum
This patch adds GSO support for GRE and UDP tunnels
where outer checksums are enabled.
Signed-off-by: Manish Chopra <manish.chopra@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Fri, 14 Oct 2016 09:19:17 +0000 (05:19 -0400)]
qed: Pass MAC hints to VFs
Some hypervisors can support MAC hints to their VFs.
Even though we don't have such a hypervisor API in linux, we add
sufficient logic for the VF to be able to receive such hints and
set the mac accordingly - as long as the VF has not been set with
a MAC already.
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Oct 2016 14:23:07 +0000 (10:23 -0400)]
Merge branch 'ingress-actions'
Shmulik Ladkani says:
====================
act_mirred: Ingress actions support
This patch series implements action mirred 'ingress' actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.
v4:
in 4/4, check ret code of netif_receive_skb, as suggested by Cong Wang
v3:
in 4/4, addressed non coherency due to reading m->tcfm_eaction multiple
times, as spotted by Eric Dumazet
v2:
in 1/4, declare tcfm_mac_header_xmit as bool instead of int
====================
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Thu, 13 Oct 2016 06:06:44 +0000 (09:06 +0300)]
net/sched: act_mirred: Implement ingress actions
Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Thu, 13 Oct 2016 06:06:43 +0000 (09:06 +0300)]
net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror'
These accessors are used in various drivers that support tc offloading,
to detect properties of a given 'tc_action'.
'is_tcf_mirred_redirect' tests that the action is TCA_EGRESS_REDIR.
'is_tcf_mirred_mirror' tests that the action is TCA_EGRESS_MIRROR.
As a prep towards supporting INGRESS redir/mirror, rename these
predicates to reflect their true meaning:
s/is_tcf_mirred_redirect/is_tcf_mirred_egress_redirect/
s/is_tcf_mirred_mirror/is_tcf_mirred_egress_mirror/
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Thu, 13 Oct 2016 06:06:42 +0000 (09:06 +0300)]
net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header
Move detection logic that tests whether device expects skb data to point
at mac_header upon xmit into a function.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Thu, 13 Oct 2016 06:06:41 +0000 (09:06 +0300)]
net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit and make it a bool
'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
egress to the target device (if action is performed at ingress).
Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
the target device (and use a bool instead of int).
This allows decoupling the attribute from the action to be taken.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allan W. Nielsen [Wed, 12 Oct 2016 13:47:51 +0000 (15:47 +0200)]
net: phy: Cleanup the Edge-Rate feature in Microsemi PHYs.
Edge-Rate cleanup includes the following:
- Updated device tree bindings documentation for edge-rate
- The edge-rate is now specified as a "slowdown", meaning that it is now
being specified as positive values instead of negative (both
documentation and implementation wise).
- Only explicitly documented values for "vsc8531,vddmac" and
"vsc8531,edge-slowdown" are accepted by the device driver.
- Deleted include/dt-bindings/net/mscc-phy-vsc8531.h as it was not needed.
- Read/validate devicetree settings in probe instead of init
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: Raju Lakkaraju <raju.lakkaraju@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 11 Oct 2016 20:04:09 +0000 (13:04 -0700)]
Revert "net: Add driver helper functions to determine checksum offloadability"
This reverts commit 6ae23ad36253a8033c5714c52b691b84456487c5.
The code has been in the kernel since 4.4 but there is no in-tree
code that uses it. Unused code is broken code, remove it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Oct 2016 14:00:27 +0000 (10:00 -0400)]
Merge git://git./linux/kernel/git/davem/net
Linus Torvalds [Fri, 14 Oct 2016 04:40:23 +0000 (21:40 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix various build warnings in tlan/qed/xen-netback drivers, from
Arnd Bergmann.
2) Propagate proper error code in strparser's strp_recv(), from Geert
Uytterhoeven.
3) Fix accidental broadcast of RTM_GETTFILTER responses, from Eric
Dumazet.
4) Need to use list_for_each_entry_safe() in qed driver, from Wei
Yongjun.
5) Openvswitch 802.1AD bug fixes from Jiri Benc.
6) Cure BUILD_BUG_ON() in mlx5 driver, from Tom Herbert.
7) Fix UDP ipv6 checksumming in netvsc driver, from Stephen Hemminger.
8) stmmac driver fixes from Giuseppe CAVALLARO.
9) Fix access to mangled IP6CB in tcp, from Eric Dumazet.
10) Fix info leaks in tipc and rtnetlink, from Dan Carpenter.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
net: bridge: add the multicast_flood flag attribute to brport_attrs
net: axienet: Remove unused parameter from __axienet_device_reset
liquidio: CN23XX: fix a loop timeout
net: rtnl: info leak in rtnl_fill_vfinfo()
tipc: info leak in __tipc_nl_add_udp_addr()
net: ipv4: Do not drop to make_route if oif is l3mdev
net: phy: Trigger state machine on state change and not polling.
ipv6: tcp: restore IP6CB for pktoptions skbs
netvsc: Remove mistaken udp.h inclusion.
xen-netback: fix type mismatch warning
stmmac: fix error check when init ptp
stmmac: fix ptp init for gmac4
qed: fix old-style function definition
netvsc: fix checksum on UDP IPV6
net_sched: reorder pernet ops and act ops registrations
xen-netback: fix guest Rx stall detection (after guest Rx refactor)
drivers/ptp: Fix kernel memory disclosure
net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
qmi_wwan: add support for Quectel EC21 and EC25
openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
...
Linus Torvalds [Fri, 14 Oct 2016 04:28:20 +0000 (21:28 -0700)]
Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Stable bugfixes:
- sunrpc: fix write space race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid
Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe"
* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
fs: nfs: Make nfs boot time y2038 safe
sunrpc: replace generic auth_cred hash with auth-specific function
sunrpc: add RPCSEC_GSS hash_cred() function
sunrpc: add auth_unix hash_cred() function
sunrpc: add generic_auth hash_cred() function
sunrpc: add hash_cred() function to rpc_authops struct
Retry operation on EREMOTEIO on an interrupted slot
pNFS: Fix atime updates on pNFS clients
sunrpc: queue work on system_power_efficient_wq
NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
NFSv4: If recovery failed for a specific open stateid, then don't retry
NFSv4: Fix retry issues with nfs41_test/free_stateid
NFSv4: Open state recovery must account for file permission changes
NFSv4: Mark the lock and open stateids as invalid after freeing them
NFSv4: Don't test open_stateid unless it is set
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
NFSv4: Fix a race when updating an open_stateid
NFSv4: Fix a race in nfs_inode_reclaim_delegation()
...
Linus Torvalds [Fri, 14 Oct 2016 04:04:42 +0000 (21:04 -0700)]
Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:
- Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.
- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"
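A minimal userspace sketch of the copy_file_range() call mentioned above, assuming Python 3.8+ on Linux, where os.copy_file_range() wraps the syscall. The helper name copy_range and the read/write fallback are illustrative assumptions, not part of the nfsd changes. Whether the copy is actually offloaded to the server depends on the mount (NFSv4.2) and on the client and server kernels; the calling code looks the same either way.

```python
import os


def copy_range(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path, preferring copy_file_range(2).

    Hypothetical helper: falls back to a plain read/write loop when the
    call is unavailable or the kernel/filesystem rejects it.
    Returns the number of bytes copied.
    """
    copied = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            remaining = os.fstat(src).st_size
            use_cfr = hasattr(os, "copy_file_range")
            while remaining > 0:
                want = min(chunk, remaining)
                if use_cfr:
                    try:
                        # The kernel moves the data; nothing passes
                        # through a userspace buffer on this path.
                        n = os.copy_file_range(src, dst, want)
                    except OSError:
                        use_cfr = False  # retry this chunk the slow way
                        continue
                else:
                    buf = os.read(src, want)
                    n = len(buf)
                    view = memoryview(buf)
                    while len(view) > 0:
                        view = view[os.write(dst, view):]
                if n == 0:  # unexpected EOF, source shrank underneath us
                    break
                remaining -= n
                copied += n
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return copied
```

Because both file descriptors are used at their current offsets, a mid-copy fallback simply continues from wherever the last successful call stopped.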
* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
Linus Torvalds [Fri, 14 Oct 2016 03:28:22 +0000 (20:28 -0700)]
Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git./linux/kernel/git/dgc/linux-xfs
 < XFS has gained super CoW powers! >
  ----------------------------------
         \   ^__^
          \  (oo)\_______
             (__)\       )\/\
                 ||----w |
                 ||     ||
Pull XFS support for shared data extents from Dave Chinner:
"This is the second part of the XFS updates for this merge cycle. This
pullreq contains the new shared data extents feature for XFS.
Given the complexity and size of this change I am expecting - like the
addition of reverse mapping last cycle - that there will be some
follow-up bug fixes and cleanups around the -rc3 stage for issues that
I'm sure will show up once the code hits a wider userbase.
What it is:
At the most basic level we are simply adding shared data extents to
XFS - i.e. a single extent on disk can now have multiple owners. To do
this we have to add new on-disk features to both track the shared
extents and the number of times they've been shared. This is done by
the new "refcount" btree that sits in every allocation group. When we
share or unshare an extent, this tree gets updated.
Along with this new tree, the reverse mapping tree needs to be updated
to track each owner of a shared extent. This also needs to be updated on
every share/unshare operation. These interactions at extent allocation
and freeing time have complex ordering and recovery constraints, so
there's a significant amount of new intent-based transaction code to
ensure that operations are performed atomically from both the runtime
and integrity/crash recovery perspectives.
We also need to break sharing when writes hit a shared extent - this
is where the new copy-on-write implementation comes in. We allocate
new storage and copy the original data along with the overwrite data
into the new location. We only do this for data as we don't share
metadata at all - each inode has its own metadata that tracks the
shared data extents, the extents undergoing CoW and its own private
extents.
Of course, being XFS, nothing is simple - we use delayed allocation
for CoW similar to how we use it for normal writes. ENOSPC is a
significant issue here - we build on the reservation code added in
4.8-rc1 with the reverse mapping feature to ensure we don't get
spurious ENOSPC issues part way through a CoW operation. These
mechanisms also help minimise fragmentation due to repeated CoW
operations. To further reduce fragmentation overhead, we've also
introduced a CoW extent size hint, which indicates how large a region
we should allocate when we execute a CoW operation.
With all this functionality in place, we can hook up .copy_file_range,
.clone_file_range and .dedupe_file_range and we gain all the
capabilities of reflink and other vfs provided functionality that
enable manipulation of shared extents. We also added a fallocate mode
that explicitly unshares a range of a file, which we implemented as an
explicit CoW of all the shared extents in a file.
As such, it's a huge chunk of new functionality with new on-disk
format features and internal infrastructure. It warns at mount time as
an experimental feature and that it may eat data (as we do with all
new on-disk features until they stabilise). We have not released
userspace support for it yet - userspace support currently requires
download from Darrick's xfsprogs repo and build from source, so the
access to this feature is really developer/tester only at this point.
Initial userspace support will be released at the same time the kernel
with this code in it is released.
The new code causes 5-6 new failures with xfstests - these aren't
serious functional failures but things like the output of tests changing
slightly due to perturbations in layouts, space usage, etc. OTOH,
we've added 150+ new tests to xfstests that specifically exercise this
new functionality so it's got far better test coverage than any
functionality we've previously added to XFS.
Darrick has done a pretty amazing job getting us to this stage, and
special mention also needs to go to Christoph (review, testing,
improvements and bug fixes) and Brian (caught several intricate bugs
during review) for the effort they've also put in.
Summary:
- unshare range (FALLOC_FL_UNSHARE) support for fallocate
- copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
interface
- shared extent support for XFS
- copy-on-write support for shared extents
- copy_file_range support
- clone_file_range support (implements reflink)
- dedupe_file_range support
- defrag support for reverse mapping enabled filesystems"
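The sharing, copy-on-write and unshare behaviour described in the pull message can be illustrated with a toy in-memory model. This is only a sketch under loud assumptions: the class ToyFS and all of its methods are hypothetical, a plain dict stands in for the per-AG refcount btree, and there is no reverse mapping, delayed allocation, or crash recovery. It shows only the bookkeeping idea: one physical block can have several owners, a write to a shared block allocates new storage and merges the original data with the overwrite, and unshare forces private copies.

```python
class ToyFS:
    """Toy model: each file maps logical block -> physical block."""

    def __init__(self):
        self.data = {}      # physical block -> bytes
        self.refcount = {}  # physical block -> number of owners
        self.files = {}     # file name -> {logical block: physical block}
        self._next = 0

    def _alloc(self, content):
        pb = self._next
        self._next += 1
        self.data[pb] = content
        self.refcount[pb] = 1
        return pb

    def _put(self, pb):
        # Drop one owner; free the block when nobody references it.
        self.refcount[pb] -= 1
        if self.refcount[pb] == 0:
            del self.refcount[pb], self.data[pb]

    def create(self, name, blocks):
        self.files[name] = {i: self._alloc(b) for i, b in enumerate(blocks)}

    def reflink(self, src, dst):
        """Share every block of src with new file dst; no data is copied."""
        self.files[dst] = dict(self.files[src])
        for pb in self.files[dst].values():
            self.refcount[pb] += 1

    def write(self, name, lblk, offset, patch):
        """Overwrite part of a block, breaking sharing first if needed."""
        pb = self.files[name][lblk]
        merged = bytearray(self.data[pb])          # original data ...
        merged[offset:offset + len(patch)] = patch  # ... plus the overwrite
        if self.refcount[pb] > 1:
            # Shared: copy-on-write into newly allocated storage.
            self.files[name][lblk] = self._alloc(bytes(merged))
            self._put(pb)
        else:
            self.data[pb] = bytes(merged)

    def unshare(self, name):
        """Give name private copies of every block it currently shares."""
        for lblk, pb in list(self.files[name].items()):
            if self.refcount[pb] > 1:
                self.files[name][lblk] = self._alloc(self.data[pb])
                self._put(pb)

    def read(self, name):
        return [self.data[pb] for _, pb in sorted(self.files[name].items())]


fs = ToyFS()
fs.create("a", [b"AAAA", b"CCCC"])
fs.reflink("a", "b")          # both blocks now have two owners
fs.write("b", 0, 1, b"xx")    # CoW: "b" gets a private copy of block 0
fs.unshare("b")               # remaining shared block is copied too
```

After these calls, "a" still reads back its original contents, "b" sees the patched first block, and every physical block has exactly one owner.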
* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
xfs: convert COW blocks to real blocks before unwritten extent conversion
xfs: rework refcount cow recovery error handling
xfs: clear reflink flag if setting realtime flag
xfs: fix error initialization
xfs: fix label inaccuracies
xfs: remove isize check from unshare operation
xfs: reduce stack usage of _reflink_clear_inode_flag
xfs: check inode reflink flag before calling reflink functions
xfs: implement swapext for rmap filesystems
xfs: refactor swapext code
xfs: various swapext cleanups
xfs: recognize the reflink feature bit
xfs: simulate per-AG reservations being critically low
xfs: don't mix reflink and DAX mode for now
xfs: check for invalid inode reflink flags
xfs: set a default CoW extent size of 32 blocks
xfs: convert unwritten status of reverse mappings for shared files
xfs: use interval query for rmap alloc operations on shared files
xfs: add shared rmap map/unmap/convert log item types
xfs: increase log reservations for reflink
...
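The pull message above describes a CoW extent size hint that indicates how large a region to allocate for a CoW operation, and the commit list sets a default of 32 blocks. The arithmetic can be sketched as below. The function name cow_alloc_range and the policy of rounding the range outward to hint-sized boundaries are assumptions made for illustration; the exact alignment rules live in the XFS allocator and are not spelled out in this log.

```python
def cow_alloc_range(start_blk, nblks, hint_blks=32):
    """Round a CoW range outward to multiples of the hint (assumed policy).

    Returns (first_block, block_count) of the region to allocate.
    hint_blks=32 mirrors the default named in the commit list; a
    non-positive hint disables rounding in this sketch.
    """
    if hint_blks <= 0:
        return start_blk, nblks
    lo = (start_blk // hint_blks) * hint_blks            # round start down
    hi = -(-(start_blk + nblks) // hint_blks) * hint_blks  # round end up
    return lo, hi - lo


# A 5-block overwrite starting at block 70 allocates blocks 64..95.
region = cow_alloc_range(70, 5)
```

Allocating in hint-sized regions means that later CoW writes to neighbouring blocks can land in the same extent, which is how a hint of this kind would limit fragmentation.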