David S. Miller [Mon, 31 Jul 2017 21:37:50 +0000 (14:37 -0700)]
Merge branch 'tcp-remove-prequeue-and-header-prediction'
Florian Westphal says:
====================
tcp: remove prequeue and header prediction
During a hallway discussion with Eric Dumazet at Netdev 1.2 in
Tokyo some maybe-not-so-useful-anymore TCP stack features came up,
among these header prediction and prequeueing.
In brief, TCP prequeue assumes a single-process-blocking-read design,
which is not that common anymore. The most frequently used high-performance
networking program that is an excellent fit for these features is netperf.
The idea behind prequeueing is to move part of tcp processing, including
retransmit queue cleaning, to process context.
With (e)poll designs, prequeue is always skipped, so for such programs
this is dead-code removal.
Header prediction is also less useful nowadays.
For packet trains, GRO will do packet aggregation so we do not get the
per-packet benefit that this had before GRO anymore.
Because of SACK, header prediction also will be ineffective once
a connection suffers even light packet losses.
code removal aside, after this change processing always occurs in BH
context, this allows to experiment e.g. with doing bulk freeing of
skb heads when incoming ACKs clean packets from the retransmit queue.
There are no changes since the RFC, except in last patch (i missed
another no-longer-used mib counter). I also edited a few commit messages.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:23 +0000 (03:57 +0200)]
tcp: remove unused mib counters
was used by tcp prequeue and header prediction.
TCPFORWARDRETRANS use was removed in january.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:22 +0000 (03:57 +0200)]
tcp: remove CA_ACK_SLOWPATH
re-indent tcp_ack, and remove CA_ACK_SLOWPATH; it is always set now.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:21 +0000 (03:57 +0200)]
tcp: remove header prediction
Like prequeue, I am not sure this is overly useful nowadays.
If we receive a train of packets, GRO will aggregate them if the
headers are the same (HP predates GRO by several years) so we don't
get a per-packet benefit, only a per-aggregated-packet one.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:20 +0000 (03:57 +0200)]
tcp: remove low_latency sysctl
Was only checked by the removed prequeue code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:19 +0000 (03:57 +0200)]
tcp: reindent two spots after prequeue removal
These two branches are now always true, remove the conditional.
objdiff shows no changes.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Sun, 30 Jul 2017 01:57:18 +0000 (03:57 +0200)]
tcp: remove prequeue support
prequeue is a tcp receive optimization that moves part of rx processing
from bh to process context.
This only works if the socket being processed belongs to a process that
is blocked in recv on that socket.
In practice, this doesn't happen anymore that often because nowadays
servers tend to use an event driven (epoll) model.
Even normal client applications (web browsers) commonly use many tcp
connections in parallel.
This has measureable impact only in netperf (which uses plain recv and
thus allows prequeue use) from host to locally running vm (~4%), however,
there were no changes when using netperf between two physical hosts with
ixgbe interfaces.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Jul 2017 02:28:08 +0000 (19:28 -0700)]
Merge branch 'net-sched-actions-improve-dump-performance'
Jamal Hadi Salim says:
====================
net sched actions: improve dump performance
Changes since v11:
------------------
1) Jiri - renames: nla_value to value and nla_selector to selector
2) Jiri - rename: validate_nla_bitfield_32 to validate_nla_bitfield_32
3) Jiri - rename: NLA_BITFIELD_32 to NLA_BITFIELD32
4) Jiri - remove unnecessary break when we return in case statement
5) Jiri - rename and move nla_get_bitfield_32 to an earlier patch
6) Jiri - xmas tree alignment of var declaration
7) Jiri - rename all declarations of bitfield 32 vars to be consistent ("bf")
8) Jiri - improve validate_nla_bitfield32() validation to disallow valid
bit values that are not selected by the selector
Changes since v10:
-----------------
1) Jiri: move type->validate_content() to its own patch
Jamal: decided to remove it altogether so we can get this patch set in.
2) Change name of NLA_FLAG_BITS to NLA_BITFIELD_32 based on discussions
with D. Ahern and Jiri. D. Ahern suggests to make this a variable bitmap size.
My analysis at this point is it too complex and i only need a few bit
flags. If we run out of bits someone else can create a new NLA_BITFIELD_XXX
and start using that. So please let this go.
3) Jamal - Add Suggested-by: Jiri for type NLA_BITFIELD_32
4) Jiri: Change name allowed_flags to tcaa_root_flags_allowed
5) Jiri: Introduce nla_get_flag_bits_values() helper instead of using
memcpy for retrieving nla_bitfield_32 fields.
Changes since v9:
-----------------
1) General consensus:
- remove again the use of BIT() to maintain uapi consistency ;->
1) Jiri:
- Add a new netlink type NLA_FLAG_BITS to check for valid bits
and use it instead of inline vetting (patch 4/4 now)
Changes since v8:
-----------------
1) Jiri:
- Add back the use of BIT(). Eventually fix iproute2 instead
- Rename VALID_TCA_FLAGS to VALID_TCA_ROOT_FLAGS
Changes since v7:
-----------------
Jamal:
No changes.
Patch 1 went out twice. Resend without two copies of patch 1
changes since v6:
-----------------
1) DaveM:
New rules for netlink messages. From now on we are going to start
checking for bits that are not used and rejecting anything we dont
understand. In the future this is going to require major changes
to user space code (tc etc). This is just a start.
To quote, David:
"
Again, bits you aren't using now, make sure userspace doesn't
set them. And if it does, reject.
"
Added checks for ensuring things work as above.
2) Jiri:
a)Fix the commit message to properly use "Fixes" description
b)Align assignments for nla_policy
Changes since v5:
----------------
0)
Remove use of BIT() because it is kernel specific. Requires a separate
patch (Jiri can submit that in his cleanups)
1)To paraphrase Eric D.
"memcpy(nla_data(count_attr), &cb->args[1], sizeof(u32));
wont work on 64bit BE machines because cb->args[1]
(which is 64 bit is larger in size than sizeof(u32))"
Fixed
2) Jiri Pirko
i) Spotted a bug fix mixed in the patch for wrong TLV
fix. Add patch 1/3 to address this. Make part of this
series because of dependencies.
ii) Rename ACT_LARGE_DUMP_ON -> TCA_FLAG_LARGE_DUMP_ON
iii) Satisfy Jiri's obsession against the noun "tcaa"
a)Rename struct nlattr *tcaa --> struct nlattr *tb
b)Rename TCAA_ACT_XXX -> TCA_ROOT_XXX
Changes since v4:
-----------------
1) Eric D.
pointed out that when all skb space is used up by the dump
there will be no space to insert the TCAA_ACT_COUNT attribute.
2) Jiri:
i) Change:
enum {
TCAA_UNSPEC,
TCAA_ACT_TAB,
TCAA_ACT_FLAGS,
TCAA_ACT_COUNT,
TCAA_ACT_TIME_FILTER,
__TCAA_MAX
};
to:
enum {
TCAA_UNSPEC,
TCAA_ACT_TAB,
TCAA_ACT_FLAGS,
TCAA_ACT_COUNT,
__TCAA_MAX,
};
Jiri plans to followup with the rest of the code to make the
style consistent.
ii) Rename attribute TCAA_ACT_TIME_FILTER --> TCAA_ACT_TIME_DELTA
iii) Rename variable jiffy_filter --> jiffy_since
iv) Rename msecs_filter --> msecs_since
v) get rid of unused cb->args[0] and rename cb->args[4] to cb->args[0]
Earlier Changes
----------------
- Jiri mostly on names of things.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:52 +0000 (13:24 -0400)]
net sched actions: add time filter for action dumping
This patch adds support for filtering based on time since last used.
When we are dumping a large number of actions it is useful to
have the option of filtering based on when the action was last
used to reduce the amount of data crossing to user space.
With this patch the user space app sets the TCA_ROOT_TIME_DELTA
attribute with the value in milliseconds with "time of interest
since now". The kernel converts this to jiffies and does the
filtering comparison matching entries that have seen activity
since then and returns them to user space.
Old kernels and old tc continue to work in legacy mode since
they dont specify this attribute.
Some example (we have 400 actions bound to 400 filters); at
installation time. Using updated when tc setting the time of
interest to 120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400
go get some coffee and wait for > 120 seconds and try again:
prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0
Lets see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2 success 1)
match
7f000002/
ffffffff at 12 (success 1 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1145 sec used 802 sec
Action statistics:
Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
....
that coffee took long, no? It was good.
Now lets ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1
More details please:
prompt$ hackedtc -s actions ls action gact since 120000
action order 0: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1270 sec used 30 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
And the filter?
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 4 success 2)
match
7f000002/
ffffffff at 12 (success 2 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1324 sec used 84 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:51 +0000 (13:24 -0400)]
net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.
With this change, the user will get as many as possibly fitting
within the given constraints available to the kernel.
The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesnt set this flag
doesnt get the large (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such user space app knows how long
to iterate (independent of the type of action being dumped)
instead of hardcoded maximum of 32 thus maintaining backward compat.
Some results dumping 1.5M actions below:
first an unpatched tc which doesnt understand these features...
prompt$ time -p tc actions ls action gact | grep index | wc -l
1500000
real 1388.43
user 2.07
sys 1386.79
Now lets see a patched tc which sets the correct flags when requesting
a dump:
prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
1500000
real 178.13
user 2.02
sys 176.96
That is about 8x performance improvement for tc app which sets its
receive buffer to about 32K.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:50 +0000 (13:24 -0400)]
net sched actions: Use proper root attribute table for actions
Bug fix for an issue which has been around for about a decade.
We got away with it because the enumeration was larger than needed.
Fixes:
7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 30 Jul 2017 17:24:49 +0000 (13:24 -0400)]
net netlink: Add new type NLA_BITFIELD32
Generic bitflags attribute content sent to the kernel by user.
With this netlink attr type the user can either set or unset a
flag in the kernel.
The value is a bitmap that defines the bit values being set
The selector is a bitmask that defines which value bit is to be
considered.
A check is made to ensure the rules that a kernel subsystem always
conforms to bitflags the kernel already knows about. i.e
if the user tries to set a bit flag that is not understood then
the _it will be rejected_.
In the most basic form, the user specifies the attribute policy as:
[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
where myvalidflags is the bit mask of the flags the kernel understands.
If the user _does not_ provide myvalidflags then the attribute will
also be rejected.
Examples:
value = 0x0, and selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.
value = 0x2, and selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 30 Jul 2017 17:36:05 +0000 (19:36 +0200)]
net: fec: Allow reception of frames bigger than 1522 bytes
The FEC Receive Control Register has a 14 bit field indicating the
longest frame that may be received. It is being set to 1522. Frames
longer than this are discarded, but counted as being in error.
When using DSA, frames from the switch has an additional header,
either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
of 1522 bytes received by the switch on a port becomes 1530 bytes when
passed to the host via the FEC interface.
Change the maximum receive size to 2048 - 64, where 64 is the maximum
rx_alignment applied on the receive buffer for AVB capable FEC
cores. Use this value also for the maximum receive buffer size. The
driver is already allocating a receive SKB of 2048 bytes, so this
change should not have any significant effects.
Tested on imx51, imx6, vf610.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 30 Jul 2017 20:11:06 +0000 (22:11 +0200)]
net: fec: Issue error for missing but expected PHY
If the PHY is missing but expected, e.g. because of a typ0 in the dt
file, it is not possible to open the interface. ip link returns:
RTNETLINK answers: No such device
It is not very obvious what the problem is. Add a netdev_err() in this
case to make it easier to debug the issue.
[ 21.409385] fec
2188000.ethernet eth0: Unable to connect to phy
RTNETLINK answers: No such device
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Jul 2017 02:23:29 +0000 (19:23 -0700)]
Merge branch 'dsa-lan9303-Fix-MDIO-issues'
Egil Hjelmeland says:
====================
net: dsa: lan9303: Fix MDIO issues.
This series fix the MDIO interface for the lan9303 DSA driver.
Bugs found after testing on actual HW.
This series is extracted from the first patch of my first large
series. Significant changes from that version are:
- use mdiobus_write_nested, mdiobus_read_nested.
- EXPORT lan9303_indirect_phy_ops
Unfortunately I do not have access to i2c based system for
testing.
Changes from first version:
- Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:56 +0000 (19:58 +0200)]
net: dsa: lan9303: MDIO access phy registers directly
Indirect access (PMI) to phy register only work in I2C mode. In
MDIO mode phy registers must be accessed directly. Introduced
struct lan9303_phy_ops to handle the two modes.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:55 +0000 (19:58 +0200)]
net: dsa: lan9303: Renamed indirect phy access functions
Preparing for the following fix of MDIO phy access:
Renamed functions that access PHY 1 and 2 indirectly through PMI
registers.
lan9303_port_phy_reg_wait_for_completion() to
lan9303_indirect_phy_wait_for_completion()
lan9303_port_phy_reg_read() to
lan9303_indirect_phy_read()
lan9303_port_phy_reg_write() to
lan9303_indirect_phy_write()
Also changed "val" parameter of lan9303_indirect_phy_write() to u16,
for clarity.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:54 +0000 (19:58 +0200)]
net: dsa: lan9303: Multiply by 4 to get MDIO register
lan9303_mdio_write()/_read() must multiply register number by 4 to get
offset.
Added some commments to the register definitions.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Sun, 30 Jul 2017 17:58:53 +0000 (19:58 +0200)]
net: dsa: lan9303: Fix lan9303_detect_phy_setup() for MDIO
Handle that MDIO read with no response return 0xffff.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 30 Jul 2017 06:23:45 +0000 (23:23 -0700)]
Merge branch 'ethtool-fec'
Roopa Prabhu says:
====================
ethtool: support for forward error correction mode setting on a link
Forward Error Correction (FEC) modes i.e Base-R
and Reed-Solomon modes are introduced in 25G/40G/100G standards
for providing good BER at high speeds. Various networking devices
which support 25G/40G/100G provides ability to manage supported FEC
modes and the lack of FEC encoding control and reporting today is a
source for interoperability issues for many vendors.
FEC capability as well as specific FEC mode i.e. Base-R
or RS modes can be requested or advertised through bits D44:47 of base link
codeword.
This patch set intends to provide option under ethtool to manage and
report FEC encoding settings for networking devices as per IEEE 802.3
bj, bm and by specs.
v2 :
- minor patch format fixes and typos pointed out by Andrew
- there was a pending discussion on the use of 'auto' vs
'automatic' for fec settings. I have left it as 'auto'
because in most cases today auto is used in place of
automatic to represent automatically generated values.
We use it in other networking config too. I would prefer
leaving it as auto.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Thu, 27 Jul 2017 23:47:28 +0000 (16:47 -0700)]
cxgb4: ethtool forward error correction management support
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Thu, 27 Jul 2017 23:47:27 +0000 (16:47 -0700)]
cxgb4: core hardware/firmware support for Forward Error Correction on a link
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vidya Sagar Ravipati [Thu, 27 Jul 2017 23:47:26 +0000 (16:47 -0700)]
net: ethtool: add support for forward error correction modes
Forward Error Correction (FEC) modes i.e Base-R
and Reed-Solomon modes are introduced in 25G/40G/100G standards
for providing good BER at high speeds. Various networking devices
which support 25G/40G/100G provides ability to manage supported FEC
modes and the lack of FEC encoding control and reporting today is a
source for interoperability issues for many vendors.
FEC capability as well as specific FEC mode i.e. Base-R
or RS modes can be requested or advertised through bits D44:47 of
base link codeword.
This patch set intends to provide option under ethtool to manage
and report FEC encoding settings for networking devices as per
IEEE 802.3 bj, bm and by specs.
set-fec/show-fec option(s) are designed to provide control and
report the FEC encoding on the link.
SET FEC option:
root@tor: ethtool --set-fec swp1 encoding [off | RS | BaseR | auto]
Encoding: Types of encoding
Off : Turning off any encoding
RS : enforcing RS-FEC encoding on supported speeds
BaseR : enforcing Base R encoding on supported speeds
Auto : IEEE defaults for the speed/medium combination
Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we are expecting FEC to be negotiated as on or off
as long as protocol supports it
- if the hardware is capable of detecting the FEC encoding on it's
receiver it will reconfigure its encoder to match
- in absence of the above, the configuration would be set to IEEE
defaults.
>From our understanding , this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.
SHOW FEC option:
root@tor: ethtool --show-fec swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings: RS | BaseR
ETHTOOL DEVNAME output modification:
ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
Supported ports: [ FIBRE ]
Supported link modes: 40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: [RS | BaseR | None | Not reported]
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: [RS | BaseR | None | Not reported]
<<<< One or more FEC modes
Speed: 100000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 106
Transceiver: internal
Auto-negotiation: off
Link detected: yes
This patch includes following changes
a) New ETHTOOL_SFECPARAM/SFECPARAM API, handled by
the new get_fecparam/set_fecparam callbacks, provides support
for configuration of forward error correction modes.
b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
are defined so that users can configure these fec modes for supported
and advertising fields as part of link autonegotiation.
Signed-off-by: Vidya Sagar Ravipati <vidya.chowdary@gmail.com>
Signed-off-by: Dustin Byford <dustin@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 22:25:44 +0000 (15:25 -0700)]
Merge branch 'netvsc-minor-fixes-and-optimization'
Stephen Hemminger says:
====================
netvsc: minor fixes and optimization
This is a subset of earlier submission with a few more fixes
found during testing. The are two small optimizations, one is to
better manage the receive completion ring, and the other is removing
one unneeded level of indirection.
Will submit the improved VF support and buffer sizing in a later
patch so they get more review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 28 Jul 2017 15:59:47 +0000 (08:59 -0700)]
netvsc: signal host if receive ring is emptied
Latency improvement related to NAPI conversion.
If all packets are processed from receive ring then need
to signal host.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 28 Jul 2017 15:59:46 +0000 (08:59 -0700)]
netvsc: fix error unwind on device setup failure
If setting receive buffer fails, the error unwind would cause
kernel panic because it was not correctly doing RCU and NAPI
unwind. RCU'd pointer needs to be reset to NULL, and NAPI needs
to be disabled not deleted.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 28 Jul 2017 15:59:45 +0000 (08:59 -0700)]
netvsc: optimize receive completions
Optimize how receive completion ring are managed.
* Allocate only as many slots as needed for all buffers from host
* Allocate before setting up sub channel for better error detection
* Don't need to keep copy of initial receive section message
* Precompute the watermark for when receive flushing is needed
* Replace division with conditional test
* Replace atomic per-device variable with per-channel check.
* Handle corner case where receive completion send
fails if ring buffer to host is full.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 28 Jul 2017 15:59:44 +0000 (08:59 -0700)]
netvsc: remove unnecessary indirection of page_buffer
The internal API was passing struct hv_page_buffer **
when only simple struct hv_page_buffer * was necessary
for passing an array.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 28 Jul 2017 15:59:43 +0000 (08:59 -0700)]
netvsc: don't print pointer value in error message
Using %p to print pointer to packet meta-data doesn't give any
good info, and exposes kernel memory offsets.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 28 Jul 2017 15:59:42 +0000 (08:59 -0700)]
netvsc: fix warnings reported by lockdep
This includes a bunch of fixups for issues reported by
lockdep.
* ethtool routines can assume RTNL
* send is done with RCU lock (and BH disable)
* avoid refetching internal device struct (netvsc)
instead pass it as a parameter.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 28 Jul 2017 15:59:41 +0000 (08:59 -0700)]
netvsc: fix return value for set_channels
The error and normal case got swapped.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Thu, 27 Jul 2017 19:32:28 +0000 (12:32 -0700)]
liquidio: bump up driver version to match newer NIC firmware
Bump up driver version to match newer NIC firmware. Also update
nic_rx_stats (a struct common to host driver and firmware) by adding a new
field: fw_total_fwd_bytes.
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 27 Jul 2017 10:29:51 +0000 (06:29 -0400)]
bnxt_re: add MAY_USE_DEVLINK dependency
bnxt_en depends on MAY_USE_DEVLINK; this is used to force bnxt_en
to be =m when DEVLINK is =m.
Now, bnxt_re selects bnxt_en. Unless bnxt_re also explicitly calls
out dependency on MAY_USE_DEVLINK, Kconfig does not force bnxt_re
to be =m when DEVLINK is =m, causing the following error:
drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.o: In function
`bnxt_dl_register':
bnxt_vfr.c:(.text+0x1440): undefined reference to `devlink_alloc'
bnxt_vfr.c:(.text+0x14c0): undefined reference to `devlink_register'
bnxt_vfr.c:(.text+0x14e0): undefined reference to `devlink_free'
drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.o: In function
`bnxt_dl_unregister':
bnxt_vfr.c:(.text+0x1534): undefined reference to `devlink_unregister'
bnxt_vfr.c:(.text+0x153c): undefined reference to `devlink_free'
Fix this by adding MAY_USE_DEVLINK dependency in bnxt_re.
Fixes:
4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Thu, 27 Jul 2017 00:32:07 +0000 (17:32 -0700)]
bpf: testing: fix devmap tests
Apparently through one of my revisions of the initial patches
series I lost the devmap test. We can add more testing later but
for now lets fix the simple one we have.
Fixes:
546ac1ffb70d "bpf: add devmap, a map for storing net device references"
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 21:02:07 +0000 (14:02 -0700)]
Merge branch 'moxa-Fix-style-issues'
SZ Lin says:
====================
net: moxa: Fix style issues
This patch set fixs the WARNINGs found by the checkpatch.pl tool
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SZ Lin [Sat, 29 Jul 2017 10:42:39 +0000 (18:42 +0800)]
net: moxa: Add spaces preferred around that '{+,-}'
This patch fixes all checkpatch occurences of
"CHECK: spaces preferred around that '{+,-}' (ctx:VxV)"
in moxart_ether code.
Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SZ Lin [Sat, 29 Jul 2017 10:42:38 +0000 (18:42 +0800)]
net: moxa: Fix for typo in comment to function moxart_mac_setup_desc_ring()
Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SZ Lin [Sat, 29 Jul 2017 10:42:37 +0000 (18:42 +0800)]
net: moxa: Remove extra space after a cast
No space is necessary after a cast
This warning is found using checkpatch.pl
Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SZ Lin [Sat, 29 Jul 2017 10:42:36 +0000 (18:42 +0800)]
net: moxa: Fix comparison to NULL could be written with !
Fixed coding style for null comparisons in moxart_ether driver
to be more consistent with the rest of the kernel coding style
Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SZ Lin [Sat, 29 Jul 2017 10:42:35 +0000 (18:42 +0800)]
net: moxa: Prefer 'unsigned int' to bare use of 'unsigned'
Use 'unsigned int' instead of 'unsigned'
This warning is found using checkpatch.pl
Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SZ Lin [Sat, 29 Jul 2017 10:42:34 +0000 (18:42 +0800)]
net: moxa: Remove braces from single-line body
Remove unnecessary braces from single-line if statement
This warning is found using checkpatch.pl
Signed-off-by: SZ Lin <sz.lin@moxa.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 18:22:58 +0000 (11:22 -0700)]
Merge branch 'smc-get-rid-of-unsafe_global_rkey'
Ursula Braun says:
====================
net/smc: get rid of unsafe_global_rkey
The smc code uses the unsafe_global_rkey, exposing all memory for
remote reads and writes once a connection is established.
Here is now a patch series to get rid of unsafe_global_rkey usage.
Main idea is to switch to SG-logic and separate memory regions for RMBs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:22 +0000 (13:56 +0200)]
net/smc: synchronize buffer usage with device
Usage of send buffer "sndbuf" is synced
(a) before filling sndbuf for cpu access
(b) after filling sndbuf for device access
Usage of receive buffer "RMB" is synced
(a) before reading RMB content for cpu access
(b) after reading RMB content for device access
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:21 +0000 (13:56 +0200)]
net/smc: cleanup function __smc_buf_create()
Split function __smc_buf_create() for better readability.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:20 +0000 (13:56 +0200)]
net/smc: common functions for RMBs and send buffers
Creation and deletion of SMC receive and send buffers shares a high
amount of common code . This patch introduces common functions to get
rid of duplicate code.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:19 +0000 (13:56 +0200)]
net/smc: introduce sg-logic for send buffers
SMC send buffers are processed the same way as RMBs. Since RMBs have
been converted to sg-logic, do the same for send buffers.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:18 +0000 (13:56 +0200)]
net/smc: remove Kconfig warning
Now separate memory regions are created and registered for separate
RMBs. The unsafe_global_rkey of the protection domain is no longer
used. Thus the exposing memory warning can be removed.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:17 +0000 (13:56 +0200)]
net/smc: register RMB-related memory region
A memory region created for a new RMB must be registered explicitly,
before the peer can make use of it for remote DMA transfer.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:16 +0000 (13:56 +0200)]
net/smc: use separate memory regions for RMBs
SMC currently uses the unsafe_global_rkey of the protection domain,
which exposes all memory for remote reads and writes once a connection
is established. This patch introduces separate memory regions with
separate rkeys for every RMB. Now the unsafe_global_rkey of the
protection domain is no longer needed.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:15 +0000 (13:56 +0200)]
net/smc: introduce sg-logic for RMBs
The follow-on patch makes use of ib_map_mr_sg() when introducing
separate memory regions for RMBs. This function is based on
scatterlists; thus this patch introduces scatterlists for RMBs.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:14 +0000 (13:56 +0200)]
net/smc: shorten local bufsize variables
Initiate the coming rework of SMC buffer handling with this
small code cleanup. No functional changes here.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 28 Jul 2017 11:56:13 +0000 (13:56 +0200)]
net/smc: serialize connection creation in all cases
If a link group for a new server connection exists already, the mutex
serializing the determination of link groups is given up early.
The coming registration of memory regions benefits from the serialization
as well, if the mutex is held till connection creation is finished.
This patch postpones the unlocking of the link group creation mutex.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 01:52:57 +0000 (18:52 -0700)]
Merge branch 'inet6_protocol-const'
Julia Lawall says:
====================
constify inet6_protocol structures
The inet6_protocol structure is only passed as the first argument to
inet6_add_protocol or inet6_del_protocol, both of which are declared as
const. Thus the inet6_protocol structure itself can be const.
Done with the help of Coccinelle.
// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct inet6_protocol i@p = { ... };
@ok1@
identifier r.i;
expression e1;
position p;
@@
\(inet6_add_protocol\|inet6_del_protocol\)(&i@p,...)
@bad@
position p != {r.p,ok1.p};
identifier r.i;
struct inet6_protocol e;
@@
e@i@p
@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
struct inet6_protocol i = { ... };
// </smpl>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Fri, 28 Jul 2017 20:18:58 +0000 (22:18 +0200)]
l2tp: constify inet6_protocol structures
The inet6_protocol structure is only passed as the first argument to
inet6_add_protocol or inet6_del_protocol, both of which are declared as
const. Thus the inet6_protocol structure itself can be const.
Also drop __read_mostly on the newly const structure.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Fri, 28 Jul 2017 20:18:57 +0000 (22:18 +0200)]
ipv6: constify inet6_protocol structures
The inet6_protocol structure is only passed as the first argument to
inet6_add_protocol or inet6_del_protocol, both of which are declared as
const. Thus the inet6_protocol structure itself can be const.
Also drop __read_mostly where present on the newly const structures.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Jul 2017 00:29:35 +0000 (17:29 -0700)]
Merge branch 'liquidio-standardization-and-cleanup'
Rick Farrington says:
====================
liquidio: standardization and cleanup
This patchset corrects some non-standard macro usage.
1. Replaced custom MIN macro with use of standard 'min_t'.
2. Removed cryptic and misleading macro 'CAST_ULL'.
change log:
V1 -> V2:
1. Add driver cleanup of macro 'CAST_ULL'.
V2 -> V3:
1. Remove extra parentheses from previous usage of macro 'CAST_ULL'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Farrington [Wed, 26 Jul 2017 19:11:09 +0000 (12:11 -0700)]
liquidio: cleanup: removed cryptic and misleading macro
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Farrington [Wed, 26 Jul 2017 19:10:48 +0000 (12:10 -0700)]
liquidio: standardization: use min_t instead of custom macro
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 26 Jul 2017 19:05:38 +0000 (12:05 -0700)]
net: phy: Remove stale comments referencing timer
Since commit
a390d1f379cf ("phylib: convert state_queue work to
delayed_work"), the PHYLIB state machine was converted to use delayed
workqueues, yet some functions were still referencing the PHY library
timer in their comments, fix that and remove the now unused
linux/timer.h include.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Jul 2017 00:26:26 +0000 (17:26 -0700)]
Merge branch 'nfp-extend-firmware-request-logic'
Jakub Kicinski says:
====================
nfp: extend firmware request logic
We have been pondering for some time how to support loading different
application firmwares onto NFP. We want to support both users selecting
one of the firmware images provided by Netronome (which are optimized
for different use cases each) as well as firmware images created by
users themselves or other companies.
In the end we decided to go with the simplest solution - depending on
the right firmware image being placed in /lib/firmware. This vastly
simplifies driver logic and also doesn't require any new API.
Different NICs on one system may want to run different applications
therefore we first try to load firmware specific to the device (by
serial number of PCI slot) and if not present try the device model
based name we have been using so far.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jul 2017 18:09:48 +0000 (11:09 -0700)]
nfp: only use direct firmware requests
request_firmware() will fallback to user space helper and may cause
long delays when driver is loaded if udev doesn't correctly handle
FW requests. Since we never really made use of the user space
helper functionality switch to the simpler request_firmware_direct()
call. The side effect of this change is that no warning will be
printed when the FW image does not exists. To help users figure
out which FW file is missing print a info message when we request
each file.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jul 2017 18:09:47 +0000 (11:09 -0700)]
nfp: look for firmware image by device serial number and PCI name
We generally look up firmware by card type, but that doesn't allow
users who have more than one card of the same type in their system
to select firmware per adapter.
Unfortunately user space firmware helper seems fraught with
difficulties and to be on its way out. In particular support for
handling firmware uevents have been dropped from systemd and most
distributions don't enable the FW fallback by default any more.
To allow users selecting firmware for a particular device look up
firmware names by serial and pci_name(). Use the direct lookup to
disable generating uevents when enabled in Kconfig and not print
any warnings to logs if adapter-specific files are missing. Users
can place in /lib/firmware/netronome files named:
pci-${pci_name}.nffw
serial-${serial}.nffw
to target a specific card. E.g.:
pci-0000:04:00.0.nffw
pci-0000:82:00.0.nffw
serial-00-aa-bb-11-22-33-10-ff.nffw
We use the full serial number including the interface id, as it
appears in lspci output (bytes separated by '-').
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jul 2017 18:09:46 +0000 (11:09 -0700)]
nfp: remove the probe deferral when FW not present
We use a hack to defer probe when firmware was not pre-loaded
or found on disk. This helps in case users forgot to include
firmware in initramfs, the driver will most likely get another
shot at probing after real root is mounted.
This is not for what EPROBE_DEFER is supposed to be used, and
when FW is completely missing every time new device is probed
NFP will reprobe spamming kernel logs.
Remove this hack, users will now have to make sure the right
firmware image is present in initramfs if nfp.ko is placed
there or built in.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Jul 2017 07:05:23 +0000 (00:05 -0700)]
Merge branch 'qed-next'
Manish Chopra says:
====================
qed/qede: Enhancements
This patch series adds these below features support in qed/qede
1) Ntuple filter configuration [via ethtool -n/N]
2) EEE (energy efficient ethernet) support [ethtool --set-eee/show-eee]
3) Coalescing configuration support for VFs [via ethtool -c/C]
Please consider applying this to "net-next"
V1->V2:
* Fixes below Kbuild test robot warning.
drivers/net//ethernet/qlogic/qed/qed_l2.c:
In function 'qed_get_queue_coalesce':
drivers/net//ethernet/qlogic/qed/qed_l2.c:2137:8: error:
implicit declaration of function 'qed_vf_pf_get_coalesce'
[-Werror=implicit-function-declaration]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Verma [Wed, 26 Jul 2017 13:07:15 +0000 (06:07 -0700)]
qed: enhanced per queue max coalesce value.
Maximum coalesce per Rx/Tx queue is extended from
255 to 511.
Signed-off-by: Rahul Verma <rahul.verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Verma [Wed, 26 Jul 2017 13:07:14 +0000 (06:07 -0700)]
qed: Read per queue coalesce from hardware
Retrieve the actual coalesce value from hardware for every Rx/Tx
queue, instead of Rx/Tx coalesce value cached during set coalesce.
Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Verma [Wed, 26 Jul 2017 13:07:13 +0000 (06:07 -0700)]
qed: Add support for vf coalesce configuration.
This patch add the ethtool support to set RX/Tx coalesce
value to the VF associated Rx/Tx queues.
Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 26 Jul 2017 13:07:12 +0000 (06:07 -0700)]
qede: Add ethtool support for Energy efficient ethernet.
The patch adds ethtool callback implementations for querying/configuring
the Energy Efficient Ethernet (EEE) parameters.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 26 Jul 2017 13:07:11 +0000 (06:07 -0700)]
qed: Add support for Energy efficient ethernet.
The patch adds required driver support for reading/configuring the
Energy Efficient Ethernet (EEE) parameters.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chopra, Manish [Wed, 26 Jul 2017 13:07:10 +0000 (06:07 -0700)]
qed/qede: Add setter APIs support for RX flow classification
This patch adds support for adding and deleting rx flow
classification rules. Using this user can classify RX flow
constituting of TCP/UDP 4-tuples [src_ip/dst_ip and src_port/dst_port]
to be steered on a given RX queue
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chopra, Manish [Wed, 26 Jul 2017 13:07:09 +0000 (06:07 -0700)]
qede: Add getter APIs support for RX flow classification
This patch adds support for ethtool getter APIs to query
RX flow classification rules.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Jul 2017 23:58:45 +0000 (16:58 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-07-25
This series contains updates to i40e and i40evf only.
Gustavo Silva fixes a variable assignment, where the incorrect variable
was being used to store the error parameter.
Carolyn provides a fix for a problem found in systems when entering S4
state, by ensuring that the misc vector's IRQ is disabled as well.
Jake removes the single-threaded restriction on the module workqueue,
which was causing issues with events such as CORER. Does some future
proofing, by changing how the driver displays the UDP tunnel type.
Paul adds a retry in releasing resources if the admin queue times out
during the first attempt to release the resources.
Jesse fixes up references to 32bit timspec, since there are a small set
of errors on 32 bit, so we need to be using the right calls for dealing
with timespec64 variables. Cleaned up code indentation and corrected
an "if" conditional check, as well as making the code flow more clear.
Cast or changed the types to remove warnings for comparing signed and
unsigned types. Adds missing includes in i40evf, which were being used
but were not being directly included.
Daniel Borkmann fixes i40e to fill the XDP prog_id with the id just like
other XDP enabled drivers, so that on dump we can retrieve the attached
program based on the id and dump BPF insns, opcodes, etc back to user
space.
Tushar Dave adds le32_to_cpu while evaluating the hardware descriptor
fields, since they are in little-endian format. Also removed
unnecessary "__packed" to a couple of i40evf structures.
Stefan Assmann fixes an issue when an administratively set MAC was set
and should now be switched back to 00:00:00:00:00:00, the pf_set_mac
flag is not being toggled back to false.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Jul 2017 23:36:29 +0000 (16:36 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2017-07-25
This series contains updates to ixgbe only.
Tony provides all of the changes in the series, starting with adding a
check to ensure that adding a MAC filter was successful, before setting the
MACVLAN. In order to receive notifications of link configurations of the
external PHY and support the configuration of the internal iXFI link on
X552 devices, Tony enables LASI interrupts. Update the iXFI driver code
flow, since the MAC register NW_MNG_IF_SEL fields have been redefined for
X553 devices, so add MAC checks for iXFI flows. Added additional checks
for flow control autonegotiation, since it is not support for X553 fiber
and XFI devices.
v2: removed unnecessary parens noticed by David Miller in patch 6 of the
series.
v3: dropped patch 6 of the original series, while we work out a more
generic solution for malicious driver detection (MDD) support.
v4: updated patch 1 of the series with the comments from Joe Perches which
were:
- switched logic to return on error
- return 0 on success
- declare retval as an integer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 25 Jul 2017 18:17:11 +0000 (11:17 -0700)]
bpf: install libbpf headers on 'make install'
Add a new target to install the bpf.h header to $(prefix)/include/bpf/
directory. This is necessary to build standalone applications using
libbpf, without the need to clone the kernel sources and point to them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Jul 2017 07:55:33 +0000 (09:55 +0200)]
hamradio: dmascc: avoid -Wformat-overflow warning
gcc warns that the device name might overflow:
drivers/net/hamradio/dmascc.c: In function 'dmascc_init':
drivers/net/hamradio/dmascc.c:584:22: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
sprintf(dev->name, "dmascc%i", 2 * n + i);
drivers/net/hamradio/dmascc.c:584:3: note: 'sprintf' output between 8 and 17 bytes into a destination of size 16
sprintf(dev->name, "dmascc%i", 2 * n + i);
>From the static data in this file, I can tell that the index is
strictly limited to 16, so it won't overflow. This simply changes
the sprintf() to snprintf(), which is a good idea in general,
and shuts up this warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Assmann [Fri, 23 Jun 2017 07:46:24 +0000 (09:46 +0200)]
i40e: handle setting administratively set MAC address back to zero
When an administratively set MAC was previously set and should now be
switched back to 00:00:00:00:00:00 the pf_set_mac flag did not get
toggled back to false.
As a result VFs were still treated as if an administratively set MAC was
present.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tushar Dave [Thu, 22 Jun 2017 17:12:11 +0000 (10:12 -0700)]
i40evf: remove unnecessary __packed
This is similar to 'commit
9588397d24eec ("i40e: remove unnecessary
__packed")' to avoid unaligned access.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tushar Dave [Thu, 22 Jun 2017 16:44:32 +0000 (09:44 -0700)]
i40evf: Use le32_to_cpu before evaluating HW desc fields
i40e hardware descriptor fields are in little-endian format. Driver
must use le32_to_cpu while evaluating these fields otherwise on
big-endian arch we end up evaluating incorrect values, cause errors
like:
i40evf 0000:03:0a.0: Expected response 24 from PF, received
402653184
i40evf 0000:03:0a.1: Expected response 7 from PF, received
117440512
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Daniel Borkmann [Sat, 24 Jun 2017 19:13:52 +0000 (21:13 +0200)]
i40e: report BPF prog id during XDP_QUERY_PROG
Fill the XDP prog_id with the id just like we do in other XDP enabled
drivers such as ixgbe. This is needed so that on dump we can retrieve
the attached program based on the id, and dump BPF insns, opcodes, etc
back to user space. Only XDP driver missing this is currently i40e.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Tue, 20 Jun 2017 22:17:01 +0000 (15:17 -0700)]
i40evf: add some missing includes
These includes were all being used in the driver, but weren't
being directly included.
Since the current advised method is to directly include anything
that you need, this implements that.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 20 Jun 2017 22:17:00 +0000 (15:17 -0700)]
i40e: display correct UDP tunnel type name
The i40e driver attempts to display the UDP tunnel name by doing a check
against the type, where for non-zero types we use "vxlan" and for zero
type we use "geneve". This is not future proof, because if new tunnel
types get added, we'll incorrectly label them. It also depends on the
value of UDP_TUNNEL_TYPE_GENEVE == 0, which is brittle.
Instead, replace this with a function that can return a constant string
depending on the type. For now we'll use "unknown" for types we don't
know about, and we can expand this in the future if new types get added.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Tue, 20 Jun 2017 22:16:59 +0000 (15:16 -0700)]
i40e/i40evf: remove mismatched type warnings
Compiler reported several places where driver compared
signed and unsigned types. Cast or change the types to remove
the warnings.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Tue, 20 Jun 2017 22:16:58 +0000 (15:16 -0700)]
i40e/i40evf: make IPv6 ATR code clearer
This just reorders some local vars and makes the code flow
clearer.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Tue, 20 Jun 2017 22:16:57 +0000 (15:16 -0700)]
i40e: fix odd formatting and indent
The compiler warned on an oddly indented bit of code, and when
investigating that, noted that the functions themselves had
an odd flow. The if condition was checked, and would exclude
a call to AQ, but then the aq_ret would be checked unconditionally
which just looks really weird, and is likely to cause objections.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Tue, 20 Jun 2017 22:16:56 +0000 (15:16 -0700)]
i40e: fix up 32 bit timespec references
As it turns out there was only a small set of errors
on 32 bit, and we just needed to be using the right calls
for dealing with timespec64 variables.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Paul M Stillwell Jr [Tue, 20 Jun 2017 22:16:55 +0000 (15:16 -0700)]
i40e: Handle admin Q timeout when releasing NVM
There are some rare cases where the release resource call will return an
admin Q timeout. In these cases the code needs to try to release the
resource again until it succeeds or it times out.
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 20 Jun 2017 22:16:54 +0000 (15:16 -0700)]
i40e: remove WQ_UNBOUND and the task limit of our workqueue
During certain events such as a CORER, multiple devices will run a work
task to handle some cleanup. This can cause issues due to
a single-threaded workqueue which can mean that a device doesn't cleanup
in time. Prevent this by removing the single-threaded restriction on the
module workqueue. This avoids the need to add more complex yielding
logic in our service task routine. This is also similar to what other
drivers such as fm10k do.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Tue, 20 Jun 2017 22:16:53 +0000 (15:16 -0700)]
i40e: Fix for trace found with S4 state
This patch fixes a problem found in systems when entering
S4 state. This patch fixes the problem by ensuring that
the misc vector's IRQ is disabled as well. Without this
patch a stack trace can be seen upon entering S4 state.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Gustavo A R Silva [Thu, 15 Jun 2017 02:38:26 +0000 (21:38 -0500)]
i40e: fix incorrect variable assignment
Fix incorrect variable assignment.
Based on line 1511: aq_ret = I40_ERR_PARAM; the correct variable to be
used in this instance is aq_ret instead of ret. Also, variable ret is
updated at line 1602 just before return, so assigning a value to this
variable in this code block is useless.
Addresses-Coverity-ID:
1397693
Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Arnd Bergmann [Tue, 25 Jul 2017 15:35:50 +0000 (17:35 +0200)]
virtio-net: mark PM functions as __maybe_unused
After removing the reset function, the freeze and restore functions
are now unused when CONFIG_PM_SLEEP is disabled:
drivers/net/virtio_net.c:1881:12: error: 'virtnet_restore_up' defined but not used [-Werror=unused-function]
static int virtnet_restore_up(struct virtio_device *vdev)
drivers/net/virtio_net.c:1859:13: error: 'virtnet_freeze_down' defined but not used [-Werror=unused-function]
static void virtnet_freeze_down(struct virtio_device *vdev)
A more robust way to do this is to remove the #ifdef around the callers
and instead mark them as __maybe_unused. The compiler will now just
silently drop the unused code.
Fixes:
4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen [Wed, 7 Jun 2017 21:36:21 +0000 (14:36 -0700)]
ixgbe: Disable flow control for XFI
Flow control autonegotiation is not supported for XFI. Make sure that
ixgbe_device_supports_autoneg_fc() returns false and
hw->fc.disable_fc_autoneg is set to true to avoid running the fc_autoneg
function for that device.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Wed, 7 Jun 2017 21:36:20 +0000 (14:36 -0700)]
ixgbe: Do not support flow control autonegotiation for X553
Flow control autonegotiation is not supported for fiber on X553. Add
device ID checks in ixgbe_device_supports_autoneg_fc() to return the
appropriate value.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Wed, 7 Jun 2017 21:36:19 +0000 (14:36 -0700)]
ixgbe: Update NW_MNG_IF_SEL support for X553
The MAC register NW_MNG_IF_SEL fields have been redefined for
X553. These changes impact the iXFI driver code flow. Since iXFI is
only supported in X552, add MAC checks for iXFI flows.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Wed, 7 Jun 2017 21:36:18 +0000 (14:36 -0700)]
ixgbe: Enable LASI interrupts for X552 devices
Enable LASI interrupts on X552 devices in order to receive notifications of
link configurations of the external PHY and support the configuration of
the internal iXFI link since iXFI does not support auto-negotiation. This
is not required for X553 devices; add a check to avoid enabling LASI
interrupts for X553 devices.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Wed, 19 Jul 2017 22:00:26 +0000 (15:00 -0700)]
ixgbe: Ensure MAC filter was added before setting MACVLAN
This patch adds a check to ensure that adding the MAC filter was
successful before setting the MACVLAN. If it was unsuccessful, propagate
the error.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Tue, 25 Jul 2017 19:48:20 +0000 (12:48 -0700)]
Merge branch 'bnxt_en-Fix-kbuild-errors-and-rename-phys_port_name'
Michael Chan says:
====================
bnxt_en: Fix kbuild errors and rename phys_port_name.
Fix 2 more kbuild errors (the first one already fixed by DaveM), and rename
the physical port name.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 25 Jul 2017 17:28:41 +0000 (13:28 -0400)]
bnxt_en: fix switchdev port naming for external-port-rep and vf-reps
Fix the phys_port_name for the external physical port to be in
"pA" format and that of VF-rep to be in "pCvfD" format as
suggested by Jakub Kicinski.
Fixes:
c124a62ff2dd ("bnxt_en: add support for port_attr_get and get_phys_port_name")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 25 Jul 2017 17:28:40 +0000 (13:28 -0400)]
bnxt_en: use SWITCHDEV_SET_OPS() for setting vf_rep_switchdev_ops
This fixes the build error:
‘struct net_device’ has no member named ‘switchdev_ops’
Reported-by: kbuild test robot <lkp@intel.com>
Fixes:
c124a62ff2dd ("bnxt_en: add support for port_attr_get and and get_phys_port_name")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 25 Jul 2017 17:28:39 +0000 (13:28 -0400)]
bnxt_en: include bnxt_vfr.c code under CONFIG_BNXT_SRIOV switch
And define empty functions in bnxt_vfr.h when CONFIG_BNXT_SRIOV is not
defined.
This fixes build error when CONFIG_BNXT_SRIOV is switched off:
>> drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:165:16: error: 'struct
>> bnxt' has no member named 'sriov_lock'
Reported-by: kbuild test robot <lkp@intel.com>
Fixes:
4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Jul 2017 19:31:38 +0000 (12:31 -0700)]
Merge branch 'net-warnings'
Stephen Hemminger says:
====================
network related warning fixes
Various fixes for warnings in network code and drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>