David Howells [Mon, 4 Apr 2016 13:00:32 +0000 (14:00 +0100)]
rxrpc: Rework peer object handling to use hash table and RCU
Rework peer object handling to use a hash table instead of a flat list and
to use RCU. Peer objects are no longer destroyed by passing them to a
workqueue to process, but rather are just passed to the RCU garbage
collector as kfree'able objects.
The hash function uses the local endpoint plus all the components of the
remote address, except for the RxRPC service ID. Peers thus represent a
UDP port on the remote machine as contacted by a UDP port on this machine.
The RCU read lock is used to handle non-creating lookups so that they can
be called from bottom half context in the sk_error_report handler without
having to lock the hash table against modification.
rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in
the future, this will be passed to a work item for error distribution in
the error_report path and this function will cease being used in the
data_ready path.
Creating lookups are done under spinlock rather than mutex as they might be
set up due to an external stimulus if the local endpoint is a server.
Captured network error messages (ICMP) are handled with respect to this
struct and MTU size and RTT are cached here.
Signed-off-by: David Howells <dhowells@redhat.com>
WANG Cong [Mon, 13 Jun 2016 17:47:44 +0000 (10:47 -0700)]
act_police: rename tcf_act_police_locate() to tcf_act_police_init()
This function is just ->init(), rename it to make it obvious.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Mon, 13 Jun 2016 17:47:43 +0000 (10:47 -0700)]
net_sched: remove internal use of TC_POLICE_*
These should be gone when we removed CONFIG_NET_CLS_POLICE.
We can not totally remove them since they are exposed
to userspace.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jun 2016 06:50:44 +0000 (23:50 -0700)]
Merge branch 'rds-mprds-foundations'
Sowmini Varadhan says:
====================
RDS: multiple connection paths for scaling
Today RDS-over-TCP is implemented by demux-ing multiple PF_RDS sockets
between any 2 endpoints (where endpoint == [IP address, port]) over a
single TCP socket between the 2 IP addresses involved. This has the
limitation that it ends up funneling multiple RDS flows over a single
TCP flow, thus the rds/tcp connection is
(a) upper-bounded to the single-flow bandwidth,
(b) suffers from head-of-line blocking for the RDS sockets.
Better throughput (for a fixed small packet size, MTU) can be achieved
by having multiple TCP/IP flows per rds/tcp connection, i.e., multipathed
RDS (mprds). Each such TCP/IP flow constitutes a path for the rds/tcp
connection. RDS sockets will be attached to a path based on some hash
(e.g., of local address and RDS port number) and packets for that RDS
socket will be sent over the attached path using TCP to segment/reassemble
RDS datagrams on that path.
The table below, generated using a prototype that implements mprds,
shows that this is significant for scaling to 40G. Packet sizes
used were: 8K byte req, 256 byte resp. MTU: 1500. The parameters for
RDS-concurrency used below are described in the rds-stress(1) man page-
the number listed is proportional to the number of threads at which max
throughput was attained.
-------------------------------------------------------------------
RDS-concurrency Num of tx+rx K/s (iops) throughput
(-t N -d N) TCP paths
-------------------------------------------------------------------
16 1 600K - 700K 4 Gbps
28 8 5000K - 6000K 32 Gbps
-------------------------------------------------------------------
FAQ: what is the relation between mprds and mptcp?
mprds is orthogonal to mptcp. Whereas mptcp creates
sub-flows for a single TCP connection, mprds parallelizes tx/rx
at the RDS layer. MPRDS with N paths will allow N datagrams to
be sent in parallel; each path will continue to send one
datagram at a time, with sender and receiver keeping track of
the retransmit and dgram-assembly state based on the RDS header.
If desired, mptcp can additionally be used to speed up each TCP
path. That acceleration is orthogonal to the parallelization benefits
of mprds.
This patch series lays down the foundational data-structures to support
mprds in the kernel. It implements the changes to split up the
rds_connection structure into a common (to all paths) part,
and a per-path rds_conn_path. All I/O workqs are driven from
the rds_conn_path.
Note that this patchset does not (yet) actually enable multipathing
for any of the transports; all transports will continue to use a
single path with the refactored data-structures. A subsequent patchset
will add the changes to the rds-tcp module to actually use mprds
in rds-tcp.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:42 +0000 (09:44 -0700)]
RDS: Update rds_conn_destroy to be MP capable
Refactor rds_conn_destroy() so that the per-path dismantling
is done in rds_conn_path_destroy, and then iterate as needed
over rds_conn_path_destroy().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:41 +0000 (09:44 -0700)]
RDS: Update rds_conn_shutdown to work with rds_conn_path
This commit changes rds_conn_shutdown to take a rds_conn_path *
argument, allowing it to shutdown paths other than c_path[0] for
MP-capable transports.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:40 +0000 (09:44 -0700)]
RDS: Initialize all RDS_MPATH_WORKERS in __rds_conn_create
Add a for() loop in __rds_conn_create to initialize all the
conn_paths, in preparate for MP capable transports.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:39 +0000 (09:44 -0700)]
RDS: Add rds_conn_path_error()
rds_conn_path_error() is the MP-aware analog of rds_conn_error,
to be used by multipath-capable callers.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:38 +0000 (09:44 -0700)]
RDS: update rds-info related functions to traverse multiple conn_paths
This commit updates the callbacks related to the rds-info command
so that they walk through all the rds_conn_path structures and
report the requested info.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:37 +0000 (09:44 -0700)]
RDS: Add rds_conn_path_connect_if_down() for MP-aware callers
rds_conn_path_connect_if_down() works on the rds_conn_path
that it is passed. Callers who are not t_m_capable may continue
calling rds_conn_connect_if_down, which will invoke
rds_conn_path_connect_if_down() with the default c_path[0].
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:36 +0000 (09:44 -0700)]
RDS: Make rds_send_pong() take a rds_conn_path argument
This commit allows rds_send_pong() callers to send back
the rds pong message on some path other than c_path[0] by
passing in a struct rds_conn_path * argument. It also
removes the last dependency on the #defines in rds_single.h
from send.c
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:35 +0000 (09:44 -0700)]
RDS: Extract rds_conn_path from i_conn_path in rds_send_drop_to() for MP-capable transports
Explicitly set up rds_conn_path, either from i_conn_path (for
MP capable transpots) or as c_path[0], and use this in
rds_send_drop_to()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:34 +0000 (09:44 -0700)]
RDS: Pass rds_conn_path to rds_send_xmit()
Pass a struct rds_conn_path to rds_send_xmit so that MP capable
transports can transmit packets on something other than c_path[0].
The eventual goal for MP capable transports is to hash the rds
socket to a path based on the bound local address/port, and use
this path as the argument to rds_send_xmit()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:33 +0000 (09:44 -0700)]
RDS: Make rds_send_queue_rm() rds_conn_path aware
Pass the rds_conn_path to rds_send_queue_rm, and use it to initialize
the i_conn_path field in struct rds_incoming. This commit also makes
rds_send_queue_rm() MP capable, because it now takes locks
specific to the rds_conn_path passed in, instead of defaulting to
the c_path[0] based defines from rds_single_path.h
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:32 +0000 (09:44 -0700)]
RDS: Remove stale function rds_send_get_message()
The only caller of rds_send_get_message() was
rds_iw_send_cq_comp_handler() which was removed as part of
commit
dcdede0406d3 ("RDS: Drop stale iWARP RDMA transport"),
so remove rds_send_get_message() for the same reason.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:31 +0000 (09:44 -0700)]
RDS: Add rds_send_path_drop_acked()
rds_send_path_drop_acked() is the path-specific version of
rds_send_drop_acked() to be invoked by MP capable callers.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:30 +0000 (09:44 -0700)]
RDS: Add rds_send_path_reset()
rds_send_path_reset() is the path specific version of rds_send_reset()
intended for MP capable callers.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:29 +0000 (09:44 -0700)]
RDS: rds_inc_path_init() helper function for MP capable transports
t_mp_capable transports can use rds_inc_path_init to initialize
all fields in struct rds_incoming, including the i_conn_path.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:28 +0000 (09:44 -0700)]
RDS: recv path gets the conn_path from rds_incoming for MP capable transports
Transports that are t_mp_capable should set the rds_conn_path
on which the datagram was recived in the ->i_conn_path field
of struct rds_incoming.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:27 +0000 (09:44 -0700)]
RDS: add t_mp_capable bit to be set by MP capable transports
The t_mp_capable bit will be used in the core rds module
to support multipathing logic when the transport supports it.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 13 Jun 2016 16:44:26 +0000 (09:44 -0700)]
RDS: split out connection specific state from rds_connection to rds_conn_path
In preparation for multipath RDS, split the rds_connection
structure into a base structure, and a per-path struct rds_conn_path.
The base structure tracks information and locks common to all
paths. The workqs for send/recv/shutdown etc are tracked per
rds_conn_path. Thus the workq callbacks now work with rds_conn_path.
This commit allows for one rds_conn_path per rds_connection, and will
be extended into multiple conn_paths in subsequent commits.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Mon, 13 Jun 2016 15:20:35 +0000 (11:20 -0400)]
tcp: return sizeof tcp_dctcp_info in dctcp_get_info()
Make sure that dctcp_get_info() returns only the size of the
info->dctcp struct that it zeroes out and fills in. Previously it had
been returning the size of the enclosing tcp_cc_info union,
sizeof(*info). There is no problem yet, but that union that may one
day be larger than struct tcp_dctcp_info, in which case the
TCP_CC_INFO code might accidentally copy uninitialized bytes from the
stack.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 13 Jun 2016 15:08:26 +0000 (23:08 +0800)]
sctp: fix error return code in sctp_init()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jun 2016 06:30:32 +0000 (23:30 -0700)]
Merge tag 'rxrpc-rewrite-
20160613' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Rename rxrpc source files
Here's the next part of the AF_RXRPC rewrite. In this set I rename some of
the files in the net/rxrpc/ directory and adjust the Makefile and
ar-internal.h to reflect the changes.
The aim is twofold:
(1) Remove the "ar-" prefix on those files that have it as it's not really
useful, especially now that I'm building rxkad in.
(2) To aid splitting the local, peer, connection and call handling code
into separate files for object and event handling in future patches by
making it easier to come up with new filenames.
There are two commits:
(1) The first commit does a bunch of renames of .c files and alters the
Makefile. ar-internal.h isn't renamed at this time to avoid having to
change the contents of the files being renamed.
(2) The second commit changes the section label comments in ar-internal.h
to reflect the changed filenames and reorders the file so that the
sections are back in filename order.
The patches can be found here also:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
Tagged thusly:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-
20160613
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Mon, 13 Jun 2016 08:56:20 +0000 (16:56 +0800)]
net: hns: update the dependency
After the patchset about adding support of ACPI (commit id is
6343488)
being applied, HNS does not depend on OF. It depends on OF or ACPI, so
the Kconfig file needs to be updated.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jun 2016 05:38:02 +0000 (22:38 -0700)]
Merge branch 'r8152-phy-adjustments'
Hayes Wang says:
====================
r8152: code adjustment for PHY
These patches are for adjusting the code about PHY and setting speed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 13 Jun 2016 09:49:38 +0000 (17:49 +0800)]
r8152: save the speed
The user may change the speed. Use it to replace the default one.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 13 Jun 2016 09:49:37 +0000 (17:49 +0800)]
r8152: move the setting for the default speed
Move calling set_speed() from open() to rtl_hw_phy_work_func_t().
Then, we would set the default speed only for first initialization
or after resuming.
Besides, the set_speed() could handle the flag of PHY_RESET which
would be set in rtl_ops.hw_phy_cfg().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 13 Jun 2016 09:49:36 +0000 (17:49 +0800)]
r8152: move the settings of PHY to a work queue
Move the settings of PHY to a work queue and schedule it after
rtl_ops.init().
There are some reasons for this. First, the settings are only
needed for the first time initialization or after the power
down occurs.
Second, the settings are independent with the others.
Last, the settings may take more time than the others. Leave
they in probe() or open() may delay the following flows.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Mon, 13 Jun 2016 09:06:39 +0000 (12:06 +0300)]
net/sched: flower: Return error when hw can't offload and skip_sw is set
When skip_sw is set and hardware fails to apply filter, return error to
user. This will make error propagation logic similar to the one
currently used in u32 classifier.
Also, changed code to use tc_skip_sw() utility function.
Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Jun 2016 23:16:18 +0000 (19:16 -0400)]
Merge branch 'bnxt_en-updates'
Michael Chan says:
====================
bnxt_en: Updates for net-next.
-Add default VLAN support for VFs.
-Add NPAR (NIC partioning) support.
-Add support for new device 5731x and 5741x. GRO logic is different.
-Support new ETHTOOL_{G|S}LINKSETTINGS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:38 +0000 (02:25 -0400)]
bnxt_en: Support new ETHTOOL_{G|S}LINKSETTINGS API.
To fully support 25G and 50G link settings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:37 +0000 (02:25 -0400)]
bnxt_en: Don't allow autoneg on cards that don't support it.
Some cards do not support autoneg. The current code does not prevent the
user from enabling autoneg with ethtool on such cards, causing confusion.
Firmware provides the autoneg capability information and we just need to
store it in the support_auto_speeds field in bnxt_link_info struct.
The ethtool set_settings() call will check this field before proceeding
with autoneg.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:36 +0000 (02:25 -0400)]
bnxt_en: Add BCM5731X and BCM5741X device IDs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:35 +0000 (02:25 -0400)]
bnxt_en: Add GRO logic for BCM5731X chips.
Add bnxt_gro_func_5731x() to handle GRO packets for this chip. The
completion structures used in the new chip have new data to help determine
the header offsets. The offsets can be off by 4 if the packet is an
internal loopback packet (e.g. from one VF to another VF). Some additional
logic is added to adjust the offsets if it is a loopback packet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:34 +0000 (02:25 -0400)]
bnxt_en: Refactor bnxt_gro_skb().
Newer chips require different logic to handle GRO packets. So refactor
the code so that we can call different functions depending on the chip.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:33 +0000 (02:25 -0400)]
bnxt_en: Define the supported chip numbers.
Define all the supported chip numbers and chip categories. Store the
chip_num returned by firmware. If the call to get the version and chip
number fails, we should abort.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:32 +0000 (02:25 -0400)]
bnxt_en: Add PCI device ID for 57404 NPAR devices.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Satish Baddipadige [Mon, 13 Jun 2016 06:25:31 +0000 (02:25 -0400)]
bnxt_en: Enable NPAR (NIC Partitioning) Support.
NPAR type is read from bnxt_hwrm_func_qcfg. Do not allow changing link
parameters if in NPAR mode sinc ethe port is shared among multiple
partitions. The link parameters are set up by firmware.
Signed-off-by: Satish Baddipadige <sbaddipa@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:30 +0000 (02:25 -0400)]
bnxt_en: Handle VF_CFG_CHANGE event from firmware.
When the VF driver gets this event, the VF configuration has changed (such
as default VLAN). The VF driver will initiate a silent reset to pick up
the new configuration.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:29 +0000 (02:25 -0400)]
bnxt_en: Add new function bnxt_reset().
When a default VLAN is added to the VF, the VF driver needs to reset to
pick up the default VLAN ID. We can use the same tx timeout reset logic
to do that, without the debug output. This new function, with the
silent parameter to suppress debug output will now serve both purposes.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 13 Jun 2016 06:25:28 +0000 (02:25 -0400)]
bnxt_en: Add function for VF driver to query default VLAN.
The PF can setup a default VLAN for a VF. The default VLAN tag is
automatically inserted and stripped without the knowledge of the
stack running on the VF. The VF driver needs to know that default
VLAN is enabled as VLAN acceleration on the RX side is no longer
supported. Call netdev_update_features() to fix up the VLAN features
as necessary. Also, VLAN strip mode must be enabled to strip out
the default VLAN tag.
Only allow VF default VLAN to be set if the firmware spec is >= 1.2.1.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 12 Jun 2016 21:30:54 +0000 (23:30 +0200)]
net: ethernet: enic: move to new ethtool api {get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move the enic driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Rapoport [Tue, 14 Jun 2016 05:29:38 +0000 (08:29 +0300)]
virtio_net: fix csum generation for virtio-net devices
The commit
e858fae2b0b8 ("virtio_net: use common code for virtio_net_hdr
and skb GSO conversion") replaced the tun code for header manipulation
with the generic helpers. While doing so, it implictly moved the
skb_partial_csum_set() invocation after eth_type_trans(), which
invalidate the current gso start/offset values.
Fix it by moving the helper invocation before the mac pulling.
Fixes:
e858fae2b0b8 ("virtio_net: use common code for virtio_net_hdr and
skb GSO conversion")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Mon, 13 Jun 2016 12:30:30 +0000 (13:30 +0100)]
rxrpc: Update the comments in ar-internal.h to reflect renames
Update the section comments in ar-internal.h that indicate the locations of
the referenced items to reflect the renames done to the .c files in
net/rxrpc/.
This also involves some rearrangement to reflect keep the sections in order
of filename.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 13 Jun 2016 11:16:05 +0000 (12:16 +0100)]
rxrpc: Rename files matching ar-*.c to git rid of the "ar-" prefix
Rename files matching net/rxrpc/ar-*.c to get rid of the "ar-" prefix.
This will aid splitting those files by making easier to come up with new
names.
Note that the not all files are simply renamed from ar-X.c to X.c. The
following exceptions are made:
(*) ar-call.c -> call_object.c
ar-ack.c -> call_event.c
call_object.c is going to contain the core of the call object
handling. Call event handling is all going to be in call_event.c.
(*) ar-accept.c -> call_accept.c
Incoming call handling is going to be here.
(*) ar-connection.c -> conn_object.c
ar-connevent.c -> conn_event.c
The former file is going to have the basic connection object handling,
but there will likely be some differentiation between client
connections and service connections in additional files later. The
latter file will have all the connection-level event handling.
(*) ar-local.c -> local_object.c
This will have the local endpoint object handling code. The local
endpoint event handling code will later be split out into
local_event.c.
(*) ar-peer.c -> peer_object.c
This will have the peer endpoint object handling code. Peer event
handling code will be placed in peer_event.c (for the moment, there is
none).
(*) ar-error.c -> peer_event.c
This will become the peer event handling code, though for the moment
it's actually driven from the local endpoint's perspective.
Note that I haven't renamed ar-transport.c to transport_object.c as the
intention is to delete it when the rxrpc_transport struct is excised.
The only file that actually has its contents changed is net/rxrpc/Makefile.
net/rxrpc/ar-internal.h will need its section marker comments updating, but
I'll do that in a separate patch to make it easier for git to follow the
history across the rename. I may also want to rename ar-internal.h at some
point - but that would mean updating all the #includes and I'd rather do
that in a separate step.
Signed-off-by: David Howells <dhowells@redhat.com.
Florian Westphal [Sat, 11 Jun 2016 10:46:04 +0000 (12:46 +0200)]
sched: remove NET_XMIT_POLICED
sch_atm returns this when TC_ACT_SHOT classification occurs.
But all other schedulers that use tc_classify
(htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS
in this case so just do that in atm.
BATMAN uses it as an intermediate return value to signal
forwarding vs. buffering, but it did not return POLICED to
callers outside of BATMAN.
Reviewed-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Wahren [Wed, 8 Jun 2016 20:42:46 +0000 (20:42 +0000)]
net: fec: handle small PHY reset durations more precisely
Since msleep is based on jiffies the PHY reset could take longer
than expected. So use msleep for values greater than 20 msec otherwise
usleep_range.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 11 Jun 2016 18:08:19 +0000 (20:08 +0200)]
ipv6: use TOS marks from sockets for routing decision
In IPv6 the ToS values are part of the flowlabel in flowi6 and get
extracted during fib rule lookup, but we forgot to correctly initialize
the flowlabel before the routing lookup.
Reported-by: <liam.mcbirnie@boeing.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jun 2016 06:58:29 +0000 (23:58 -0700)]
Merge branch 'remove-qdisc-throttle'
Eric Dumazet says:
====================
net_sched: remove qdisc_is_throttled()
HTB, CBQ and HFSC pay a very high cost updating the qdisc 'throttled'
status that nothing but CBQ seems to use.
CBQ usage is flaky anyway, since no qdisc ->enqueue() updates the
'throttled' qdisc status.
This looks like some 'optimization' that actually cost more than code
without the optimization, and might cause latency issues with CBQ.
In my tests, I could achieve a 8 % performance increase in TCP_RR
workload through HTB qdisc, in presence of throttled classes,
and 5 % without throttled classes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Jun 2016 23:41:39 +0000 (16:41 -0700)]
net_sched: remove generic throttled management
__QDISC_STATE_THROTTLED bit manipulation is rather expensive
for HTB and few others.
I already removed it for sch_fq in commit
f2600cf02b5b
("net: sched: avoid costly atomic operation in fq_dequeue()")
and so far nobody complained.
When one ore more packets are stuck in one or more throttled
HTB class, a htb dequeue() performs two atomic operations
to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc
lock is held.
Removing this pair of atomic operations bring me a 8 % performance
increase on 200 TCP_RR tests, in presence of throttled classes.
This patch has no side effect, since nothing actually uses
disc_is_throttled() anymore.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Jun 2016 23:41:38 +0000 (16:41 -0700)]
net_sched: netem: remove qdisc_is_throttled() use
Looks like it is only there as some optimization attempt.
Since __QDISC_STATE_THROTTLED set/unset is way too expensive,
and netem is the last user, just remove this check.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Jun 2016 23:41:37 +0000 (16:41 -0700)]
net_sched: cbq: remove a flaky use of qdisc_is_throttled()
So far no qdisc ever unset the throttled bit at enqueue() time,
so CBQ usage of qdisc_is_throttled() was flaky.
Since __QDISC_STATE_THROTTLED set/unset is way too expensive
considering that only CBQ was eventually caring for this status,
it would make sense to implement a Qdisc ops ->is_throttled()
if we find that this is needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Jun 2016 23:41:36 +0000 (16:41 -0700)]
net_sched: sch_plug: use a private throttled status
We want to get rid of generic qdisc throttled management,
so this qdisc has to use a private flag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jun 2016 06:24:55 +0000 (23:24 -0700)]
Merge branch 'mdio-iProc-SOC'
Pramod Kumar says:
====================
Add MDIO bus multiplexer support for iProc SoCs
Broadcom iProc based SoCs use a MDIO bus multiplexer where child buses
could be internal as well external to SoCs. These buses could supports
MDIO transaction compatible to C-22/C-45.
Broadcom MDIO bus multiplexer is an integrated multiplexer where child bus
selection and mdio transaction logic lies inside multiplexer itself.
To accommodate this multiplexer in existing mux framework below changes
were required-
1. Passed MDIO parent bus via mdio_mux_init to MDIO mux framework.
This patch set includes MDIO bus multiplexer driver along with above
framework change. It includes one external bus node having Ethernet PHY
attached and two internal bus node holding PCIe PHYs.
This patch series is based on v4.7-rc1 and is available from github-
repo: https://github.com/Broadcom/arm64-linux.git
branch:mdio-mux-v5
-Changes from v4:
- disabled PCIe PHYs from dtsi and enabled in dts file.
-Changes from v3:
- Unregister and free the parent MDIO bus.
- rebased on net-next/master branch.
Reason for resend:
-Rebased on v4.7-rc1
Changes from v2:
-Addressed Rob's comments in this patch regarding typo/grammers.
-Addressed David's comments regarding local variables order.
-Removed property "mdio-integrated-mux" and used mdiobus_register()
in place of of_mdiobus_regsiter().
-removed usage of IS_ERR_OR_NULL to IS_ERR in PCIe PHY driver.
Changes from v1:
- stop using "brcm,is_c45" from bus node as suggested by Andrew. MDIO
PHY driver will logically OR MII_ADDR_C45 into the address when issues
any C45 MDIO read/write transaction.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pramod Kumar [Fri, 10 Jun 2016 05:33:51 +0000 (11:03 +0530)]
phy: Add Northstar2 PCI Phy support
Add PCI Phy support for Broadcom Northstar2 SoCs. This driver uses the
interface from the iproc mdio mux driver to enable the devices
respective phys.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jon Mason <jonmason@broadcom.com>
Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pramod Kumar [Fri, 10 Jun 2016 05:33:50 +0000 (11:03 +0530)]
binding: PHY: Binding doc for NS2 PCIe PHYs.
Binding doc for NS2 PCIe PHYs.
Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Signed-off-by: Jon Mason <jonmason@broadcom.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pramod Kumar [Fri, 10 Jun 2016 05:33:49 +0000 (11:03 +0530)]
net: mdio-mux: Add MDIO mux driver for iProc SoCs
iProc based SoCs supports the integrated mdio multiplexer which
has the bus selection as well as mdio transaction generation logic
inside.
This multiplexer has child buses for PCIe, SATA, USB and ETH. These
buses could be internal or external to SOC where PHYs are attached.
These buses could use C-45 or C-22 mdio transaction.
Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pramod Kumar [Fri, 10 Jun 2016 05:33:48 +0000 (11:03 +0530)]
dt: mdio-mux: Add mdio multiplexer driver node
Add integrated MDIO multiplexer driver node which contains
two mux PCIe bus and one ethernet bus along with phys
lying on these bus.
Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pramod Kumar [Fri, 10 Jun 2016 05:33:47 +0000 (11:03 +0530)]
binding: mdio-mux: Add DT binding doc for Broadcom MDIO bus multiplexer
Add DT binding doc for Broadcom MDIO bus multiplexer driver.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pramod Kumar [Fri, 10 Jun 2016 05:33:46 +0000 (11:03 +0530)]
binding: Make "mdio-parent-bus" property from mandatory to optional
Change "mdio-parent-bus" from mandatory section to optional
as it won't be required by integrated MDIO multiplexer
which has bus selection and mdio transaction generation logic,
integrated inside.
Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pramod Kumar [Fri, 10 Jun 2016 05:33:45 +0000 (11:03 +0530)]
mdio: mux: Enhanced MDIO mux framework for integrated multiplexers
An integrated multiplexer uses same address space for
"muxed bus selection" and "generation of mdio transaction"
hence its good to register parent bus from mux driver.
Hence added a mechanism where mux driver could register a
parent bus and pass it down to framework via mdio_mux_init api.
Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 9 Jun 2016 14:48:18 +0000 (22:48 +0800)]
sctp: sctp should change socket state when shutdown is received
Now sctp doesn't change socket state upon shutdown reception. It changes
just the assoc state, even though it's a TCP-style socket.
For some cases, if we really need to check sk->sk_state, it's necessary to
fix this issue, at least when we use ss or netstat to dump, we can get a
more exact information.
As an improvement, we will change sk->sk_state when we change asoc->state
to SHUTDOWN_RECEIVED, and also do it in sctp_shutdown to keep consistent
with sctp_close.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jun 2016 06:13:32 +0000 (23:13 -0700)]
Merge tag 'mac80211-next-for-davem-2016-06-09' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
For the next cycle, we have the following:
* the biggest change is Michał's work on integrating FQ/codel
with the mac80211 internal software queues
* cfg80211 connect result gets clarified for the
"no connection at all" case
* advertisement of per-interface type capabilities, in case
they differ (which makes a lot of sense for some capabilities)
* most of the nl80211 & hwsim unprivileged namespace operation
changes
* human-readable VHT capabilities in debugfs
* some other cleanups, like spelling
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jun 2016 06:11:50 +0000 (23:11 -0700)]
Merge branch 'arm64-bpf'
Zi Shen Lim says:
====================
arm64 BPF JIT updates
Updates for arm64 eBPF JIT.
The main addition here is implementation of bpf_tail_call.
Changes since v2:
- None. Resubmit per David Miller.
Changes since v1:
- Added patch #1 to address build error due to missing header inclusion
in linux/bpf.h. (Thanks to suggestion and ack by Daniel Borkmann)
Ordered it ahead of bpf_tail_call patch #2 so build error is not
triggered.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zi Shen Lim [Thu, 9 Jun 2016 04:18:50 +0000 (21:18 -0700)]
arm64: bpf: optimize LD_ABS, LD_IND
Remove superfluous stack frame, saving us 3 instructions for every
LD_ABS or LD_IND.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zi Shen Lim [Thu, 9 Jun 2016 04:18:49 +0000 (21:18 -0700)]
arm64: bpf: optimize JMP_CALL
Remove superfluous stack frame, saving us 3 instructions for
every JMP_CALL.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zi Shen Lim [Thu, 9 Jun 2016 04:18:48 +0000 (21:18 -0700)]
arm64: bpf: implement bpf_tail_call() helper
Add support for JMP_CALL_X (tail call) introduced by commit
04fd61ab36ec
("bpf: allow bpf programs to tail-call other bpf programs").
bpf_tail_call() arguments:
ctx - context pointer passed to next program
array - pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY
index - index inside array that selects specific program to run
In this implementation arm64 JIT jumps into callee program after prologue,
so callee program reuses the same stack. For tail_call_cnt, we use the
callee-saved R26 (which was already saved/restored but previously unused
by JIT).
With this patch a tail call generates the following code on arm64:
if (index >= array->map.max_entries)
goto out;
34: mov x10, #0x10 // #16
38: ldr w10, [x1,x10]
3c: cmp w2, w10
40: b.ge 0x0000000000000074
if (tail_call_cnt > MAX_TAIL_CALL_CNT)
goto out;
tail_call_cnt++;
44: mov x10, #0x20 // #32
48: cmp x26, x10
4c: b.gt 0x0000000000000074
50: add x26, x26, #0x1
prog = array->ptrs[index];
if (prog == NULL)
goto out;
54: mov x10, #0x68 // #104
58: ldr x10, [x1,x10]
5c: ldr x11, [x10,x2]
60: cbz x11, 0x0000000000000074
goto *(prog->bpf_func + prologue_size);
64: mov x10, #0x20 // #32
68: ldr x10, [x11,x10]
6c: add x10, x10, #0x20
70: br x10
74:
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zi Shen Lim [Thu, 9 Jun 2016 04:18:47 +0000 (21:18 -0700)]
bpf: fix missing header inclusion
Commit
0fc174dea545 ("ebpf: make internal bpf API independent of
CONFIG_BPF_SYSCALL ifdefs") introduced usage of ERR_PTR() in
bpf_prog_get(), however did not include linux/err.h.
Without this patch, when compiling arm64 BPF without CONFIG_BPF_SYSCALL:
...
In file included from arch/arm64/net/bpf_jit_comp.c:21:0:
include/linux/bpf.h: In function 'bpf_prog_get':
include/linux/bpf.h:235:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
return ERR_PTR(-EOPNOTSUPP);
^
include/linux/bpf.h:235:9: warning: return makes pointer from integer without a cast [-Wint-conversion]
In file included from include/linux/rwsem.h:17:0,
from include/linux/mm_types.h:10,
from include/linux/sched.h:27,
from arch/arm64/include/asm/compat.h:25,
from arch/arm64/include/asm/stat.h:23,
from include/linux/stat.h:5,
from include/linux/compat.h:12,
from include/linux/filter.h:10,
from arch/arm64/net/bpf_jit_comp.c:22:
include/linux/err.h: At top level:
include/linux/err.h:23:35: error: conflicting types for 'ERR_PTR'
static inline void * __must_check ERR_PTR(long error)
^
In file included from arch/arm64/net/bpf_jit_comp.c:21:0:
include/linux/bpf.h:235:9: note: previous implicit declaration of 'ERR_PTR' was here
return ERR_PTR(-EOPNOTSUPP);
^
...
Fixes:
0fc174dea545 ("ebpf: make internal bpf API independent of CONFIG_BPF_SYSCALL ifdefs")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jun 2016 06:07:50 +0000 (23:07 -0700)]
Merge branch 'tcp_nv'
Lawrence Brakmo says:
====================
tcp: add NV congestion control
Removed most of the module parameters
Tested in a rack using between 1 and 380 active TCP-NV flows.
Consists of the following patches:
[PATCH net-next v2 1/2] tcp: add in_flight to tcp_skb_cb
[PATCH net-next v2 2/2] tcp: add NV congestion control
====================
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Lawrence Brakmo [Thu, 9 Jun 2016 04:16:45 +0000 (21:16 -0700)]
tcp: add NV congestion control
TCP-NV (New Vegas) is a major update to TCP-Vegas.
An earlier version of NV was presented at 2010's LPC.
It is a delayed based congestion avoidance for the
data center. This version has been tested within a
10G rack where the HW RTTs are 20-50us and with
1 to 400 flows.
A description of TCP-NV, including implementation
details as well as experimental results, can be found at:
http://www.brakmo.org/networking/tcp-nv/TCPNV.html
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lawrence Brakmo [Thu, 9 Jun 2016 04:16:44 +0000 (21:16 -0700)]
tcp: add in_flight to tcp_skb_cb
Add in_flight (bytes in flight when packet was sent) field
to tx component of tcp_skb_cb and make it available to
congestion modules' pkts_acked() function through the
ack_sample function argument.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jun 2016 06:03:56 +0000 (23:03 -0700)]
Merge branch 'virtio_net-GSO-helpers'
Mike Rapoport says:
====================
virtio_net: use common code for virtio_net_hdr and skb GSO conversion
This patches introduce virtio_net_hdr_{from,to}_skb functions for
conversion of GSO information between skb and virtio_net_hdr.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Rapoport [Wed, 8 Jun 2016 13:09:22 +0000 (16:09 +0300)]
packet: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with
virtio_net_hdr_from_skb
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Rapoport [Wed, 8 Jun 2016 13:09:21 +0000 (16:09 +0300)]
virtio_net: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with
virtio_net_hdr_{from,to}_skb
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Rapoport [Wed, 8 Jun 2016 13:09:20 +0000 (16:09 +0300)]
tuntap: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with
virtio_net_hdr_{from,to}_skb
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Rapoport [Wed, 8 Jun 2016 13:09:19 +0000 (16:09 +0300)]
macvtap: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with
virtio_net_hdr_{from,to}_skb
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Rapoport [Wed, 8 Jun 2016 13:09:18 +0000 (16:09 +0300)]
virtio_net: introduce virtio_net_hdr_{from,to}_skb
The code for conversion between virtio_net_hdr and skb GSO info is
duplicated at several places. Let's put it to a common place to allow
reuse.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Rapoport [Wed, 8 Jun 2016 13:09:17 +0000 (16:09 +0300)]
virtio_net: add _UAPI prefix to virtio_net header guards
This gives better namespacing and prevents conflicts with no-uapi version
of virtio_net header that will be introduced in the following patch.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhaktipriya Shridhar [Tue, 7 Jun 2016 19:33:45 +0000 (01:03 +0530)]
RDS: IB: Remove deprecated create_workqueue
alloc_workqueue replaces deprecated create_workqueue().
Since the driver is infiniband which can be used as block device and the
workqueue seems involved in regular operation of the device, so a
dedicated workqueue has been used with WQ_MEM_RECLAIM set to guarantee
forward progress under memory pressure.
Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary here.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sun, 5 Jun 2016 21:41:11 +0000 (23:41 +0200)]
NET: PHY: adds driver for Intel XWAY PHY
This adds support for the Intel (former Lantiq) XWAY 11G and 22E PHYs.
These PHYs are also named PEF 7061, PEF 7071, PEF 7072.
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Fri, 10 Jun 2016 21:30:37 +0000 (22:30 +0100)]
rxrpc: Limit the listening backlog
Limit the socket incoming call backlog queue size so that a remote client
can't pump in sufficient new calls that the server runs out of memory. Note
that this is partially theoretical at the moment since whilst the number of
calls is limited, the number of packets trying to set up new calls is not.
This will be addressed in a later patch.
If the caller of listen() specifies a backlog INT_MAX, then they get the
current maximum; anything else greater than max_backlog or anything
negative incurs EINVAL.
The limit on the maximum queue size can be set by:
echo N >/proc/sys/net/rxrpc/max_backlog
where 4<=N<=32.
Further, set the default backlog to 0, requiring listen() to be called
before we start actually queueing new calls. Whilst this kind of is a
change in the UAPI, the caller can't actually *accept* new calls anyway
unless they've first called listen() to put the socket into the LISTENING
state - thus the aforementioned new calls would otherwise just sit there,
eating up kernel memory. (Note that sockets that don't have a non-zero
service ID bound don't get incoming calls anyway.)
Given that the default backlog is now 0, make the AFS filesystem call
kernel_listen() to set the maximum backlog for itself.
Possible improvements include:
(1) Trimming a too-large backlog to max_backlog when listen is called.
(2) Trimming the backlog value whenever the value is used so that changes
to max_backlog are applied to an open socket automatically. Note that
the AFS filesystem opens one socket and keeps it open for extended
periods, so would miss out on changes to max_backlog.
(3) Having a separate setting for the AFS filesystem.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Fri, 10 Jun 2016 21:30:27 +0000 (22:30 +0100)]
rxrpc: Trim line-terminal whitespace
Trim line-terminal whitespace in net/rxrpc/
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 10 Jun 2016 21:10:22 +0000 (23:10 +0200)]
net, cls: allow for deleting all filters for given parent
Add a possibility where the user can just specify the parent and
all filters under that parent are then being purged. Currently,
for example for scripting, one needs to specify pref/prio to have
a well-defined number for 'tc filter del' command for addressing
the previously created instance or additionally filter handle in
case of priorities being the same. Improve usage by allowing the
option for tc to specify the parent and removing the whole chain
for that given parent.
Example usage after patch, no tc changes required:
# tc qdisc replace dev foo clsact
# tc filter add dev foo egress bpf da obj ./bpf.o
# tc filter add dev foo egress bpf da obj ./bpf.o
# tc filter show dev foo egress
filter protocol all pref 49151 bpf
filter protocol all pref 49151 bpf handle 0x1 bpf.o:[classifier] direct-action
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bpf.o:[classifier] direct-action
# tc filter del dev foo egress
# tc filter show dev foo egress
#
Previously, RTM_DELTFILTER requests with invalid prio of 0 were
rejected, so only netlink requests with RTM_NEWTFILTER and NLM_F_CREATE
flag were allowed where the kernel would auto-generate a pref/prio.
We can piggyback on that and use prio of 0 as a wildcard for
requests of RTM_DELTFILTER.
For notifying tc netlink monitoring users (e.g. libnl uses this
for caching), there are two options, that is, sending individual
tfilter_notify() notifications for each tcf_proto, or sending a
single one indicating wildcard removal. I tried both and there
are pros and cons for each, eventually I decided for sending
individual tfilter_notify(), so that user space can support this
seamlessly and there won't be a mess of changing each and every
application to make sure expectations from the kernel won't break
when they don't understand single notification. Since linear chains
don't really scale, I expect only a handful of classifiers to be
attached at max for a given parent anyway.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jun 2016 01:00:57 +0000 (18:00 -0700)]
Merge branch 'bpf-fixes'
Daniel Borkmann says:
====================
bpf: couple of fixes
These are two fixes for BPF, one to introduce xmit recursion limiter for
tc bpf programs and the other one to reject filters a bit earlier. For
more details please see individual patches. I have no strong opinion
to which tree they should go, they apply to both, but I think net-next
seems okay to me.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 10 Jun 2016 19:19:07 +0000 (21:19 +0200)]
bpf: reject wrong sized filters earlier
Add a bpf_check_basics_ok() and reject filters that are of invalid
size much earlier, so we don't do any useless work such as invoking
bpf_prog_alloc(). Currently, rejection happens in bpf_check_classic()
only, but it's really unnecessarily late and they should be rejected
at earliest point. While at it, also clean up one bpf_prog_size() to
make it consistent with the remaining invocations.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 10 Jun 2016 19:19:06 +0000 (21:19 +0200)]
bpf: enforce recursion limit on redirects
Respect the stack's xmit_recursion limit for calls into dev_queue_xmit().
Currently, they are not handeled by the limiter when attached to clsact's
egress parent, for example, and a buggy program redirecting it to the
same device again could run into stack overflow eventually. It would be
good if we could notify an admin to give him a chance to react. We reuse
xmit_recursion instead of having one private to eBPF, so that the stack's
current recursion depth will be taken into account as well. Follow-up to
commit
3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") and
27b29f63058d ("bpf: add bpf_redirect() helper").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Fri, 10 Jun 2016 18:49:33 +0000 (11:49 -0700)]
openvswitch: Add packet truncation support.
The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
truncate packets. A 'max_len' is added for setting up the maximum
packet size, and a 'cutlen' field is to record the number of bytes
to trim the packet when the packet is outputting to a port, or when
the packet is sent to userspace.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Jun 2016 18:52:24 +0000 (11:52 -0700)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
net/sched/act_police.c
net/sched/sch_drr.c
net/sched/sch_hfsc.c
net/sched/sch_prio.c
net/sched/sch_red.c
net/sched/sch_tbf.c
In net-next the drop methods of the packet schedulers got removed, so
the bug fixes to them in 'net' are irrelevant.
A packet action unload crash fix conflicts with the addition of the
new firstuse timestamp.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 10 Jun 2016 15:32:24 +0000 (08:32 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal.
2) Revert some msleep conversions in rtlwifi as these spots are in
atomic context, from Larry Finger.
3) Validate that NFTA_SET_TABLE attribute is actually specified when we
call nf_tables_getset(). From Phil Turnbull.
4) Don't do mdio_reset in stmmac driver with spinlock held as that can
sleep, from Vincent Palatin.
5) sk_filter() does things other than run a BPF filter, so we should
not elide it's call just because sk->sk_filter is NULL. Fix from
Eric Dumazet.
6) Fix missing backlog updates in several packet schedulers, from Cong
Wang.
7) bnx2x driver should allow VLAN add/remove while the interface is
down, from Michal Schmidt.
8) Several RDS/TCP race fixes from Sowmini Varadhan.
9) fq_codel scheduler doesn't return correct queue length in dumps,
from Eric Dumazet.
10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from
Yuchung Cheng.
11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(),
from Guillaume Nault.
12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian
Westphal.
13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling,
from Willem de Bruijn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
vmxnet3: segCnt can be 1 for LRO packets
packet: compat support for sock_fprog
stmmac: fix parameter to dwmac4_set_umac_addr()
net/mlx5e: Fix blue flame quota logic
net/mlx5e: Use ndo_stop explicitly at shutdown flow
net/mlx5: E-Switch, always set mc_promisc for allmulti vports
net/mlx5: E-Switch, Modify node guid on vf set MAC
net/mlx5: E-Switch, Fix vport enable flow
net/mlx5: E-Switch, Use the correct error check on returned pointers
net/mlx5: E-Switch, Use the correct free() function
net/mlx5: Fix E-Switch flow steering capabilities check
net/mlx5: Fix flow steering NIC capabilities check
net/mlx5: Fix root flow table update
net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly
net/mlx5: Fix masking of reserved bits in XRCD number
net/mlx5: Fix the size of modify QP mailbox
mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()
mlxsw: spectrum: Make split flow match firmware requirements
wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
cfg80211: remove get/set antenna and tx power warnings
...
Linus Torvalds [Fri, 10 Jun 2016 15:27:30 +0000 (08:27 -0700)]
Merge tag 'sound-4.7-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We have only few, mainly HD-audio device-specific fixes. Realtek
codec driver got a slightly more LOC, but they are all for the new
codec chip, and won't affect others at all"
* tag 'sound-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add PCI ID for Kabylake
ALSA: hda/realtek: Add T560 docking unit fixup
ALSA: hda - Fix headset mic detection problem for Dell machine
ALSA: uapi: Add three missing header files to Kbuild file
ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703
ALSA: hda/realtek - ALC256 speaker noise issue
Linus Torvalds [Fri, 10 Jun 2016 15:21:06 +0000 (08:21 -0700)]
Merge tag 'drm-fixes-for-v4.7-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This weeks instalment of fixes:
amdgpu:
Lots of memory leak and firmware leak fixes
nouveau:
Collection of display fixes, KASAN fixes
vc4:
vblank/pageflipping fixes
fsl-dcu:
Regmap cache fix
omap:
Unused variable warning fix.
Nothing too surprising so far"
* tag 'drm-fixes-for-v4.7-rc3' of git://people.freedesktop.org/~airlied/linux: (46 commits)
drm/amdgpu: fix warning with powerplay disabled.
drm/amd/powerplay: delete useless code as pptable changed in vbios.
drm/amd/powerplay: fix bug visit array out of bounds
drm/amdgpu: fix smu ucode memleak (v2)
drm/amdgpu: add release firmware for cgs
drm/amdgpu: fix tonga smu_fini mem leak
drm/amdgpu: fix fiji smu fini mem leak
drm/amdgpu: fix cik sdma ucode memleak
drm/amdgpu: fix sdma24 ucode mem leak
drm/amdgpu: fix sdma3 ucode mem leak
drm/amdgpu: fix uvd fini mem leak
drm/amdgpu: fix gfx 7 ucode mem leak
drm/amdgpu: fix gfx8 ucode mem leak
drm/amdgpu: fix missing free wb for cond_exec
drm/amdgpu: fix memleak in pptable_init
drm/amdgpu: fix mem leak in atombios
drm/amdgpu: fix mem leak in pplib/hwmgr
drm/amdgpu: fix mem leak in smumgr
drm/amdgpu: add pipeline sync while vmid switch in same ctx
drm/amdgpu: vBIOS post only call when mem_size zero
...
Linus Torvalds [Fri, 10 Jun 2016 15:15:37 +0000 (08:15 -0700)]
Merge tag 'acpi-4.7-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"A recently introduced boot regression related to the ACPI EC
initialization is addressed by restoring the previous behavior (Lv
Zheng)"
* tag 'acpi-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / EC: Fix a boot EC regresion by restoring boot EC support for the DSDT EC
Linus Torvalds [Fri, 10 Jun 2016 15:09:12 +0000 (08:09 -0700)]
Merge tag 'pm-4.7-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Stable-candidate fixes for the intel_pstate driver and the cpuidle
core.
Specifics:
- Fix two intel_pstate initialization issues, one of which was
introduced during the 4.4 cycle (Srinivas Pandruvada)
- Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset
(Catalin Marinas)"
* tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
Linus Torvalds [Fri, 10 Jun 2016 15:00:47 +0000 (08:00 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED
mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all
kernel/relay.c: fix potential memory leak
mm: thp: broken page count after commit
aa88b68c3b1d
revert "mm: memcontrol: fix possible css ref leak on oom"
kasan: change memory hot-add error messages to info messages
mm/hugetlb: fix huge page reserve accounting for private mappings
Shrikrishna Khare [Wed, 8 Jun 2016 14:40:53 +0000 (07:40 -0700)]
vmxnet3: segCnt can be 1 for LRO packets
The device emulation may send segCnt of 1 for LRO packets.
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhaktipriya Shridhar [Tue, 7 Jun 2016 19:59:46 +0000 (01:29 +0530)]
mlxsw: core: Remove deprecated create_workqueue
alloc_workqueue replaces deprecated create_workqueue().
A dedicated workqueue has been used since the workqueue
mlxsw_wq is used for FDB notif. processing with workitems that are
involved in normal device operation && because it's a network device
which can be depended upon during memory reclaim.
Workitems &trans->timeout_dw and &mlxsw_sp->fdb_notify.dw,
map to mlxsw_sp_fdb_notify_work (processes FDB notifications from the
underlying device and resolves the netdev to which the entry points to
and notifies the bridge using the switchdev notifier) and
mlxsw_emad_trans_timeout_work (provides async EMAD register access)
respectively. They require forward progress under memory pressure and
hence, WQ_MEM_RECLAIM has been set.
Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary here.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhaktipriya Shridhar [Tue, 7 Jun 2016 20:17:59 +0000 (01:47 +0530)]
net: cavium: liquidio: Remove deprecated create_workqueue
alloc_workqueue replaces deprecated create_workqueue().
A dedicated workqueue has been used since the workitem viz
(&lio->txq_status_wq.wk.work which maps to octnet_poll_check_txq_status)
is involved in a brief poll routine for checking transmit queue status
and is an intergral part of normal device operation.
WQ_MEM_RECLAIM has been set to guarantee forward progress under memory
pressure, which is a requirement here.
Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary.
flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Tue, 7 Jun 2016 16:06:34 +0000 (12:06 -0400)]
packet: compat support for sock_fprog
Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument
if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains
a pointer into user memory. If userland is 32-bit and kernel is 64-bit
the two disagree about the layout of struct sock_fprog.
Add compat setsockopt support to convert a 32-bit compat_sock_fprog to
a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for
SO_REUSEPORT added in commit
1957598840f4 ("soreuseport: add compat
case for setsockopt SO_ATTACH_REUSEPORT_CBPF").
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 8 Jun 2016 04:24:18 +0000 (21:24 -0700)]
net/mlx4_en: fix ethtool -x
mlx4 RSS is limited to spread incoming packets to a power of two number
of queues.
An uniformly distibuted traffic would be split on queues 0 to N-1, N
being a power of two, each queue having a 1/N weight.
If number of RX queues is not a power of two, upper RX queues do not
receive traffic.
ethtool -x is lying, because it pretends some queues have higher weight.
Before patch:
lpaa24:~# ethtool -L eth1 rx 24
lpaa24:~# ethtool -x eth1
RX flow hash indirection table for eth1 with 24 RX ring(s):
0: 0 1 2 3 4 5 6 7
8: 8 9 10 11 12 13 14 15
16: 0 1 2 3 4 5 6 7
RSS hash key:
e0:7c:3a:89:07:55:b6:58:69:cc:f4:e5:24:62:e3:25:88:6c:42:5b:d2:cb:9a:d2:e0:06:e1:dc:f9:09:a1:89:0f:a0:30:43:73:6f:0c:b6
If this information was correct, user space tools could expect queues 0
to 7 to receive twice more traffic than queues 8 to 15
After patch :
lpaa24:~# ethtool -L eth1 rx 24
lpaa24:~# ethtool -x eth1
RX flow hash indirection table for eth1 with 24 RX ring(s):
0: 0 1 2 3 4 5 6 7
8: 8 9 10 11 12 13 14 15
RSS hash key:
da:7b:09:60:f1:ac:67:b4:d0:72:d4:ec:a2:e5:80:0a:ad:50:22:1a:f8:f9:66:54:5f:22:45:c3:88:f4:57:82:c1:c1:90:ed:70:cb:40:ce
lpaa24:~# ethtool -X eth1 equal 8
lpaa24:~# ethtool -x eth1
RX flow hash indirection table for eth1 with 24 RX ring(s):
0: 0 1 2 3 4 5 6 7
8: 0 1 2 3 4 5 6 7
RSS hash key:
da:7b:09:60:f1:ac:67:b4:d0:72:d4:ec:a2:e5:80:0a:ad:50:22:1a:f8:f9:66:54:5f:22:45:c3:88:f4:57:82:c1:c1:90:ed:70:cb:40:ce
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>