GitHub/moto-9609/android_kernel_motorola_exynos9610.git
9 years agowil6210: detailed statistics for Rx reorder drop
Vladimir Kondratiev [Thu, 30 Jul 2015 10:52:01 +0000 (13:52 +0300)]
wil6210: detailed statistics for Rx reorder drop

Rx drops may be for 2 reasons: frame is old,
or it is duplicate. On the debugfs "stations" entry,
provide counters per reorder buffer for total
frames processed, drops for these 2 reasons.
Also add debug print for dropped frames.

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: unify wmi_set_ie() error handling
Vladimir Kondratiev [Thu, 30 Jul 2015 10:52:00 +0000 (13:52 +0300)]
wil6210: unify wmi_set_ie() error handling

When printing error message, provide string describing IE kind.
Derive it from IE type
This allows removing of error messages printing
in callers

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: sort IEs handling
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:59 +0000 (13:51 +0300)]
wil6210: sort IEs handling

sort overall IE's handling
prepare code (disabled for now) to add IEs for the beacon

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: treat "unhandled event" as warning instead of error
Dedy Lansky [Thu, 30 Jul 2015 10:51:58 +0000 (13:51 +0300)]
wil6210: treat "unhandled event" as warning instead of error

FW is allowed to generate WMI events that are not handled by this driver.
Treat such case as warning instead of error.

Signed-off-by: Dedy Lansky <qca_dlansky@qca.qualcomm.com>
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: improve mgmt frame handling
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:57 +0000 (13:51 +0300)]
wil6210: improve mgmt frame handling

Check event length;
hex dump both Rx and Tx frames

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: TSO implementation
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:56 +0000 (13:51 +0300)]
wil6210: TSO implementation

Driver report supported TSO (v4 & v6) and IP checksum offload
in addition to previously supported features. In data path
skbs are checked for non-zero gso_size, and when detected sent
to additional function for processing TSO SKBs. Since HW does not
fully support TSO, additional effort is required from the driver.
Driver partitions the data into mss sized descriptors which are
then DMAed to the HW.

Signed-off-by: Vladimir Shulman <QCA_shulmanv@QCA.qualcomm.com>
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: skip HW version check for chip debugging
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:55 +0000 (13:51 +0300)]
wil6210: skip HW version check for chip debugging

When loading with debug_fw flag, do not bail out on
unknown chipId

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: use wil_fw_error_recovery()
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:54 +0000 (13:51 +0300)]
wil6210: use wil_fw_error_recovery()

Use function wil_fw_error_recovery() instead of inline equivalent code

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: wait for del_station to complete
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:53 +0000 (13:51 +0300)]
wil6210: wait for del_station to complete

Multiple del_station requests may be sent to the driver by the
supplicant when turning down AP. This may overflow mailbox
between the FW and ucode

Wait till disconnect of one STA completed before sending next command.

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: use <> vs. "" for global include
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:52 +0000 (13:51 +0300)]
wil6210: use <> vs. "" for global include

linux/device.h should be included using <>, not ""
since it is not local include

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: print "ulong" fields in hex format in the debugfs
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:51 +0000 (13:51 +0300)]
wil6210: print "ulong" fields in hex format in the debugfs

In the debugfs, there is "ulong" attribute printing.
It is used for bitmap printing, and more appropriate format
would be hexadecimal, not decimal.

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: count drops in Rx block ack reorder
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:50 +0000 (13:51 +0300)]
wil6210: count drops in Rx block ack reorder

When performing Rx reordering, count skb's dropped
per reorder buffer; and print dropped packets count
on the "stations" debugfs entry

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agowil6210: support boot loader struct v0 & v1
Vladimir Kondratiev [Thu, 30 Jul 2015 10:51:49 +0000 (13:51 +0300)]
wil6210: support boot loader struct v0 & v1

There are 2 versions of boot loader struct: v0 and v1.
In the v1, boot loader build version added; as well as
RF status.

Support both versions.

Boot loader structure v1 has RF status; ignore RF error if firmware
not going to be loaded; driver can still be used to interact with the HW

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: initialize msdu ext. descriptor before use
Peter Oh [Wed, 29 Jul 2015 08:58:50 +0000 (11:58 +0300)]
ath10k: initialize msdu ext. descriptor before use

Initial QCA99X0 support has a known issue with TCP Tx throughput.
All other path such as UDP Tx/Rx and TCP Rx meet their expectation
(> 900Mbps), but TCP Tx marked as low as 5Mbps when single pair is
used on iperf.

The root cause is turned out because TSO flag is not initialized
properly so that firmware configures TSO in wrong way.
TSO flags in msdu extension descriptor is required to be reset
to indicate firmware there is no TSO is enabled, otherwise it
could act as TSO is enabled which causes huge throughput drop.

In fact, it's enough by resetting TSO flags only to prevent the
unexpected behavior, but initializing whole msdu ext. descriptor
will help to clear uncertainty of firmware could bring on as it
constantly updated.

Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: add QCA99X0 to supported device list
Vasanthakumar Thiagarajan [Wed, 29 Jul 2015 08:40:39 +0000 (11:40 +0300)]
ath10k: add QCA99X0 to supported device list

Add vendor/device id of QCA99X0 V2.0 to pci id table and
QCA99X0_HW_2_0_CHIP_ID_REV to ath10k_pci_supp_chips[] for
QCA99X0 to get detected by the driver.

kvalo: now QCA99X0 family of chipsets is supported by ath10k.
Tested client, AP and monitor mode with QCA9990.

Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: increase max client to 512 in qca99x0
Raja Mani [Wed, 29 Jul 2015 08:40:38 +0000 (11:40 +0300)]
ath10k: increase max client to 512 in qca99x0

When max client was set to 512 in qca99x0, there was host memory
alloc failure during wmi service ready event handling. This issue
got resolved now, increasing max client limit from 256 to 512.

Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: fix memory alloc failure in qca99x0 during wmi svc rdy event
Raja Mani [Wed, 29 Jul 2015 08:40:38 +0000 (11:40 +0300)]
ath10k: fix memory alloc failure in qca99x0 during wmi svc rdy event

Host memory required for firmware is allocated while handling
wmi service ready event. Right now, wmi service ready is handled
in tasklet context and it calls dma_alloc_coherent() with atomic
flag (GFP_ATOMIC) to allocate memory in host needed for firmware.
The problem is, dma_alloc_coherent() with GFP_ATOMIC fails in
the platform (at least in AP platform) where it has less atomic
pool memory (< 2mb). QCA99X0 requires around 2 MB of host memory
for one card, having additional QCA99X0 card in the same platform
will require similarly amount of memory. So, it's not guaranteed that
all the platform will have enough atomic memory pool.

Fix this issue, by handling wmi service ready event in workqueue
context and calling dma_alloc_coherent() with GFP_KERNEL. mac80211 work
queue will not be ready at the time of handling wmi service ready.
So, it can't be used to handle wmi service ready. Also, register work
gets scheduled during insmod in existing ath10k_wq and waits for
wmi service ready to completed. Both workqueue can't be used for
this purpose. New auxiliary workqueue is added to handle wmi service
ready.

Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: enable raw encap mode and software crypto engine
David Liu [Fri, 24 Jul 2015 17:25:32 +0000 (20:25 +0300)]
ath10k: enable raw encap mode and software crypto engine

This patch enables raw Rx/Tx encap mode to support software based
crypto engine. This patch introduces a new module param 'cryptmode'.

 cryptmode:

   0: Use hardware crypto engine globally with native Wi-Fi mode TX/RX
      encapsulation to the firmware. This is the default mode.
   1: Use sofware crypto engine globally with raw mode TX/RX
      encapsulation to the firmware.

Known limitation:
   A-MSDU must be disabled for RAW Tx encap mode to perform well when
   heavy traffic is applied.

Testing: (by Michal Kazior <michal.kazior@tieto.com>)

     a) Performance Testing

      cryptmode=1
       ap=qca988x sta=killer1525
        killer1525  ->  qca988x     194.496 mbps [tcp1 ip4]
        killer1525  ->  qca988x     238.309 mbps [tcp5 ip4]
        killer1525  ->  qca988x     266.958 mbps [udp1 ip4]
        killer1525  ->  qca988x     477.468 mbps [udp5 ip4]
        qca988x     ->  killer1525  301.378 mbps [tcp1 ip4]
        qca988x     ->  killer1525  297.949 mbps [tcp5 ip4]
        qca988x     ->  killer1525  331.351 mbps [udp1 ip4]
        qca988x     ->  killer1525  371.528 mbps [udp5 ip4]
       ap=killer1525 sta=qca988x
        qca988x     ->  killer1525  331.447 mbps [tcp1 ip4]
        qca988x     ->  killer1525  328.783 mbps [tcp5 ip4]
        qca988x     ->  killer1525  375.309 mbps [udp1 ip4]
        qca988x     ->  killer1525  403.379 mbps [udp5 ip4]
        killer1525  ->  qca988x     203.689 mbps [tcp1 ip4]
        killer1525  ->  qca988x     222.339 mbps [tcp5 ip4]
        killer1525  ->  qca988x     264.199 mbps [udp1 ip4]
        killer1525  ->  qca988x     479.371 mbps [udp5 ip4]

      Note:
       - only open network tested for RAW vs nwifi performance comparison
       - killer1525 (qca6174 hw2.2) is 2x2 device (hence max 866mbps)
       - used iperf
       - OTA, devices a few cm apart from each other, no shielding
       - tcpX/udpX, X - means number of threads used

      Overview:
       - relative Tx performance drop is seen but is within reasonable and
         expected threshold (A-MSDU must be disabled with RAW Tx)

     b) Connectivity Testing

      cryptmode=1
       ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta2br1vlan  OK
       ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta2br1vlan  OK
       ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta2br1vlan  OK
       ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br1vlan  OK

      Note:
       - each test takes all possible endpoint pairs and pings
       - each pair-ping flushes arp table
       - ip6 is used

     c) Testbed Topology:

      1ap1sta:
        [ap] ---- [sta]

        endpoints: ap, sta

      1ap1sta2br:
        [veth0] [ap] ---- [sta] [veth2]
           |     |          |     |
        [veth1]  |          \   [veth3]
            \   /            \  /
            [br0]            [br1]

        endpoints: veth0, veth2, br0, br1
        note: STA works in 4addr mode, AP has wds_sta=1

      1ap1sta2br1vlan:
        [veth0] [ap] ---- [sta] [veth2]
           |     |          |     |
        [veth1]  |          \   [veth3]
            \   /            \  /
          [br0]              [br1]
            |                  |
          [vlan0_id2]        [vlan1_id2]

        endpoints: vlan0_id2, vlan1_id2
        note: STA works in 4addr mode, AP has wds_sta=1

Credits:

    Thanks to Michal Kazior <michal.kazior@tieto.com> who helped find the
    amsdu issue, contributed a workaround (already squashed into this
    patch), and contributed the throughput and connectivity tests results.

Signed-off-by: David Liu <cfliu.tw@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Tested-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: Improve performance by reducing tx_lock contention
Qi Zhou [Wed, 22 Jul 2015 20:38:24 +0000 (16:38 -0400)]
ath10k: Improve performance by reducing tx_lock contention

During tx completion, tx_lock is held for longer than required, preventing
efficient refill of htt->pending_tx. Refactor the code so that only MSDU
related operations are protected by the lock.

Improves downstream performance on a dual-core ARM Freescale LS1024A
(f.k.a. Mindspeed Comcerto 2000) AP with a 3x3 client from 495 to 580 Mbps.
Other CPU bound multicore systems may also benefit.

Signed-off-by: Denton Gentry <dgentry@google.com>
Signed-off-by: Avery Pennarun <apenwarr@google.com>
[mfaltesek@google.com: removed conflicting code for tracking msdu_ids.]
Signed-off-by: Marty Faltesek <mfaltesek@google.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: suppress 'failed to process fft' warning messages
Kevin Darbyshire-Bryant [Tue, 21 Jul 2015 14:50:15 +0000 (15:50 +0100)]
ath10k: suppress 'failed to process fft' warning messages

When using DFS channels on Ath10k, kernel log has repeated warning message
'failed to process fft: -22' typically under medium/heavy traffic.

This patch switches the warnings to driver debug (WMI events) mode only
thus reducing log file noise.

DFS and spectral scan share underlying HW mechanisms and enabling one
(DFS) enables the other (spectral scan) as far as event reporting from
firmware to driver is concerned. Spectral scan events take no part in
processing of DFS radar pulses which are delivered as distinct events,
so the fft (spectral event) warning is harmless and DFS interference
detection/protection still occurs.

Symptoms seen & fix tested in both debug & non-debug modes on TP-Link
Archer C7 v2 platform.

Signed-off-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath6kl: spell "distribution" correctly in a comment.
Nik Nyby [Mon, 29 Jun 2015 17:30:37 +0000 (13:30 -0400)]
ath6kl: spell "distribution" correctly in a comment.

This fixes two misspellings of "distribution" in a comment.

Signed-off-by: Nik Nyby <nikolas@gnu.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: extend struct htt_mgmt_tx_dec for qca99x0
Raja Mani [Tue, 21 Jul 2015 05:22:00 +0000 (10:52 +0530)]
ath10k: extend struct htt_mgmt_tx_dec for qca99x0

HTT_H2T_MSG_TYPE_MGMT_TX msg in 10.4 firmware carries additional
4 byte in htt_mgmt_tx_desc where it tells to firmware that at what
rate mgmt frame has to go out in the air. It's an optional parameter,
setting this field to zero will force firmware to choose auto rate
and send the frame out.

Those 4 byte info is missed out in the current code and 10.4 firmware
ended up reading some junk in those 4 byte and sometime malfunctioning.

Fix it by adding 4 byte in struct htt_mgmt_tx_desc. Non 10.4 firmware
will not process those four byte. So, adding 4 byte at the end of
struct htt_mgmt_tx_desc will not create any impact on other chipset.

Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: add TCP/UDP Checksum offload support for QCA99x0
Manikanta Pubbisetty [Mon, 20 Jul 2015 12:26:12 +0000 (17:56 +0530)]
ath10k: add TCP/UDP Checksum offload support for QCA99x0

The patch adds support to offload TCP/UDP checksum
calculations for QCA99x0.

Signed-off-by: Manikanta Pubbisetty <c_mpubbi@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: fix wrong initialization of struct channel
Maninder Singh [Thu, 16 Jul 2015 03:55:33 +0000 (09:25 +0530)]
ath10k: fix wrong initialization of struct channel

chandef is initialized with NULL and on the very next line, we are using it to
get channel, which is not correct. Channel should be initialized after
obtaining chandef.

Found by cppcheck:

ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: add support for qca99x0 Rx descriptors
Peter Oh [Thu, 16 Jul 2015 02:01:21 +0000 (19:01 -0700)]
ath10k: add support for qca99x0 Rx descriptors

QCA99X0 chip has an extra 4 bytes in rx_msdu_start,
20 bytes in rx_msdu_end and 20 bytes in rx_ppdu_end structure
which are used in htt_rx_desc and HTT Rx ring offset setup.
This is necessary for correct Rx for QCA99X0 or Rx descriptors
will be overwritten and corrupted.

With this patch QCA988X and QCA6174 will have extra 44 bytes
padding in Rx descriptor layout which is harmless.

Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: redefine rx_ppdu_end_common structure to cover qca99x0
Peter Oh [Thu, 16 Jul 2015 02:01:20 +0000 (19:01 -0700)]
ath10k: redefine rx_ppdu_end_common structure to cover qca99x0

rx_ppdu_end_common structure is valid for both of qca998x and
qca6174, but not for qca99x0 since it has new additional members.
Hence update the common structure to cover qca99x0 as well.

Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: update tx path to support QCA99X0
Peter Oh [Thu, 16 Jul 2015 02:01:19 +0000 (19:01 -0700)]
ath10k: update tx path to support QCA99X0

Since QCA99X0 uses fragmentation descriptor differently from
other ones on tx path, we need to handle it separately.

QCA99X0 is using 48 bits for address and 16 bits for length
out of 2 dword and each values have to be programmed by frag
desc base addr + msdu id, so that hardware can retrieve
corresponding frag data.

Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: delay device access after cold reset
Vasanthakumar Thiagarajan [Fri, 10 Jul 2015 09:01:20 +0000 (14:31 +0530)]
ath10k: delay device access after cold reset

It is observed that during cold reset pcie access right
after a write operation to SOC_GLOBAL_RESET_ADDRESS causes
Data Bus Error and system hard lockup. The reason
for bus error is that pcie needs some time to get
back to stable state for any transaction during cold reset. Add
delay of 20 msecs after write of SOC_GLOBAL_RESET_ADDRESS
to fix this issue. This patch is tested on QCA988X. This is
also tested on QCA99X0 which is WIP.

Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoMerge ath-next from ath.git.
Kalle Valo [Tue, 21 Jul 2015 08:36:56 +0000 (11:36 +0300)]
Merge ath-next from ath.git.

Major changes in ath10k:

* enable VHT for IBSS
* initial work to support qca99x0 and the corresponding 10.4 firmware branch

9 years agoath9k: DFS - add pulse chirp detection for FCC
Zefir Kurtisi [Tue, 16 Jun 2015 10:52:16 +0000 (12:52 +0200)]
ath9k: DFS - add pulse chirp detection for FCC

FCC long pulse radar (type 5) requires pulses to be
checked for chirping. This patch implements chirp
detection based on the FFT data provided for long
pulses.

A chirp is detected when a set of criteria defined
by FCC pulse characteristics is met, including
* have at least 4 FFT samples
* max_bin index moves equidistantly between samples
* the gradient is within defined range

The chirp detection has been tested with reference
radar generating devices and proved to work reliably.

Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
9 years agoath9k: DFS - consider ext_channel pulses only in HT40 mode
Zefir Kurtisi [Tue, 16 Jun 2015 09:46:42 +0000 (11:46 +0200)]
ath9k: DFS - consider ext_channel pulses only in HT40 mode

The chip reports radar pulses on extension channel
even if operating in HT20 mode. This patch adds a
sanity check for HT40 mode before it feeds pulses
on extension channel to the pattern detector.

Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
9 years agoipw2100: fix timeout bug - always evaluated to 0
Nicholas Mc Guire [Mon, 15 Jun 2015 17:24:35 +0000 (19:24 +0200)]
ipw2100: fix timeout bug - always evaluated to 0

commit 2c86c275015c ("Add ipw2100 wireless driver.") introduced
HW_PHY_OFF_LOOP_DELAY (HZ / 5000) which always evaluated to 0. Clarified
by Stanislav Yakovlev <stas.yakovlev@gmail.com> that it should be 50
milliseconds thus fixed up to msecs_to_jiffies(50).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Stanislav Yakovlev <stas.yakovlev@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
9 years agoath10k: tweak interface combinations
Michal Kazior [Thu, 9 Jul 2015 11:08:39 +0000 (13:08 +0200)]
ath10k: tweak interface combinations

Concurrent AP/GO operation on different channels
isn't really supported well by the firmware so
it's better to remove it from being advertised.

Also tune the way station and p2p client interface
limits are expressed to allow station + 2x p2p
client or station + p2p client + p2p go.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: fix per-vif queue locking
Michal Kazior [Thu, 9 Jul 2015 11:08:38 +0000 (13:08 +0200)]
ath10k: fix per-vif queue locking

Whenever any vdev was supposed to be paused all Tx
queues were stopped (except offchannel) instead of
only these associated with the given vdev.

This caused subtle issues with
multi-channel/multi-vif scenarios, e.g.
authentication of station vif could sometimes fail
depending on fw tx pause request timing.

Fixes: b4aa539dd8f2 ("ath10k: implement tx pause wmi event")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: update vdev ps state on start
Michal Kazior [Thu, 9 Jul 2015 11:08:37 +0000 (13:08 +0200)]
ath10k: update vdev ps state on start

Psmode can be forcefully enabled when vdev isn't
started. It isn't guaranteed that mac80211 will
re-issue psmode setting after vdev is started
unless actual bss_conf.ps value has changed.

Even if this doesn't fix any problems now it may
prevent future breakage.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: fix hw roc expiration notifcation
Michal Kazior [Thu, 9 Jul 2015 11:08:36 +0000 (13:08 +0200)]
ath10k: fix hw roc expiration notifcation

The expiration function must not be called when
roc is explicitly cancelled by mac80211. However
since fcf9844636be ("ath10k: fix hw roc
expiration") the notification was never sent when
roc actually expired.

This fixes some P2P connection setup issues.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: limit multi-vif ps more aggresivelly
Michal Kazior [Thu, 9 Jul 2015 11:08:35 +0000 (13:08 +0200)]
ath10k: limit multi-vif ps more aggresivelly

Further testing proved that multi-channel AP+STA
on QCA6174 with RM.2.0-00088 should have powersave
force-disabled to avoid beacon misses/skipping on
either side which in turn could disrupt
communication.

Since AP never has arvif->ps don't even bother
checking it. Other combinations may be broken as
well so disallow powersave with multivif outright
unless firmware advertises otherwise.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: don't set cck/ofdm scan flags
Michal Kazior [Thu, 9 Jul 2015 11:08:34 +0000 (13:08 +0200)]
ath10k: don't set cck/ofdm scan flags

mac80211 already does provide complete IEs for
Probe Requests for hw scan and ath10k firmware was
appending duplicate Supported Rates IEs
unnecessarily.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: Fix target to cpu address conversion logic
Vasanthakumar Thiagarajan [Fri, 3 Jul 2015 13:55:27 +0000 (19:25 +0530)]
ath10k: Fix target to cpu address conversion logic

In commit 418ca5992e2f ("ath10k: Make target cpu address to
CE address conversion chip specific") mask 0x7fff is added
by mistake instead of 0x7ff. Fix this regression.

Fixes: 418ca5992e2f ("ath10k: Make target cpu address to CE address conversion chip specific")
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoath10k: fix QCA61X4 boot up
Bartosz Markowski [Fri, 3 Jul 2015 13:33:49 +0000 (15:33 +0200)]
ath10k: fix QCA61X4 boot up

commit a521ee983d312db7 ("ath10k: Add new reg_address/mask to hw register
table") broke QCA61x4 support by providing wrong
fw_indicator_address, which should have been 0x0003a028 instead of 0x00009028.

User experience was a failing boot up sequence (crashing device during
initialization):

[  181.663874] ath10k_pci 0000:02:00.0: enabling device (0000 -> 0002)
[  181.664787] ath10k_pci 0000:02:00.0: pci irq msi-x interrupts 8 irq_mode 0 reset_mode 0
[  181.688886] ath10k_pci 0000:02:00.0: device has crashed during init
[  181.688897] ath10k_pci 0000:02:00.0: failed to wait for target after cold reset: -70
[  181.688902] ath10k_pci 0000:02:00.0: failed to reset chip: -70
[  181.689774] ath10k_pci: probe of 0000:02:00.0 failed with error -70

Fix it by updating the address with correct value.

Fixes: a521ee983d31 ("ath10k: Add new reg_address/mask to hw register table")
Signed-off-by: Bartosz Markowski <bartosz.markowski@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
9 years agoMerge branch 'be2net-next'
David S. Miller [Sat, 11 Jul 2015 06:24:31 +0000 (23:24 -0700)]
Merge branch 'be2net-next'

Sathya Perla says:

====================
be2net: patch set

Hi David, the following patch set has code cleanup patches, minor enhancements
and non-critical fixes. Pls consider applying to the net-next tree. Thanks!

Patch 1 removes duplicate code in be_setup_wol() routine making it simpler
and more readable.

Patch 2 fixes the the bridge mode return value for the ndo_bridge_getlink()
call. Instead of just relying on the SRIOV enabled state, the driver now
queries the FW, for the actual mode of bridge.

Patch 3 removes code for setting D0 power state as it's already done
in pci_enable_device()

Patch 4 fixes a bad return value in be_check_ufi_compatibility() routine
introduced by an earlier commit.

Patch 5 fixes a field in udp header being accessed while in network endian
format.

Patch 6 fixes the be_mcc_notify() routine to return an error status when
the FW/HW is in an error state.

Patch 7 fixes the be_cmd_rx_filter() routine to issue the RX_FILTER cmd
and not wait for a completion from the FW. If the FW/adapter
is in an error state, this change helps in not holding up the rtnl_lock
and keeping bottom halves disabled while the driver timesout waiting for
a response from the FW.

Patch 8 fixes the be_cmd_set_loopback() routine to issue the LOOPBACK cmd
and not wait for the FW completion while spin_lock_bh() is held on the
mcc_lock. As the cmd is always issued from ethtool in a process context,
it can sleep till the FW completion is received.

Patch 9 bumps up the driver version to 10.6.0.3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: bump up the driver version to 10.6.0.3
Sathya Perla [Fri, 10 Jul 2015 09:32:51 +0000 (05:32 -0400)]
be2net: bump up the driver version to 10.6.0.3

Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: make SET_LOOPBACK_MODE cmd asynchrounous
Suresh Reddy [Fri, 10 Jul 2015 09:32:50 +0000 (05:32 -0400)]
be2net: make SET_LOOPBACK_MODE cmd asynchrounous

The SET_LOOPBACK_MODE command is always issued from ethtool only in a
process context. So, while waiting for the cmd to complete, the driver
can sleep instead of holding spin_lock_bh() on the mcc_lock. This is done
by calling be_mcc_notify() instead of be_mcc_notify_wait() (that returns
only after the cmd completes while the MCCQ is locked).

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: make the RX_FILTER command asynchronous
Suresh Reddy [Fri, 10 Jul 2015 09:32:49 +0000 (05:32 -0400)]
be2net: make the RX_FILTER command asynchronous

This fix makes the RX_FILTER cmd asynchronous, i.e., the caller issues
this cmd and doesn't wait for a completion from the FW. If the FW/adapter
is in an error state, this change helps in not holding up the rtnl_lock
and keeping bottom halves disabled while the driver timesout waiting for
a response from the FW.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: return error status from be_mcc_notify()
Suresh Reddy [Fri, 10 Jul 2015 09:32:48 +0000 (05:32 -0400)]
be2net: return error status from be_mcc_notify()

When the adapter is in error state, return error from be_mcc_notify()
so that the caller routines need not sleep waiting for a response.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: convert dest field in udp-hdr to host-endian
Venkat Duvvuru [Fri, 10 Jul 2015 09:32:47 +0000 (05:32 -0400)]
be2net: convert dest field in udp-hdr to host-endian

The "dest" field in the UDP-hdr of a TX skb is in network endian format.
Convert it to host endian before accessing it. The os2bmc patch,
mentioned below introduced this code.

Fixes: 760c295e0e8d ("be2net: Support for OS2BMC")
Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: fix wrong return value in be_check_ufi_compatibility()
Vasundhara Volam [Fri, 10 Jul 2015 09:32:46 +0000 (05:32 -0400)]
be2net: fix wrong return value in be_check_ufi_compatibility()

In the commit a6e6ff6eee12f3e
("be2net: simplify UFI compatibility checking"), a return value of "-1"
was incorrectly used in place of "false". This patch fixes it.

Fixes: a6e6ff6eee12f3e ("be2net: simplify UFI compatibility checking")
Signed-off-by: Vasundhara Volam <vasundhara.volam@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: remove redundant D0 power state set
Kalesh Purayil [Fri, 10 Jul 2015 09:32:45 +0000 (05:32 -0400)]
be2net: remove redundant D0 power state set

pci_enable_device() call sets device power state to D0; there is no need
doing it again.

Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: query FW to check if EVB is enabled
Kalesh Purayil [Fri, 10 Jul 2015 09:32:44 +0000 (05:32 -0400)]
be2net: query FW to check if EVB is enabled

The current code assumes that bridge functionality (EVB) in the adapter
is enabled only when SR-IOV is enabled. This is not always true.
This patch uses the GET_HSW_CONFIG FW cmd to query this from the FW.

Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: remove duplicate code in be_setup_wol()
Kalesh Purayil [Fri, 10 Jul 2015 09:32:43 +0000 (05:32 -0400)]
be2net: remove duplicate code in be_setup_wol()

This change will make be_setup_wol() routine more compact and readable
by removing some duplicate code.

Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: Do not iterate over all interfaces when finding source address on specific...
YOSHIFUJI Hideaki/吉藤英明 [Fri, 10 Jul 2015 07:58:31 +0000 (16:58 +0900)]
ipv6: Do not iterate over all interfaces when finding source address on specific interface.

If outgoing interface is specified and the candidate address is
restricted to the outgoing interface, it is enough to iterate
over that given interface only.

Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Acked-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Drop owner assignment from platform_driver
Krzysztof Kozlowski [Fri, 10 Jul 2015 06:29:23 +0000 (15:29 +0900)]
net: Drop owner assignment from platform_driver

platform_driver does not need to set an owner because
platform_driver_register() will set it.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: Support setting polarity in marvell phy driver
David Thomson [Fri, 10 Jul 2015 04:28:25 +0000 (16:28 +1200)]
net: phy: Support setting polarity in marvell phy driver

Support manually setting the polarity to mdi or mdix

Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: Pass mdix ethtool setting through to phy driver
David Thomson [Fri, 10 Jul 2015 01:56:54 +0000 (13:56 +1200)]
net: phy: Pass mdix ethtool setting through to phy driver

Pass the mdix setting from ethtool down to the phy driver, to allow
driver specific implementations of manually setting the polarity.

Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: do not export tcp_init_xmit_timers()
Eric Dumazet [Thu, 9 Jul 2015 16:01:40 +0000 (18:01 +0200)]
tcp: do not export tcp_init_xmit_timers()

After commit 900f65d361d3 ("tcp: move duplicate code from
tcp_v4_init_sock()/tcp_v6_init_sock()"), we no longer
need to export tcp_init_xmit_timers()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: mdb: fill state in br_mdb_notify
Nikolay Aleksandrov [Thu, 9 Jul 2015 10:11:10 +0000 (03:11 -0700)]
bridge: mdb: fill state in br_mdb_notify

Fill also the port group state when sending notifications.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoroute: remove unsed variable in __mkroute_input
Masatake YAMATO [Thu, 9 Jul 2015 03:46:35 +0000 (12:46 +0900)]
route: remove unsed variable in __mkroute_input

flags local variable in __mkroute_input is not used as a variable.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: Nonlocal bind
Tom Herbert [Wed, 8 Jul 2015 23:58:22 +0000 (16:58 -0700)]
ipv6: Nonlocal bind

Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This add the ip_nonlocal_bind sysctl under ipv6.

Testing:

Set up nonlocal binding and receive routing on a host, e.g.:

ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host

ping6 -I 2001:0:0:1::1 peer-address -- to verify

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tw_cleanups'
David S. Miller [Thu, 9 Jul 2015 22:12:21 +0000 (15:12 -0700)]
Merge branch 'tw_cleanups'

Eric Dumazet says:

====================
inet: timewait cleanups

Another round of patches to make tw handling simpler.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoinet: inet_twsk_deschedule factorization
Eric Dumazet [Wed, 8 Jul 2015 21:28:30 +0000 (14:28 -0700)]
inet: inet_twsk_deschedule factorization

inet_twsk_deschedule() calls are followed by inet_twsk_put().

Only particular case is in inet_twsk_purge() but there is no point
to defer the inet_twsk_put() after re-enabling BH.

Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put()
and move the inet_twsk_put() inside.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoinet: simplify timewait refcounting
Eric Dumazet [Wed, 8 Jul 2015 21:28:29 +0000 (14:28 -0700)]
inet: simplify timewait refcounting

timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()

In particular, deferred inet_twsk_put() added in commit
13475a30b66cd ("tcp: connect() race with timewait reuse")
looks unecessary : When removing a timewait socket from
ehash or bhash, caller must own a reference on the socket
anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoinet: remove BUG_ON() in twsk_destructor()
Eric Dumazet [Wed, 8 Jul 2015 21:28:28 +0000 (14:28 -0700)]
inet: remove BUG_ON() in twsk_destructor()

Kernel will crash the same if one of the pointer is NULL anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: use flag instead of u16 for hop in inet6_skb_parm
Florian Westphal [Wed, 8 Jul 2015 21:32:12 +0000 (23:32 +0200)]
ipv6: use flag instead of u16 for hop in inet6_skb_parm

Hop was always either 0 or sizeof(struct ipv6hdr).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodsa: mv88e6352/mv88e6xxx: Add support for Marvell 88E6320 and 88E6321
Aleksey S. Kazantsev [Wed, 8 Jul 2015 03:38:15 +0000 (20:38 -0700)]
dsa: mv88e6352/mv88e6xxx: Add support for Marvell 88E6320 and 88E6321

MV88E6320 and MV88E6321 are largely compatible to MV886352,
but are members of a different chip family.

Signed-off-by: Aleksey S. Kazantsev <ioctl@yandex.ru>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tcp-in-slow-start'
David S. Miller [Thu, 9 Jul 2015 21:22:53 +0000 (14:22 -0700)]
Merge branch 'tcp-in-slow-start'

Yuchung Cheng says:

====================
tcp: fixes some congestion control corner cases

This patch series fixes corner cases of TCP congestion control.
First issue is to avoid continuing slow start when cwnd reaches ssthresh.
Second issue is incorrectly processing order of congestion state and
cwnd update when entering fast recovery or undoing cwnd.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: update congestion state first before raising cwnd
Yuchung Cheng [Thu, 9 Jul 2015 20:16:31 +0000 (13:16 -0700)]
tcp: update congestion state first before raising cwnd

The congestion state and cwnd can be updated in the wrong order.
For example, upon receiving a dubious ACK, we incorrectly raise
the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because
the state is still Open, then enter recovery state to reduce cwnd.

For another example, if the ACK indicates spurious timeout or
retransmits, we first revert the cwnd reduction and congestion
state back to Open state.  But we don't raise the cwnd even though
the ACK does not indicate any congestion.

To fix this problem we should first call tcp_fastretrans_alert() to
process the dubious ACK and update the congestion state, then call
tcp_may_raise_cwnd() that raises cwnd based on the current state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: do not slow start when cwnd equals ssthresh
Yuchung Cheng [Thu, 9 Jul 2015 20:16:30 +0000 (13:16 -0700)]
tcp: do not slow start when cwnd equals ssthresh

In the original design slow start is only used to raise cwnd
when cwnd is stricly below ssthresh. It makes little sense
to slow start when cwnd == ssthresh: especially
when hystart has set ssthresh in the initial ramp, or after
recovery when cwnd resets to ssthresh. Not doing so will
also help reduce the buffer bloat slightly.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: add tcp_in_slow_start helper
Yuchung Cheng [Thu, 9 Jul 2015 20:16:29 +0000 (13:16 -0700)]
tcp: add tcp_in_slow_start helper

Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: skb_defer_rx_timestamp should check for phydev before setting up classify
Alexander Duyck [Thu, 9 Jul 2015 18:02:52 +0000 (11:02 -0700)]
net: skb_defer_rx_timestamp should check for phydev before setting up classify

This change makes it so that the call skb_defer_rx_timestamp will first
check for a phydev before going in and manipulating the skb->data and
skb->len values.  By doing this we can avoid unnecessary work on network
devices that don't support phydev.  As a result we reduce the total
instruction count needed to process this on most devices.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: v1 always send a quick ack when quickacks are enabled
Jon Maxwell [Wed, 8 Jul 2015 00:12:28 +0000 (10:12 +1000)]
tcp: v1 always send a quick ack when quickacks are enabled

V1 of this patch contains Eric Dumazet's suggestion to move the per
dst RTAX_QUICKACK check into tcp_in_quickack_mode(). Thanks Eric.

I ran some tests and after setting the "ip route change quickack 1"
knob there were still many delayed ACKs sent. This occured
because when icsk_ack.quick=0 the !icsk_ack.pingpong value is
subsequently ignored as tcp_in_quickack_mode() checks both these
values. The condition for a quick ack to trigger requires
that both icsk_ack.quick != 0 and icsk_ack.pingpong=0. Currently
only icsk_ack.pingpong is controlled by the knob. But the
icsk_ack.quick value changes dynamically depending on heuristics.
The crux of the matter is that delayed acks still cannot be entirely
disabled even with the RTAX_QUICKACK per dst knob enabled. This
patch ensures that a quick ack is always sent when the RTAX_QUICKACK
per dst knob is turned on.

The "ip route change quickack 1" knob was recently added to enable
quickacks. It was modeled around the TCP_QUICKACK setsockopt() option.
This issue is that even with "ip route change quickack 1" enabled
we still see delayed ACKs under some conditions. It would be nice
to be able to completely disable delayed ACKs.

Here is an example:

# netstat -s|grep dela
    3 delayed acks sent

For all routes enable the knob

# ip route change quickack 1

Generate some traffic across a slow link and we still see the delayed
acks.

# netstat -s|grep dela
    106 delayed acks sent
    1 delayed acks further delayed because of locked socket

The issue is that both the "ip route change quickack 1" knob and
the TCP_QUICKACK option set the icsk_ack.pingpong variable to 0.
However at the business end in the __tcp_ack_snd_check() routine,
tcp_in_quickack_mode() checks that both icsk_ack.quick != 0
and icsk_ack.pingpong=0 in order to trigger a quickack. As
icsk_ack.quick is determined by heuristics it can be 0. When
that occurs the icsk_ack.pingpong value is ignored and a delayed
ACK is sent regardless.

This patch moves the RTAX_QUICKACK per dst check into the
tcp_in_quickack_mode() routine which ensures that a quickack is
always sent when the quickack knob is enabled for that dst.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorocker: add change MTU support
Scott Feldman [Wed, 8 Jul 2015 23:06:47 +0000 (16:06 -0700)]
rocker: add change MTU support

Implement ndo_change_mtu: on MTU change, reallocate Rx ring bufs and signal
HW of new port MTU value.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoneterion: s2io: Use module_pci_driver
Vaishali Thakkar [Thu, 9 Jul 2015 04:55:39 +0000 (10:25 +0530)]
neterion: s2io: Use module_pci_driver

Use module_pci_driver for drivers whose init and exit functions
only register and unregister, respectively.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@a@
identifier f, x;
@@
-static f(...) { return pci_register_driver(&x); }

@b depends on a@
identifier e, a.x;
statement S;
@@
-static e(...) {
-pci_unregister_driver(&x);
-DBG_PRINT(INIT_DBG,"S");
- }

@c depends on a && b@
identifier a.f;
declarer name module_init;
@@
-module_init(f);

@d depends on a && b && c@
identifier b.e, a.x;
declarer name module_exit;
declarer name module_pci_driver;
@@
-module_exit(e);
+module_pci_driver(x);

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4vf: Fix check to use new User Doorbell mechanism
Hariprasad Shenai [Thu, 9 Jul 2015 09:25:46 +0000 (14:55 +0530)]
cxgb4vf: Fix check to use new User Doorbell mechanism

If we don't have access to the new User GTS (T5+), use the old doorbell
mechanism; otherwise use the new BAR2 mechanism.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotest_bpf: extend tests for 32-bit endianness conversion
Xi Wang [Wed, 8 Jul 2015 21:00:56 +0000 (14:00 -0700)]
test_bpf: extend tests for 32-bit endianness conversion

Currently "ALU_END_FROM_BE 32" and "ALU_END_FROM_LE 32" do not test if
the upper bits of the result are zeros (the arm64 JIT had such bugs).
Extend the two tests to catch this.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'cxgb4-t6'
David S. Miller [Wed, 8 Jul 2015 23:13:55 +0000 (16:13 -0700)]
Merge branch 'cxgb4-t6'

Hariprasad Shenai says:

====================
Cleanup, T6 changes and register range update

This patch series adds the following:
Don't use entire L2T table, update register ranges for T6 adapter,
read stats for only available channels for T6 and enable cim_la dump for
T6 adapter also.

This patch series has been created against net-next tree and includes
patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Enable cim_la dump to support T6
Hariprasad Shenai [Tue, 7 Jul 2015 16:19:21 +0000 (21:49 +0530)]
cxgb4: Enable cim_la dump to support T6

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Read stats for only available channels
Hariprasad Shenai [Tue, 7 Jul 2015 16:19:20 +0000 (21:49 +0530)]
cxgb4: Read stats for only available channels

Updating the driver to read the stats of only available channels. T6 and
later has only 2 channels

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Update register ranges for T6 adapter
Hariprasad Shenai [Tue, 7 Jul 2015 16:19:19 +0000 (21:49 +0530)]
cxgb4: Update register ranges for T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Don't use entire L2T table, use only its slice
Hariprasad Shenai [Tue, 7 Jul 2015 16:19:18 +0000 (21:49 +0530)]
cxgb4: Don't use entire L2T table, use only its slice

The driver was retrieving the parameters for the bounds of its
slice of the L2T from the firmware and then throwing those away and
using the entire table. This corrects that problem.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ec_bhf: Use module_pci_driver
Vaishali Thakkar [Tue, 7 Jul 2015 07:02:54 +0000 (12:32 +0530)]
net: ec_bhf: Use module_pci_driver

Use module_pci_driver for drivers whose init and exit functions
only register and unregister, respectively.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@a@
identifier f, x;
@@
-static f(...) { return pci_register_driver(&x); }

@b depends on a@
identifier e, a.x;
@@
-static e(...) { pci_unregister_driver(&x); }

@c depends on a && b@
identifier a.f;
declarer name module_init;
@@
-module_init(f);

@d depends on a && b && c@
identifier b.e, a.x;
declarer name module_exit;
declarer name module_pci_driver;
@@
-module_exit(e);
+module_pci_driver(x);

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agohv_netvsc: Add support to set MTU reservation from guest side
Haiyang Zhang [Mon, 6 Jul 2015 21:11:37 +0000 (14:11 -0700)]
hv_netvsc: Add support to set MTU reservation from guest side

When packet encapsulation is in use, the MTU needs to be reduced for
headroom reservation.
The existing code takes the updated MTU value only from the host side.
But vSwitch extensions, such as Open vSwitch, require the flexibility
to change the MTU to different values from within a guest during the
lifecycle of a vNIC, when the encapsulation protocol is changed. The
patch supports this kind of MTU changes.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoifb: add multiqueue operation
Eric Dumazet [Mon, 6 Jul 2015 20:05:28 +0000 (22:05 +0200)]
ifb: add multiqueue operation

Add multiqueue capabilities to ifb netdevice.

This removes last bottleneck for ingress when mq qdisc can be used
to shard load from multiple RX queues on physical device.

Tested:

# netem based setup, installed at receiver side
ETH=eth0
IFB=ifb10
EST="est 1sec 4sec" # Optional rate estimator
RTT_HALF=2ms
#REORDER=20us
#LOSS="loss 1"
TXQ=8

ip link add ifb10 numtxqueues $TXQ type ifb
ip link set dev $IFB up

tc qdisc add dev $ETH ingress 2>/dev/null

tc filter add dev $ETH parent ffff: \
   protocol ip u32 match u32 0 0 flowid 1:1 \
action mirred egress redirect dev $IFB

tc qdisc del dev $IFB root 2>/dev/null

tc qdisc add dev $IFB root handle 1: mq
for i in `seq 1 $TXQ`
do
 slot=$( printf %x $(( i )) )
 tc qd add dev $IFB parent 1:$slot $EST netem \
limit 100000 delay $RTT_HALF $REORDER $LOSS
done

lpaa24:~# tc -s -d qd sh dev ifb10
qdisc mq 1: root
 Sent 316544766 bytes 5265927 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 98880b 1648p requeues 0
qdisc netem 8002: parent 1:1 limit 100000 delay 2.0ms
 Sent 39601416 bytes 658721 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38235Kbit 79657pps backlog 12240b 204p requeues 0
qdisc netem 8003: parent 1:2 limit 100000 delay 2.0ms
 Sent 39472866 bytes 657227 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38234Kbit 79655pps backlog 10620b 176p requeues 0
qdisc netem 8004: parent 1:3 limit 100000 delay 2.0ms
 Sent 39703417 bytes 659699 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38320Kbit 79831pps backlog 12780b 213p requeues 0
qdisc netem 8005: parent 1:4 limit 100000 delay 2.0ms
 Sent 39565149 bytes 658011 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38174Kbit 79530pps backlog 11880b 198p requeues 0
qdisc netem 8006: parent 1:5 limit 100000 delay 2.0ms
 Sent 39506078 bytes 657354 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38195Kbit 79571pps backlog 12480b 208p requeues 0
qdisc netem 8007: parent 1:6 limit 100000 delay 2.0ms
 Sent 39675994 bytes 658849 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38323Kbit 79838pps backlog 12600b 210p requeues 0
qdisc netem 8008: parent 1:7 limit 100000 delay 2.0ms
 Sent 39532042 bytes 658367 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38177Kbit 79536pps backlog 13140b 219p requeues 0
qdisc netem 8009: parent 1:8 limit 100000 delay 2.0ms
 Sent 39488164 bytes 657705 pkt (dropped 0, overlimits 0 requeues 0)
 rate 38192Kbit 79568pps backlog 13Kb 222p requeues 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add PCI device ids for few more T5 and T6 adapters
Hariprasad Shenai [Mon, 6 Jul 2015 17:08:34 +0000 (22:38 +0530)]
cxgb4: Add PCI device ids for few more T5 and T6 adapters

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/mlx4_core: Add extra check for total vfs for SRIOV
Carol Soto [Mon, 6 Jul 2015 14:20:19 +0000 (09:20 -0500)]
net/mlx4_core: Add extra check for total vfs for SRIOV

Add extra check for total vfs for SRIOV to check if that value is
bigger than total vfs in pci SRIOV capabalities. Fix a check and
print of the number of maximum vfs that hw can handle. Fix a check
and print of the number of maximum vfs per port that driver can handle.

Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosamples: bpf: enable trace samples for s390x
Michael Holzheu [Mon, 6 Jul 2015 14:20:07 +0000 (16:20 +0200)]
samples: bpf: enable trace samples for s390x

The trace bpf samples do not compile on s390x because they use x86
specific fields from the "pt_regs" structure.

Fix this and access the fields via new PT_REGS macros.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: macb: Add SG support for Zynq SOC family
Punnaiah Choudary Kalluri [Mon, 6 Jul 2015 04:32:53 +0000 (10:02 +0530)]
net: macb: Add SG support for Zynq SOC family

Enable SG support for Zynq SOC family devices.

Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoxen-netback: remove duplicated function definition
Li, Liang Z [Mon, 6 Jul 2015 00:42:56 +0000 (08:42 +0800)]
xen-netback: remove duplicated function definition

There are two duplicated xenvif_zerocopy_callback() definitions.
Remove one of them.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'sch_act_lockless'
David S. Miller [Wed, 8 Jul 2015 20:50:42 +0000 (13:50 -0700)]
Merge branch 'sch_act_lockless'

Eric Dumazet says:

====================
net_sched: act: lockless operation

As mentioned by Alexei last week in Budapest, it is a bit weird
to take a spinlock in order to drop a packet in a tc filter...

Lets add percpu infra for tc actions and use it for gact & mirred.

Before changes, my host with 8 RX queues was handling 5 Mpps with gact,
and more than 11 Mpps after.

Mirred change is not yet visible if ifb+qdisc is used, as ifb is
not yet multi queue enabled, but is a step forward.
====================

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet_sched: act_mirred: remove spinlock in fast path
Eric Dumazet [Mon, 6 Jul 2015 12:18:09 +0000 (05:18 -0700)]
net_sched: act_mirred: remove spinlock in fast path

Like act_gact, act_mirred can be lockless in packet processing

1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) use rcu to protect tcfm_dev
4) Remove spinlock usage, as it is no longer needed.

Next step : add multi queue capability to ifb device

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet_sched: act_gact: remove spinlock in fast path
Eric Dumazet [Mon, 6 Jul 2015 12:18:08 +0000 (05:18 -0700)]
net_sched: act_gact: remove spinlock in fast path

Final step for gact RCU operation :

1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) Remove spinlock acquisition, as it is no longer needed.

Since this is the last contended lock in packet RX when tc gact is used,
this gives impressive gain.

My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 11 Mpps after patch.

Tested:

On receiver :

dev=eth0
tc qdisc del dev $dev ingress 2>/dev/null
tc qdisc add dev $dev ingress
tc filter del dev $dev root pref 10 2>/dev/null
tc filter del dev $dev pref 10 2>/dev/null
tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \
u32 match ip src 7.0.0.0/8 flowid 1:15 action drop

Sender sends packets flood from 7/8 network

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet_sched: act_gact: read tcfg_ptype once
Eric Dumazet [Mon, 6 Jul 2015 12:18:07 +0000 (05:18 -0700)]
net_sched: act_gact: read tcfg_ptype once

Third step for gact RCU operation :

Following patch will get rid of spinlock protection,
so we need to read tcfg_ptype once.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet_sched: act_gact: use a separate packet counters for gact_determ()
Eric Dumazet [Mon, 6 Jul 2015 12:18:06 +0000 (05:18 -0700)]
net_sched: act_gact: use a separate packet counters for gact_determ()

Second step for gact RCU operation :

We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.

gact_determ() would not work without a central packet counter,
so lets add it for this mode.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet_sched: act_gact: make tcfg_pval non zero
Eric Dumazet [Mon, 6 Jul 2015 12:18:05 +0000 (05:18 -0700)]
net_sched: act_gact: make tcfg_pval non zero

First step for gact RCU operation :

Instead of testing if tcfg_pval is zero or not, just make it 1.

No change in behavior, but slightly faster code.

The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage are added for upcoming spinlock removal.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: sched: add percpu stats to actions
Eric Dumazet [Mon, 6 Jul 2015 12:18:04 +0000 (05:18 -0700)]
net: sched: add percpu stats to actions

Reuse existing percpu infrastructure John Fastabend added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: sched: extend percpu stats helpers
Eric Dumazet [Mon, 6 Jul 2015 12:18:03 +0000 (05:18 -0700)]
net: sched: extend percpu stats helpers

qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.

We want to add percpu stats for tc action, so this patch add common
helpers.

qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomlx4: TCP/UDP packets have L4 hash
Eric Dumazet [Thu, 2 Jul 2015 11:24:44 +0000 (13:24 +0200)]
mlx4: TCP/UDP packets have L4 hash

Mellanox driver has the knowledge if rxhash is a L4 hash,
if it receives a non fragmented TCP or UDP frame and
NETIF_F_RXCSUM is enabled on netdev.

ip_summed value is CHECKSUM_UNNECESSARY in this case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Ido Shamay <idos@mellanox.com>
Acked-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tcp-policer-drops'
David S. Miller [Wed, 8 Jul 2015 20:29:46 +0000 (13:29 -0700)]
Merge branch 'tcp-policer-drops'

Yuchung Cheng says:

====================
tcp: reducing lost retransmits in recovery

This patch series reduces lost retransmits in recovery, in particular
when dealing with traffic policers. The main problem is that
slow start in recovery under policing can cause massive lost and
retransmit storms: any excess sending rate turns into drops. The
solution is to avoid doing slow start when lost retransmit is
detected and use packet conservation instead.

On networks with traffic policers the patches have lowered the
TCP loss rates by ~20% from Google servers without latency regressions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: PRR uses CRB mode by default and SS mode conditionally
Yuchung Cheng [Wed, 1 Jul 2015 21:11:15 +0000 (14:11 -0700)]
tcp: PRR uses CRB mode by default and SS mode conditionally

PRR slow start is often too aggressive especially when drops are
caused by traffic policers. The policers mainly use token bucket
to enforce the rate so sending (twice) faster than the delivery
rate causes excessive drops.

This patch changes PRR to the conservative reduction bound
(CRB) mode in RFC 6937 by default. CRB follows the packet
conservation rule to send at most the delivery rate by default.

But if many packets are lost and the pipe is empty, CRB may take N
round trips to repair N losses. We conditionally turn on slow start
mode if all these conditions are made to speed up the recovery:

  1) on the second round or later in recovery
  2) retransmission sent in the previous round is delivered on this ACK
  3) no retransmission is marked lost on this ACK

By using packet conservation by default, this change reduces the loss
retransmits signicantly on networks that deploy traffic policers,
up to 20% reduction of overall loss rate.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: reduce cwnd if retransmit is lost in CA_Loss
Yuchung Cheng [Wed, 1 Jul 2015 21:11:14 +0000 (14:11 -0700)]
tcp: reduce cwnd if retransmit is lost in CA_Loss

If the retransmission in CA_Loss is lost again, we should not
continue to slow start or raise cwnd in congestion avoidance mode.
Instead we should enter fast recovery and use PRR to reduce cwnd,
following the principle in RFC5681:

"... or the loss of a retransmission, should be taken as two
 indications of congestion and, therefore, cwnd (and ssthresh) MUST
 be lowered twice in this case."

This is especially important to reduce loss when the CA_Loss
state was caused by a traffic policer dropping the entire inflight.
The CA_Loss state has a problem where a loss of L packets causes the
sender to send a burst of L packets. So a policer that's dropping
most packets in a given RTT can cause a huge retransmit storm. By
contrast, PRR includes logic to bound the number of outbound packets
that result from a given ACK. So switching to CA_Recovery on lost
retransmits in CA_Loss avoids this retransmit storm problem when
in CA_Loss.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Fix incorrect sequence numbers shown in devlog
Hariprasad Shenai [Fri, 3 Jul 2015 10:40:51 +0000 (16:10 +0530)]
cxgb4: Fix incorrect sequence numbers shown in devlog

Part of commit 49aa284fe64c4c1 ("cxgb4: Add support for devlog")
change introduced a real bug where the Device Log Sequence Numbers are
no longer being converted from firmware Big-Endian to local CPU-Endian
format.

This patch moves all of the translation into the devlog_show() routine.
The only endianness code now in devlog_open() is the small loop to find the
earliest (lowest Sequence Number) Device Log entry in the circular buffer.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>