Lendacky, Thomas [Fri, 20 Mar 2015 16:50:34 +0000 (11:50 -0500)]
amd-xgbe: Fix Rx coalescing reporting
The Rx coalescing value is internally converted from usecs to a value
that the hardware can use. When reporting the Rx coalescing value, this
internal value is converted back to usecs. During the conversion from
and back to usecs some rounding occurs. So, for example, when setting an
Rx usec of 30, it will be reported as 29. Fix this reporting issue by
keeping the original usec value and using that during reporting.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:28 +0000 (11:50 -0500)]
amd-xgbe: Remove Tx coalescing
The Tx coalescing support in the driver was a software implementation
for something lacking in the hardware. Using hrtimers, the idea was to
trigger a timer interrupt after having queued a packet for transmit.
Unfortunately, as the timer value was lowered, the timer expired before
the hardware actually did the transmit and so it was racey and resulted
in unnecessary interrupts.
Remove the Tx coalescing support and hrtimer and replace with a Tx timer
that is used as a reclaim timer in case of inactivity.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:22 +0000 (11:50 -0500)]
amd-xgbe: Set DMA mask based on hardware register value
The hardware supplies a value that indicates the DMA range that it
is capable of using. Use this value rather than hard-coding it in
the driver.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:16 +0000 (11:50 -0500)]
amd-xgbe: Use the new DMA memory barriers where appropriate
Use the new lighter weight memory barriers when working with the device
descriptors.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:10 +0000 (11:50 -0500)]
amd-xgbe: Clarify output message about queues
Clarify that the queues referred to in a message when the device is
brought up are hardware queues and not necessarily related to the
Linux network queues.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:04 +0000 (11:50 -0500)]
amd-xgbe-phy: Provide support for auto-negotiation timeout
Currently, there is no interrupt code that indicates auto-negotiation
has timed out. If the auto-negotiation has timed out then the start of
a new auto-negotiation will begin again with a new base page being
received. The state machine could be in a state that is not expecting
this interrupt code which results in an error during auto-negotiation.
Update the code to timestamp when the auto-negotiation starts. Should
another page received interrupt code occur before auto-negotiation has
completed but after the auto-negotiation timeout, then reset the state
machine to allow the auto-negotiation to continue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:49:53 +0000 (11:49 -0500)]
amd-xgbe-phy: Use the phy_driver flags field
Remove the setting of the transceiver type when retrieving the device
settings using ethtool and instead set the transceiver type in the
phy_driver structure flags field. Change the transceiver type to be
internal, also.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:49:42 +0000 (11:49 -0500)]
amd-xgbe-phy: Use phydev advertising field vs supported
With ethtool being able to control what is advertised, the advertising
field is what should be used for priming the auto-negotiation registers
and for various other checks, instead of the supported field.
Also, move the initial setting of the supported and advertising fields
into the probe function so that they are not reset each time the device
is brought up, thus allowing the user to set as desired before bringing
the device up.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 20:16:32 +0000 (16:16 -0400)]
Merge branch 'rhashtable-inlined-interface'
Herbert Xu says:
====================
rhashtable: Introduce inlined interface
This series of patches introduces the inlined rhashtable interface.
The idea is to make all the function pointers visible to the compiler
by providing the rhashtable_params structure explicitly to each
inline rhashtable function. For example, instead of doing
obj = rhashtable_lookup(ht, key);
you would now do
obj = rhashtable_lookup_fast(ht, key, params);
Where params is the same data that you would give to rhashtable_init.
In particular, within rhashtable.c itself we would simply supply
ht->p.
So to convert users over, you simply have to make params globally
accessible, e.g., by placing it in a static const variable, which
can then be used at each inlined call site, as well as by the
rhashtable_init call.
The only ticky bit is that some users (i.e., netfilter) has a
dynamic key length. This is dealt with by using params.key_len
in the inline functions when it is non-zero, and otherwise falling
back on ht->p.key_len.
Note that I've only tested this on one compiler, gcc 4.7.2. So
please test this with your compilers as well and make sure that
the code is actually inlined without indirect function calls.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:06 +0000 (21:57 +1100)]
rhashtable: Rip out obsolete out-of-line interface
Now that all rhashtable users have been converted over to the
inline interface, this patch removes the unused out-of-line
interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:05 +0000 (21:57 +1100)]
tipc: Use inlined rhashtable interface
This patch converts tipc to the inlined rhashtable interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:04 +0000 (21:57 +1100)]
test_rhashtable: Use inlined rhashtable interface
This patch converts test_rhashtable to the inlined rhashtable
interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:02 +0000 (21:57 +1100)]
netfilter: Convert nft_hash to inlined rhashtable
This patch converts nft_hash to the inlined rhashtable interface.
This patch also replaces the call to rhashtable_lookup_compare with
a straight rhashtable_lookup_fast because it's simply doing a memcmp
(in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).
Furthermore, the compare function is only meant to compare, it is not
supposed to have side-effects. The current side-effect code can
simply be moved into the nft_hash_get.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:01 +0000 (21:57 +1100)]
netlink: Move namespace into hash key
Currently the name space is a de facto key because it has to match
before we find an object in the hash table. However, it isn't in
the hash value so all objects from different name spaces with the
same port ID hash to the same bucket.
This is bad as the number of name spaces is unbounded.
This patch fixes this by using the namespace when doing the hash.
Because the namespace field doesn't lie next to the portid field
in the netlink socket, this patch switches over to the rhashtable
interface without a fixed key.
This patch also uses the new inlined rhashtable interface where
possible.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:00 +0000 (21:57 +1100)]
rhashtable: Allow hash/comparison functions to be inlined
This patch deals with the complaint that we make indirect function
calls on the fast paths unnecessarily in rhashtable. We resolve
it by moving the fast paths into inline functions that take struct
rhashtable_param (which obviously must be the same set of parameters
supplied to rhashtable_init) as an argument.
The only remaining indirect call is to obj_hashfn (or key_hashfn it
obj_hashfn is unset) on the rehash as well as the insert-during-
rehash slow path.
This patch also extends the support of vairable-length keys to
include those where the key is fixed but scattered in the object.
For example, in netlink we want to key off the namespace and the
portid but they're not next to each other.
This patch does this by directly using the object hash function
as the indicator of whether the key is accessible or not. It
also adds a new function obj_cmpfn to compare a key against an
object. This means that the caller no longer needs to supply
explicit compare functions.
All this is done in a backwards compatible manner so no existing
users are affected until they convert to the new interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:56:59 +0000 (21:56 +1100)]
rhashtable: Make rhashtable_init params argument const
This patch marks the rhashtable_init params argument const as
there is no reason to modify it since we will always make a copy
of it in the rhashtable.
This patch also fixes a bug where we don't actually round up the
value of min_size unless it is less than HASH_MIN_SIZE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 19 Mar 2015 18:38:27 +0000 (19:38 +0100)]
ebpf, filter: do not convert skb->protocol to host endianess during runtime
Commit
c24973957975 ("bpf: allow BPF programs access 'protocol' and 'vlan_tci'
fields") has added support for accessing protocol, vlan_present and vlan_tci
into the skb offset map.
As referenced in the below discussion, accessing skb->protocol from an eBPF
program should be converted without handling endianess.
The reason for this is that an eBPF program could simply do a check more
naturally, by f.e. testing skb->protocol == htons(ETH_P_IP), where the LLVM
compiler resolves htons() against a constant automatically during compilation
time, as opposed to an otherwise needed run time conversion.
After all, the way of programming both from a user perspective differs quite
a lot, i.e. bpf_asm ["ld proto"] versus a C subset/LLVM.
Reference: https://patchwork.ozlabs.org/patch/450819/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 20 Mar 2015 14:37:17 +0000 (11:37 -0300)]
ipv6: invert join/leave anycast rtnl/socket locking order
Commit
baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
missed to update two setsockopt options, IPV6_JOIN_ANYCAST and
IPV6_LEAVE_ANYCAST, causing a lock inverstion regarding to the updated ones.
As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
do_ipv6_setsockopt, we are good to just move the rtnl lock upper.
Fixes:
baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 20 Mar 2015 13:26:21 +0000 (10:26 -0300)]
vxlan: fix possible use of uninitialized in vxlan_igmp_{join, leave}
Test robot noticed that we check the return of vxlan_igmp_join and leave
but inside them there was a path that it could be used initialized.
It's not really possible because those if() inside these igmp functions
would always match as we can't have sockets of other type in there, but
this way we keep the compiler happy.
Fixes:
56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Palik, Imre [Thu, 19 Mar 2015 10:05:42 +0000 (11:05 +0100)]
xen-netback: making the bandwidth limiter runtime settable
With the current netback, the bandwidth limiter's parameters are only
settable during vif setup time. This patch register a watch on them, and
thus makes them runtime changeable.
When the watch fires, the timer is reset. The timer's mutex is used for
fencing the change.
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 16:40:33 +0000 (12:40 -0400)]
Merge branch 'listener_refactor_part_14'
Eric Dumazet says:
====================
inet: tcp listener refactoring part 14
OK, we have serious patches here.
We get rid of the central timer handling SYNACK rtx,
which is killing us under even medium SYN flood.
We still use the listener specific hash table.
This will be done in next round ;)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 20 Mar 2015 02:04:21 +0000 (19:04 -0700)]
net: increase sk_[max_]ack_backlog
sk_ack_backlog & sk_max_ack_backlog were 16bit fields, meaning
listen() backlog was limited to 65535.
It is time to increase the width to allow much bigger backlog,
if admins change /proc/sys/net/core/somaxconn &
/proc/sys/net/ipv4/tcp_max_syn_backlog default values.
Tested:
echo
5000000 >/proc/sys/net/core/somaxconn
echo
5000000 >/proc/sys/net/ipv4/tcp_max_syn_backlog
Ran a SYNFLOOD test against a listener using listen(fd,
5000000)
myhost~# grep request_sock_TCP /proc/slabinfo
request_sock_TCP
4185642 4411940 304 13 1 : tunables 54 27 8 : slabdata 339380 339380 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 20 Mar 2015 02:04:20 +0000 (19:04 -0700)]
inet: get rid of central tcp/dccp listener timer
One of the major issue for TCP is the SYNACK rtx handling,
done by inet_csk_reqsk_queue_prune(), fired by the keepalive
timer of a TCP_LISTEN socket.
This function runs for awful long times, with socket lock held,
meaning that other cpus needing this lock have to spin for hundred of ms.
SYNACK are sent in huge bursts, likely to cause severe drops anyway.
This model was OK 15 years ago when memory was very tight.
We now can afford to have a timer per request sock.
Timer invocations no longer need to lock the listener,
and can be run from all cpus in parallel.
With following patch increasing somaxconn width to 32 bits,
I tested a listener with more than 4 million active request sockets,
and a steady SYNFLOOD of ~200,000 SYN per second.
Host was sending ~830,000 SYNACK per second.
This is ~100 times more what we could achieve before this patch.
Later, we will get rid of the listener hash and use ehash instead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 20 Mar 2015 02:04:19 +0000 (19:04 -0700)]
inet: drop prev pointer handling in request sock
When request sock are put in ehash table, the whole notion
of having a previous request to update dl_next is pointless.
Also, following patch will get rid of big purge timer,
so we want to delete a request sock without holding listener lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 19 Mar 2015 22:31:13 +0000 (22:31 +0000)]
rhashtable: Round up/down min/max_size to ensure we respect limit
Round up min_size respectively round down max_size to the next power
of two to make sure we always respect the limit specified by the
user. This is required because we compare the table size against the
limit before we expand or shrink.
Also fixes a minor bug where we modified min_size in the params
provided instead of the copy stored in struct rhashtable.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 19 Mar 2015 21:32:01 +0000 (14:32 -0700)]
i40e: add NVM update events to AQ clean
Quit complaining about a couple of events that we actually expect to see
during an NVM update.
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 19 Mar 2015 19:47:58 +0000 (16:47 -0300)]
tipc: fix build issue when building without IPv6
We can't directly call ipv6_sock_mc_join() but should use the stub
instead and protect it around IS_ENABLED.
Fixes:
d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 19:30:25 +0000 (15:30 -0400)]
Merge branch 'cxgb4-next'
Hariprasad Shenai says:
====================
Add Device ID and make device ID table const
This patch series adds new device ID and makes device ID table const
The patches series is created against 'net-next' tree.
And includes patches on cxgb4 and csiostor driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 19 Mar 2015 16:57:36 +0000 (22:27 +0530)]
cxgb4/cxgb4vf/csiostor: Make PCI Device ID Tables be "const"
Make PCI Device ID Tables be "const" to move them out of the data segment and
remove a redundant check on CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN in
t4_pci_id_tbl.h to guard the contents of the include file.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 19 Mar 2015 16:57:35 +0000 (22:27 +0530)]
cxgb4: Add device ID for new adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 19:18:04 +0000 (15:18 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-03-19
This wont the last 4.1 bluetooth-next pull request, but we've piled up
enough patches in less than a week that I wanted to save you from a
single huge "last-minute" pull somewhere closer to the merge window.
The main changes are:
- Simultaneous LE & BR/EDR discovery support for HW that can do it
- Complete LE OOB pairing support
- More fine-grained mgmt-command access control (normal user can now do
harmless read-only operations).
- Added RF power amplifier support in cc2520 ieee802154 driver
- Some cleanups/fixes in ieee802154 code
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 16:26:01 +0000 (12:26 -0400)]
Merge branch 'tipc-next'
Erik Hugne says:
====================
tipc: small bugfix an support for datagram connect()
Most notable in this series is patch#3 that allows programs to associate
a tipc address with a connectionless (RDM/DGRAM) socket.
v2: Fix indent issue in patch#3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Thu, 19 Mar 2015 08:02:19 +0000 (09:02 +0100)]
tipc: add support for connect() on dgram/rdm sockets
Following the example of ip4_datagram_connect, we store the
address in the socket structure for dgram/rdm sockets and use
that as the default destination for subsequent send() calls.
It is allowed to connect to any address types, and the behaviour
of send() will be the same as a normal sendto() with this address
provided. Binding to an AF_UNSPEC address clears the association.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Thu, 19 Mar 2015 08:02:18 +0000 (09:02 +0100)]
tipc: do not report -EHOSTUNREACH for failed local delivery
Since commit
1186adf7df04 ("tipc: simplify message forwarding and
rejection in socket layer") -EHOSTUNREACH is propagated back to
the sending process if we fail to deliver the message to another
socket local to the node.
This is wrong, host unreachable should only be reported when the
destination port/name does not exist in the cluster, and that
check is always done before sending the message. Also, this
introduces inconsistent sendmsg() behavior for local/remote
destinations. Errors occurring on the receiving side should not
trickle up to the sender. If message delivery fails TIPC should
either discard the packet or reject it back to the sender based
on the destination droppable option.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Thu, 19 Mar 2015 08:02:17 +0000 (09:02 +0100)]
tipc: remove redundant call to tipc_node_remove_conn
tipc_node_remove_conn may be called twice if shutdown() is
called on a socket that have messages in the receive queue.
Calling this function twice does no harm, but is unnecessary
and we remove the redundant call.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Iooss [Thu, 19 Mar 2015 13:23:40 +0000 (21:23 +0800)]
mac802154: fix typo in header guard
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Fixes:
b6eea9ca354a ("mac802154: introduce driver-ops header")
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Wu Fengguang [Thu, 19 Mar 2015 00:51:27 +0000 (08:51 +0800)]
net/mlx4_en: mlx4_en_set_tx_maxrate() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jörg Thalheim [Wed, 18 Mar 2015 09:06:58 +0000 (10:06 +0100)]
bridge: add ageing_time, stp_state, priority over netlink
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 02:52:33 +0000 (22:52 -0400)]
net: Fix high overhead of vlan sub-device teardown.
When a networking device is taken down that has a non-trivial number
of VLAN devices configured under it, we eat a full synchronize_net()
for every such VLAN device.
This is because of the call chain:
NETDEV_DOWN notifier
--> vlan_device_event()
--> dev_change_flags()
--> __dev_change_flags()
--> __dev_close()
--> __dev_close_many()
--> dev_deactivate_many()
--> synchronize_net()
This is kind of rediculous because we already have infrastructure for
batching doing operation X to a list of net devices so that we only
incur one sync.
So make use of that by exporting dev_close_many() and adjusting it's
interfaace so that the caller can fully manage the batch list. Use
this in vlan_device_event() and all the overhead goes away.
Reported-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Mar 2015 00:40:51 +0000 (17:40 -0700)]
inet: add a schedule point in inet_twsk_purge()
On a large hash table, we can easily spend seconds to
walk over all entries. Add a cond_resched() to yield
cpu if necessary.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 18 Mar 2015 02:23:16 +0000 (20:23 -0600)]
rocker: add support for phys_port_name
Implement the phys_port_name operation. Port names are pulled from the
rocker hardware model in qemu and default to the qemu name + port id.
e.g.,
sw1p1: flags=4098<BROADCAST,MULTICAST> mtu 1500
ether 52:54:00:12:35:01 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
where 'sw1' comes from the qemu command line -device rocker,name=sw1, and
'p1' is port 1.
Patch is adapted from Scott's phys_port_id patch.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 18 Mar 2015 02:23:15 +0000 (20:23 -0600)]
net: add support for phys_port_name
Similar to port id allow netdevices to specify port names and export
the name via sysfs. Drivers can implement the netdevice operation to
assist udev in having sane default names for the devices using the
rule:
$ cat /etc/udev/rules.d/80-net-setup-link.rules
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_port_name}!="",
NAME="$attr{phys_port_name}"
Use of phys_name versus phys_id was suggested-by Jiri Pirko.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Wed, 18 Mar 2015 17:50:44 +0000 (14:50 -0300)]
vxlan: Move socket initialization to within rtnl scope
Currently, if a multicast join operation fail, the vxlan interface will
be UP but not functional, without even a log message informing the user.
Now that we can grab socket lock after already having rntl, we don't
need to defer socket creation and multicast operations. By not deferring
we can do proper error reporting to the user through ip exit code.
This patch thus removes all deferred work that vxlan had and put it back
inline. Now the socket will only be created, bound and join multicast
group when one bring the interface up, and will undo all that as soon as
one put the interface down.
As vxlan_sock_hold() is not used after this patch, it was removed too.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Wed, 18 Mar 2015 17:50:43 +0000 (14:50 -0300)]
ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}
in favor of their inner __ ones, which doesn't grab rtnl.
As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.
So this patch:
- move rtnl handling to callers instead while already fixing some
reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
__ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
__ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Wed, 18 Mar 2015 17:50:42 +0000 (14:50 -0300)]
ipv4,ipv6: grab rtnl before locking the socket
There are some setsockopt operations in ipv4 and ipv6 that are grabbing
rtnl after having grabbed the socket lock. Yet this makes it impossible
to do operations that have to lock the socket when already within a rtnl
protected scope, like ndo dev_open and dev_stop.
We normally take coarse grained locks first but setsockopt inverted that.
So this patch invert the lock logic for these operations and makes
setsockopt grab rtnl if it will be needed prior to grabbing socket lock.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 02:00:44 +0000 (22:00 -0400)]
Merge branch 'listen_refactor_part_13'
Eric Dumazet says:
====================
inet: tcp listener refactoring, part 13
inet_hash functions are in a bad state : Too much IPv6/IPv4 copy/pasting.
Lets refactor a bit.
Idea is that we do not want to have an equivalent of inet_csk(sk)->icsk_af_ops
for request socks in order to be able to use the right variant.
In this patch series, I started to let IPv6/IPv4 converge to common helpers.
Idea is to use ipv6_addr_set_v4mapped() even for AF_INET sockets, so that
we can test
if (sk->sk_family == AF_INET6 &&
!ipv6_addr_v4mapped(&sk->sk_v6_daddr))
to tell if we deal with an IPv6 socket, or IPv4 one, at least in slow paths.
Ideally, we could save 8 bytes per struct sock_common, if we
alias skc_daddr & skc_rcv_saddr to skc_v6_daddr[3]/skc_v6_rcv_saddr[3].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 21:05:38 +0000 (14:05 -0700)]
inet: request sock should init IPv6/IPv4 addresses
In order to be able to use sk_ehashfn() for request socks,
we need to initialize their IPv6/IPv4 addresses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 21:05:37 +0000 (14:05 -0700)]
inet: get rid of last __inet_hash_connect() argument
We now always call __inet_hash_nolisten(), no need to pass it
as an argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 21:05:36 +0000 (14:05 -0700)]
ipv6: get rid of __inet6_hash()
We can now use inet_hash() and __inet_hash() instead of private
functions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 21:05:35 +0000 (14:05 -0700)]
inet: add IPv6 support to sk_ehashfn()
Intent is to converge IPv4 & IPv6 inet_hash functions to
factorize code.
IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr
in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set()
helpers.
__inet6_hash can now use sk_ehashfn() instead of a private
inet6_sk_ehashfn() and will simply use __inet_hash() in a
following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 21:05:34 +0000 (14:05 -0700)]
net: introduce sk_ehashfn() helper
Goal is to unify IPv4/IPv6 inet_hash handling, and use common helpers
for all kind of sockets (full sockets, timewait and request sockets)
inet_sk_ehashfn() becomes sk_ehashfn() but still only copes with IPv4
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 21:05:33 +0000 (14:05 -0700)]
netns: constify net_hash_mix() and various callers
const qualifiers ease code review by making clear
which objects are not written in a function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2015 18:55:23 +0000 (14:55 -0400)]
Merge branch 'txq_max_rate'
Or Gerlitz says:
====================
Add max rate TXQ attribute
Add the ability to set a max-rate limitation for TX queues.
The attribute name is maxrate and the units are Mbs, to make
it similar to the existing max-rate limitation knobs (ETS and
SRIOV ndo calls).
changes from V2:
- added Documentation (thanks Florian and Tom)
- rebased to latest net-next to comply with the swdev ndo removal
- addressed more feedback from Dave on the comments style
changes from V1:
- addressed feedback from Dave
changes from V0:
- addressed feedback from Sergei
John Fastabend (1):
net: Add max rate tx queue attribute
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Wed, 18 Mar 2015 12:57:35 +0000 (14:57 +0200)]
net/mlx4_en: Add tx queue maxrate support
Add ndo_set_tx_maxrate support.
To support per tx queue maxrate limit, we use the update-qp firmware
command to do run-time rate setting for the qp that serves this tx ring.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Wed, 18 Mar 2015 12:57:34 +0000 (14:57 +0200)]
net/mlx4_core: Add basic support for QP max-rate limiting
Add the low-level device commands and definitions used for QP max-rate limiting.
This is done through the following elements:
- read rate-limit device caps in QUERY_DEV_CAP: number of different
rates and the min/max rates in Kbs/Mbs/Gbs units
- enhance the QP context struct to contain rate limit units and value
- allow to do run time rate-limit setting to QPs through the
update-qp firmware command
- QP rate-limiting is disallowed for VFs
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 18 Mar 2015 12:57:33 +0000 (14:57 +0200)]
net: Add max rate tx queue attribute
This adds a tx_maxrate attribute to the tx queue sysfs entry allowing
for max-rate limiting. Along with DCB-ETS and BQL this provides another
knob to tune queue performance. The limit units are Mbps.
By default it is disabled. To disable the rate limitation after it
has been set for a queue, it should be set to zero.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brad Campbell [Tue, 17 Mar 2015 20:25:46 +0000 (16:25 -0400)]
cc2520: Add support for CC2591 amplifier.
The TI CC2521 is an RF power amplifier that is designed to interface
with the CC2520. Conveniently, it directly interfaces with the CC2520
and does not require any pins to be connected to a
microcontroller/processor. Adding a CC2591 increases the CC2520's range,
which is useful for border router and other wall-powered applications.
Using the CC2591 with the CC2520 requires configuring the CC2520 GPIOs
that are connected to the CC2591 to correctly set the CC2591 into TX and
RX modes. Further, TI recommends that the CC2520_TXPOWER and
CC2520_AGCCTRL1 registers are set differently to maximize the CC2591's
performance. These settings are covered in TI Application Note AN065.
This patch adds an optional `amplified` field to the cc2520 entry in the
device tree. If present, the CC2520 will be configured to operate with a
CC2591.
The expected pin mapping is:
CC2520 GPIO0 --> CC2591 EN
CC2520 GPIO5 --> CC2591 PAEN
Signed-off-by: Brad Campbell <bradjc5@gmail.com>
Acked-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Brad Campbell [Tue, 17 Mar 2015 20:25:45 +0000 (16:25 -0400)]
cc2520: Do not store platform_data in spi_device
Storing the `platform_data` struct inside of the SPI struct when using
the device tree allows for a later function to edit the content of that
struct. This patch refactors the `cc2520_get_platformat_data` function
to accept a pointer to a `cc2520_platform_data` struct and populates
the fields inside of it.
This change mirrors commit
aaa1c4d226e4cd730075d3dac99a6d599a0190c7
("at86rf230: copy pdata to driver allocated space").
Signed-off-by: Brad Campbell <bradjc5@gmail.com>
Acked-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
David S. Miller [Wed, 18 Mar 2015 16:46:48 +0000 (12:46 -0400)]
Merge branch 'rhashtable_remove_shift'
Herbert Xu says:
====================
rhashtable: Kill redundant shift parameter
I was trying to squeeze bucket_table->rehash in by downsizing
bucket_table->size, only to find that my spot had been taken
over by bucket_table->shift. These patches kill shift and makes
me feel better :)
v2 corrects the typo in the test_rhashtable changelog and also
notes the min_shift parameter in the tipc patch changelog.
====================
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 18 Mar 2015 09:01:21 +0000 (20:01 +1100)]
rhashtable: Remove max_shift and min_shift
Now that nobody uses max_shift and min_shift, we can safely remove
them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 18 Mar 2015 09:01:19 +0000 (20:01 +1100)]
test_rhashtable: Use rhashtable max_size instead of max_shift
This patch converts test_rhashtable to use rhashtable max_size
instead of the obsolete max_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 18 Mar 2015 09:01:18 +0000 (20:01 +1100)]
tipc: Use rhashtable max/min_size instead of max/min_shift
This patch converts tipc to use rhashtable max/min_size instead of
the obsolete max/min_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 18 Mar 2015 09:01:17 +0000 (20:01 +1100)]
netlink: Use rhashtable max_size instead of max_shift
This patch converts netlink to use rhashtable max_size instead
of the obsolete max_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 18 Mar 2015 09:01:16 +0000 (20:01 +1100)]
rhashtable: Introduce max_size/min_size
This patch adds the parameters max_size and min_size which are
meant to replace max_shift and min_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 18 Mar 2015 09:01:15 +0000 (20:01 +1100)]
rhashtable: Remove shift from bucket_table
Keeping both size and shift is silly. We only need one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2015 16:44:13 +0000 (12:44 -0400)]
Merge branch 'xgene-next'
Keyur Chudgar says:
====================
drivers: net: xgene: Add second SGMII based 1G interface
This patch adds support for second SGMII based 1G interface.
====================
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keyur Chudgar [Tue, 17 Mar 2015 18:27:13 +0000 (11:27 -0700)]
drivers: net: xgene: Add second SGMII based 1G interface
- Added resource initialization based on port-id field
- Enabled second SGMII 1G interface
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keyur Chudgar [Tue, 17 Mar 2015 18:27:12 +0000 (11:27 -0700)]
dtb: xgene: Add second SGMII based 1G interface node
- Added new SGMII node for port 1
- Added port-id field
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keyur Chudgar [Tue, 17 Mar 2015 18:27:11 +0000 (11:27 -0700)]
Documentation: dtb: Add port-id field for APM X-Gene ethernet
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcel Holtmann [Tue, 17 Mar 2015 18:38:24 +0000 (11:38 -0700)]
Bluetooth: Fix potential NULL dereference in SMP channel setup
When the allocation of the L2CAP channel for the BR/EDR security manager
fails, then the smp variable might be NULL. In that case do not try to
free the non-existing crypto contexts
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
David S. Miller [Wed, 18 Mar 2015 02:12:10 +0000 (22:12 -0400)]
Merge branch 'tipc_netns_leak'
Ying Xue says:
====================
tipc: fix netns refcnt leak
The series aims to eliminate the issue of netns refcount leak. But
during fixing it, another two additional problems are found. So all
of known issues associated with the netns refcnt leak are resolved
at the same time in the patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 18 Mar 2015 01:32:59 +0000 (09:32 +0800)]
tipc: withdraw tipc topology server name when namespace is deleted
The TIPC topology server is a per namespace service associated with the
tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn
before we call sk_release_kernel because the kernel socket release is
done in init_net and trying to withdraw a TIPC name published in another
namespace will fail with an error as:
[ 170.093264] Unable to remove local publication
[ 170.093264] (type=1, lower=1, ref=
2184244004, key=
2184244005)
We fix this by breaking the association between the topology server name
and socket before calling sk_release_kernel.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 18 Mar 2015 01:32:58 +0000 (09:32 +0800)]
tipc: fix a potential deadlock when nametable is purged
[ 28.531768] =============================================
[ 28.532322] [ INFO: possible recursive locking detected ]
[ 28.532322] 3.19.0+ #194 Not tainted
[ 28.532322] ---------------------------------------------
[ 28.532322] insmod/583 is trying to acquire lock:
[ 28.532322] (&(&nseq->lock)->rlock){+.....}, at: [<
ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[ 28.532322]
[ 28.532322] but task is already holding lock:
[ 28.532322] (&(&nseq->lock)->rlock){+.....}, at: [<
ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
[ 28.532322]
[ 28.532322] other info that might help us debug this:
[ 28.532322] Possible unsafe locking scenario:
[ 28.532322]
[ 28.532322] CPU0
[ 28.532322] ----
[ 28.532322] lock(&(&nseq->lock)->rlock);
[ 28.532322] lock(&(&nseq->lock)->rlock);
[ 28.532322]
[ 28.532322] *** DEADLOCK ***
[ 28.532322]
[ 28.532322] May be due to missing lock nesting notation
[ 28.532322]
[ 28.532322] 3 locks held by insmod/583:
[ 28.532322] #0: (net_mutex){+.+.+.}, at: [<
ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50
[ 28.532322] #1: (&(&tn->nametbl_lock)->rlock){+.....}, at: [<
ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc]
[ 28.532322] #2: (&(&nseq->lock)->rlock){+.....}, at: [<
ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
[ 28.532322]
[ 28.532322] stack backtrace:
[ 28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194
[ 28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 28.532322]
ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007
[ 28.532322]
ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998
[ 28.532322]
ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38
[ 28.532322] Call Trace:
[ 28.532322] [<
ffffffff81792f3e>] dump_stack+0x4c/0x65
[ 28.532322] [<
ffffffff810a8080>] __lock_acquire+0x740/0x1ca0
[ 28.532322] [<
ffffffff810a4df3>] ? __bfs+0x23/0x270
[ 28.532322] [<
ffffffff810a7506>] ? check_irq_usage+0x96/0xe0
[ 28.532322] [<
ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0
[ 28.532322] [<
ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[ 28.532322] [<
ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[ 28.532322] [<
ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[ 28.532322] [<
ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[ 28.532322] [<
ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[ 28.532322] [<
ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[ 28.532322] [<
ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc]
[ 28.532322] [<
ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc]
[ 28.532322] [<
ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc]
[ 28.532322] [<
ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc]
[ 28.532322] [<
ffffffff8163dece>] ops_init+0x4e/0x150
[ 28.532322] [<
ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10
[ 28.532322] [<
ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190
[ 28.532322] [<
ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50
[ 28.532322] [<
ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc]
[ 28.532322] [<
ffffffffa0024000>] ? 0xffffffffa0024000
[ 28.532322] [<
ffffffff810002d9>] do_one_initcall+0x89/0x1c0
[ 28.532322] [<
ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0
[ 28.532322] [<
ffffffff810e725b>] ? do_init_module+0x2b/0x200
[ 28.532322] [<
ffffffff810e7294>] do_init_module+0x64/0x200
[ 28.532322] [<
ffffffff810e9353>] load_module+0x12f3/0x18e0
[ 28.532322] [<
ffffffff810e5890>] ? show_initstate+0x50/0x50
[ 28.532322] [<
ffffffff810e9a19>] SyS_init_module+0xd9/0x110
[ 28.532322] [<
ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f
Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to
remove a publication with a name sequence, the name sequence's lock
is held. However, when tipc_nametbl_remove_publ() calling
tipc_nameseq_remove_publ() to remove the publication, it first tries
to query name sequence instance with the publication, and then holds
the lock of the found name sequence. But as the lock may be already
taken in tipc_purge_publications(), deadlock happens like above
scenario demonstrated. As tipc_nameseq_remove_publ() doesn't grab name
sequence's lock, the deadlock can be avoided if it's directly invoked
by tipc_purge_publications().
Fixes:
97ede29e80ee ("tipc: convert name table read-write lock to RCU")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 18 Mar 2015 01:32:57 +0000 (09:32 +0800)]
tipc: fix netns refcnt leak
When the TIPC module is loaded, we launch a topology server in kernel
space, which in its turn is creating TIPC sockets for communication
with topology server users. Because both the socket's creator and
provider reside in the same module, it is necessary that the TIPC
module's reference count remains zero after the server is started and
the socket created; otherwise it becomes impossible to perform "rmmod"
even on an idle module.
Currently, we achieve this by defining a separate "tipc_proto_kern"
protocol struct, that is used only for kernel space socket allocations.
This structure has the "owner" field set to NULL, which restricts the
module reference count from being be bumped when sk_alloc() for local
sockets is called. Furthermore, we have defined three kernel-specific
functions, tipc_sock_create_local(), tipc_sock_release_local() and
tipc_sock_accept_local(), to avoid the module counter being modified
when module local sockets are created or deleted. This has worked well
until we introduced name space support.
However, after name space support was introduced, we have observed that
a reference count leak occurs, because the netns counter is not
decremented in tipc_sock_delete_local().
This commit remedies this problem. But instead of just modifying
tipc_sock_delete_local(), we eliminate the whole parallel socket
handling infrastructure, and start using the regular sk_create_kern(),
kernel_accept() and sk_release_kernel() calls. Since those functions
manipulate the module counter, we must now compensate for that by
explicitly decrementing the counter after module local sockets are
created, and increment it just before calling sk_release_kernel().
Fixes:
a62fbccecd62 ("tipc: make subscriber server support net namespace")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Cong Wang <cwang@twopensource.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2015 02:02:53 +0000 (22:02 -0400)]
Merge branch 'listener_refactor_part_12'
Eric Dumazet says:
====================
inet: tcp listener refactoring, part 12
By adding a pointer back to listener, we are preparing synack rtx
handling to no longer be governed by listener keepalive timer,
as this is the most problematic source of contention on listener
spinlock. Note that TCP FastOpen had such pointer anyway, so we
make it generic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 01:32:31 +0000 (18:32 -0700)]
inet: fix request sock refcounting
While testing last patch series, I found req sock refcounting was wrong.
We must set skc_refcnt to 1 for all request socks added in hashes,
but also on request sockets created by FastOpen or syncookies.
It is tricky because we need to defer this initialization so that
future RCU lookups do not try to take a refcount on a not yet
fully initialized request socket.
Also get rid of ireq_refcnt alias.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes:
13854e5a6046 ("inet: add proper refcounting to request sock")
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 01:32:30 +0000 (18:32 -0700)]
inet: avoid fastopen lock for regular accept()
It is not because a TCP listener is FastOpen ready that
all incoming sockets actually used FastOpen.
Avoid taking queue->fastopenq->lock if not needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 01:32:29 +0000 (18:32 -0700)]
tcp: rename struct tcp_request_sock listener
The listener field in struct tcp_request_sock is a pointer
back to the listener. We now have req->rsk_listener, so TCP
only needs one boolean and not a full pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 01:32:28 +0000 (18:32 -0700)]
inet: add rsk_listener field to struct request_sock
Once we'll be able to lookup request sockets in ehash table,
we'll need to get access to listener which created this request.
This avoid doing a lookup to find the listener, which benefits
for a more solid SO_REUSEPORT, and is needed once we no
longer queue request sock into a listener private queue.
Note that 'struct tcp_request_sock'->listener could be reduced
to a single bit, as TFO listener should match req->rsk_listener.
TFO will no longer need to hold a reference on the listener.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 01:32:27 +0000 (18:32 -0700)]
inet: uninline inet_reqsk_alloc()
inet_reqsk_alloc() is becoming fat and should not be inlined.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Mar 2015 01:32:26 +0000 (18:32 -0700)]
inet: add sk_listener argument to inet_reqsk_alloc()
listener socket can be used to set net pointer, and will
be later used to hold a reference on listener.
Add a const qualifier to first argument (struct request_sock_ops *),
and factorize all write_pnet(&ireq->ireq_net, sock_net(sk));
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Mar 2015 19:18:12 +0000 (15:18 -0400)]
Merge branch 'listener_refactor_part_11'
Eric Dumazet says:
====================
inet: tcp listener refactoring, part 11
Before inserting request sockets into general (ehash) table,
we need to prepare netfilter to cope with them, as they are
not full sockets.
I'll later change xt_socket to get full support, including for
request sockets (NEW_SYN_RECV)
Save 8 bytes in inet_request_sock on 64bit arches. We'll soon add
a pointer to the listener socket.
I included two TCP changes in this patch series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 17 Mar 2015 04:06:20 +0000 (21:06 -0700)]
tcp: uninline tcp_oow_rate_limited()
tcp_oow_rate_limited() is hardly used in fast path, there is
no point inlining it.
Signed-of-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 17 Mar 2015 04:06:19 +0000 (21:06 -0700)]
tcp: move tcp_openreq_init() to tcp_input.c
This big helper is called once from tcp_conn_request(), there is no
point having it in an include. Compiler will inline it anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 17 Mar 2015 04:06:18 +0000 (21:06 -0700)]
inet: move ir_mark to fill a hole
On 64bit arches, we can save 8 bytes in inet_request_sock
by moving ir_mark to fill a hole.
While we are at it, inet_request_mark() can get a const qualifier
for listener socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 17 Mar 2015 04:06:17 +0000 (21:06 -0700)]
netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support
TCP request socks soon will be visible in ehash table.
xt_socket will be able to match them, but first we need
to make sure to not consider them as full sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 17 Mar 2015 04:06:16 +0000 (21:06 -0700)]
netfilter: tproxy: prepare TCP_NEW_SYN_RECV support
TCP request socks soon will be visible in ehash table.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 17 Mar 2015 04:06:15 +0000 (21:06 -0700)]
netfilter: use sk_fullsock() helper
Upcoming request sockets have TCP_NEW_SYN_RECV state and should
be special cased a bit like TCP_TIME_WAIT sockets.
Signed-off-by; Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 17 Mar 2015 01:06:02 +0000 (18:06 -0700)]
bpf: allow BPF programs access 'protocol' and 'vlan_tci' fields
as a follow on to patch
70006af95515 ("bpf: allow eBPF access skb fields")
this patch allows 'protocol' and 'vlan_tci' fields to be accessible
from extended BPF programs.
The usage of 'protocol', 'vlan_present' and 'vlan_tci' fields is the same as
corresponding SKF_AD_PROTOCOL, SKF_AD_VLAN_TAG_PRESENT and SKF_AD_VLAN_TAG
accesses in classic BPF.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Mar 2015 19:00:28 +0000 (15:00 -0400)]
Merge branch 'const_of_device_id'
Fabian Frederick says:
====================
drivers/net: constify of_device_id array
This small patchset adds const to of_device_id arrays in
drivers/net branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:40:28 +0000 (19:40 +0100)]
via-velocity: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:40:27 +0000 (19:40 +0100)]
net: via-rhine: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:40:26 +0000 (19:40 +0100)]
ehea: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:40:25 +0000 (19:40 +0100)]
IBM-EMAC: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:40:24 +0000 (19:40 +0100)]
can: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:40:23 +0000 (19:40 +0100)]
net: phy: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:40:22 +0000 (19:40 +0100)]
orinoco: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:37:40 +0000 (19:37 +0100)]
net: xilinx: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:37:39 +0000 (19:37 +0100)]
net: greth: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Tue, 17 Mar 2015 18:37:38 +0000 (19:37 +0100)]
netdev: octeon_mgmt: constify of_device_id array
of_device_id is always used as const.
(See driver.of_match_table and open firmware functions)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>