Selvin Xavier [Thu, 29 Jun 2017 19:28:11 +0000 (12:28 -0700)]
RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails
This fix is added only to avoid system crash in some a
specific scenario. When bnxt_re driver is loaded and if
user tries to change interface mac address, delete GID
fails because QP1 is still associated with existing MAC
(default GID). If the above command fails GID tables are
not modified in the h/w or driver, but the GID context memory
is freed. Now, if the user changes the mac back to the original
value, another add_gid comes to the driver where the driver
reports that the GID is already present in its table
and tries to access the context which was already freed.
So, in this case, in order to avoid NULL pointer de-reference,
this patch removes the context memory free if delete_gid fails
and the same context memory is re-used in new add_gid.
Memory cleanup will be taken care during driver unload, while
deleting the GID table.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Somnath Kotur [Thu, 29 Jun 2017 19:28:09 +0000 (12:28 -0700)]
RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error
Posting WQE size of 2 results in a WQE_FORMAT_ERROR
thrown by the HW as it requires host to supply WQE Size with room
for atleast one SGE so that the resulting WQE size be atleast 3.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 29 Jun 2017 19:28:08 +0000 (12:28 -0700)]
RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext
The driver must free the DPI during the dealloc_ucontext
instead of freeing it during dealloc_pd. However, the DPI
allocation scheme remains unchanged.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Wed, 14 Jun 2017 10:20:09 +0000 (13:20 +0300)]
IB/mlx5: Fix a warning message
"umem" is a valid pointer. We intended to print "*umem" or even just
"err" instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Thu, 13 Jul 2017 07:46:49 +0000 (10:46 +0300)]
RDMA/ocrdma: Fix error codes in ocrdma_create_srq()
If either of these allocations fail then we return ERR_PTR(0). That's
equivalent to NULL and results in a NULL pointer dereference in the
caller.
Fixes:
fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Thu, 13 Jul 2017 07:46:14 +0000 (10:46 +0300)]
RDMA/ocrdma: Fix an error code in ocrdma_alloc_pd()
We should preserve the original "status" error code instead of resetting
it to zero. Returning ERR_PTR(0) is the same as NULL and results in a
NULL dereference in the callers. I added a printk() on error instead.
Fixes:
45e86b33ec8b ("RDMA/ocrdma: Cache recv DB until QP moved to RTR")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Thu, 13 Jul 2017 07:48:00 +0000 (10:48 +0300)]
IB/cxgb3: Fix error codes in iwch_alloc_mr()
We accidentally don't set the error code on some error paths. It means
return ERR_PTR(0) which is NULL and results in a NULL dereference in the
caller.
Fixes:
13a239330abd ("RDMA/cxgb3: Don't ignore insert_handle() failures")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Thu, 13 Jul 2017 07:47:40 +0000 (10:47 +0300)]
cxgb4: Fix error codes in c4iw_create_cq()
If one of these kmalloc() calls fails then we return ERR_PTR(0) which is
NULL. It results in a NULL dereference in the callers.
Fixes:
cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Thu, 13 Jul 2017 07:47:22 +0000 (10:47 +0300)]
IB/i40iw: Fix error code in i40iw_create_cq()
We accidentally forgot to set the error code if ib_copy_from_udata()
fails. It means we return ERR_PTR(0) which is NULL and results in a
NULL dereference in the callers.
Fixes:
d37498417947 ("i40iw: add files for iwarp interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Thu, 13 Jul 2017 07:45:48 +0000 (10:45 +0300)]
IB/IPoIB: Fix error code in ipoib_add_port()
We accidentally don't see the error code on some of these error paths.
It means we return ERR_PTR(0) which is NULL and it results in a NULL
dereference in the caller.
This bug dates to pre-git days.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Mon, 10 Jul 2017 07:22:47 +0000 (10:22 +0300)]
RDMA/bnxt_re: checking for NULL instead of IS_ERR()
bnxt_re_alloc_mw() doesn't return NULL, it returns error pointers.
Fixes:
9152e0b722b2 ("RDMA/bnxt_re: HW workarounds for handling specific conditions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Tatyana Nikolova [Thu, 6 Jul 2017 02:25:33 +0000 (21:25 -0500)]
i40iw: Free QP PBLEs when the QP is destroyed
If the physical buffer list entries (PBLEs) of a QP are freed
up at i40iw_dereg_mr, they can be assigned to a newly
created QP before the previous QP is destroyed. Fix this
by freeing PBLEs only when the QP is destroyed.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Shiraz Saleem [Fri, 23 Jun 2017 21:04:02 +0000 (16:04 -0500)]
i40iw: Avoid memory leak of CQP request objects
Control Queue Pair (CQP) request objects, which have
not received a completion upon interface close, remain
in memory.
To fix this, identify and free all pending CQP request
objects during destroy CQP OP.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Henry Orosco [Fri, 23 Jun 2017 21:04:01 +0000 (16:04 -0500)]
i40iw: Update list correctly
To avoid infinite loop, in i40iw_ieq_handle_exception, update
plist inside while loop.
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Henry Orosco [Fri, 23 Jun 2017 21:04:00 +0000 (16:04 -0500)]
i40iw: Add missing memory barrier
Add missing write memory barrier before writing the
header containing valid bit to the WQE in i40iw_puda_send.
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Shiraz Saleem [Fri, 23 Jun 2017 21:03:59 +0000 (16:03 -0500)]
i40iw: Free QP resources on CQP destroy QP failure
Current flow leaves software QP structures in memory if
Control Queue Pair (CQP) destroy QP OP fails. To fix this,
free QP resources on fail of CQP destroy QP OP.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Shiraz Saleem [Fri, 23 Jun 2017 21:03:58 +0000 (16:03 -0500)]
i40iw: Release cm_id ref on PCI function reset
On PCI function reset, cm_id reference is not released
which causes an application hang, as it waits on the
cm_id to be released on rdma_destroy.
To fix this, call i40iw_cm_disconn during a PCI function
reset to clean-up resources and release cm_id reference.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Shiraz Saleem [Fri, 23 Jun 2017 21:03:57 +0000 (16:03 -0500)]
i40iw: Utilize iwdev->reset during PCI function reset
Utilize iwdev->reset on a PCI function reset notification
instead of passing in reset flag for resource clean-up.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mustafa Ismail [Fri, 23 Jun 2017 21:03:56 +0000 (16:03 -0500)]
i40iw: Do not poll CCQ after it is destroyed
Control Queue Pair (CQP) OPs, in this case - Update SDs,
cannot poll the Control Completion Queue (CCQ) after CCQ is
destroyed. Instead, poll via registers.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mustafa Ismail [Fri, 23 Jun 2017 21:03:55 +0000 (16:03 -0500)]
i40iw: Fix order of cleanup in close
The order for calling i40iw_destroy_pble_pool is incorrect.
Also, add PBLE_CHUNK_MEM init state to track pble pool
creation and destruction.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Tadeusz Struk [Tue, 30 May 2017 00:20:53 +0000 (17:20 -0700)]
IB/core: Allow QP state transition from reset to error
Playing with IP-O-IB interface can trigger a warning message:
"ib0: Failed to modify QP to ERROR state" to be logged.
This happens when the QP is in IB_QPS_RESET state and the stack
is trying to transition it to IB_QPS_ERR state in ipoib_ib_dev_stop().
According to the IB spec, Table 91 - "QP State Transition Properties"
it looks like the transition from reset to error is valid:
Transition: Any State to Error
Required Attributes: None
Optional Attributes: None allowed
Actions: Queue processing is stopped. Work Requests pending or in
process are completed in error, when possible.
This patch allows the transition and quiets the message.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:25 +0000 (18:49 +0800)]
IB/hns: Fix for checkpatch.pl comment style warnings
This patch correct the comment style warnings caught by
checkpatch.pl script.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:24 +0000 (18:49 +0800)]
IB/hns: Fix the bug with modifying the MAC address without removing the driver
When modified the MAC address used hns_roce_mac function, we release and create
reserved qp again, It is not necessary to use spin_lock_bh and spin_unlock_bh in
handle_en_event, Otherwise, it will occur a error. This patch mainly fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:23 +0000 (18:49 +0800)]
IB/hns: Fix the bug with rdma operation
When opcode of work request is RDMA read and write, it
should use rdma_wr to get remote_addr and rkey. This
patch fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:22 +0000 (18:49 +0800)]
IB/hns: Fix the bug with wild pointer when destroy rc qp
When destroyed rc qp, the hr_qp will be used after freed. This patch
will fix it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:21 +0000 (18:49 +0800)]
IB/hns: Fix the bug of polling cq failed for loopback Qps
In hip06 SoC, RoCE driver creates 8 reserved loopback QPs to
ensure zero wqe when free mr. However, if the enabled phy
port number is less than 6, it will fail in polling cqe with
8 reserved loopback QPs.
In order to solve this problem, the number of loopback Qps
will be adjusted based on the number of enabled phy port.
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
yonatanc [Thu, 22 Jun 2017 14:10:00 +0000 (17:10 +0300)]
IB/rxe: Set dma_mask and coherent_dma_mask
The RXE coupled with dummy device causes to the kernel panic attached
below. The panic happens when ib_register_device tries to set dma_mask
by accessing a NULLed parent device.
The RXE does not actually use DMA, so we can set the dma_mask
to architecture value.
[16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core]
[16240.205289] RSP: 0018:
ffffc9000220fc10 EFLAGS:
00010246
[16240.209909] RAX:
0000000000000024 RBX:
ffff880220d1a2a8 RCX:
0000000000000000
[16240.212244] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000009
[16240.214385] RBP:
ffffc9000220fcb0 R08:
0000000000000000 R09:
000000000000023f
[16240.254465] R10:
0000000000000007 R11:
0000000000000000 R12:
0000000000000000
[16240.259467] R13:
0000000000000000 R14:
0000000000000000 R15:
ffff880220d1a2a8
[16240.263314] FS:
00007fd8ecca0740(0000) GS:
ffff8802364c0000(0000) knlGS:
0000000000000000
[16240.267292] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[16240.273503] CR2:
0000000000000218 CR3:
00000002253ba000 CR4:
00000000000006e0
[16240.277066] Call Trace:
[16240.281836] ? __kmalloc+0x26f/0x280
[16240.286596] rxe_register_device+0x297/0x300 [rdma_rxe]
[16240.291377] rxe_add+0x535/0x5b0 [rdma_rxe]
[16240.297586] rxe_net_add+0x3e/0xc0 [rdma_rxe]
[16240.302375] rxe_param_set_add+0x65/0x144 [rdma_rxe]
[16240.307769] param_attr_store+0x68/0xd0
[16240.311640] module_attr_store+0x1d/0x30
[16240.316421] sysfs_kf_write+0x3a/0x50
[16240.317802] kernfs_fop_write+0xff/0x180
[16240.322989] __vfs_write+0x37/0x140
[16240.328164] ? handle_mm_fault+0xce/0x240
[16240.333340] vfs_write+0xb2/0x1b0
[16240.335013] SyS_write+0x55/0xc0
[16240.340632] entry_SYSCALL_64_fastpath+0x1a/0xa9
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yonatan Cohen [Thu, 22 Jun 2017 14:09:59 +0000 (17:09 +0300)]
IB/rxe: Fix kernel panic from skb destructor
In the time between rxe_send has finished and skb destructor
called, the QP's ref count might be 0, leading to a possible
QP destruction. This will lead to a kernel panic when the destructor
dereferences the QP.
The operation of incrementing QP ref count at rxe_send and decrementing
from skb destructor will prevent this crash.
BUG: unable to handle kernel NULL pointer dereference at
000000000000072c
IP: [<
ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
PGD 0 [16240.211178]
Oops: 0002 [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G OE 4.9.0-mlnx #1
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
task:
ffff88042d6b1480 task.stack:
ffffc90001904000
RIP: 0010:[<
ffffffffa05df765>] [<
ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
RSP: 0018:
ffff88043fcc3df0 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff880429684700 RCX:
ffff88042d248200
RDX:
00000000ffffffff RSI:
00000000fffffe01 RDI:
ffff880429684700
RBP:
ffff88043fcc3e00 R08:
ffff88043fcda240 R09:
00000000ff2d1de6
R10:
0000000000000000 R11:
00000000f49cf6fe R12:
ffff880429684700
R13:
ffffffff81893f96 R14:
ffffffff817d66f0 R15:
ffff880427f74200
FS:
0000000000000000(0000) GS:
ffff88043fcc0000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000000000000072c CR3:
000000041d3df000 CR4:
00000000000006e0
Stack:
ffffffff817b29cf ffff880429684700 ffff88043fcc3e18 ffffffff817b42c2
ffff880429684700 ffff88043fcc3e40 ffffffff817b4332 ffff880429684700
ffff880427f74238 ffff880427f74228 ffff88043fcc3e58 ffffffff81893f96
Call Trace:
<IRQ> [16240.336345] [<
ffffffff817b29cf>] ? skb_release_head_state+0x4f/0xb0
[<
ffffffff817b42c2>] skb_release_all+0x12/0x30
[<
ffffffff817b4332>] kfree_skb+0x32/0x90
[<
ffffffff81893f96>] ndisc_error_report+0x36/0x40
[<
ffffffff817d4de1>] neigh_invalidate+0x81/0xf0
[<
ffffffff817d68f7>] neigh_timer_handler+0x207/0x2b0
[<
ffffffff81109295>] call_timer_fn+0x35/0x120
[<
ffffffff81109db7>] run_timer_softirq+0x1d7/0x460
[<
ffffffff8106155e>] ? kvm_sched_clock_read+0x1e/0x30
[<
ffffffff810366b9>] ? sched_clock+0x9/0x10
[<
ffffffff810cfed2>] ? sched_clock_cpu+0x72/0xa0
[<
ffffffff818dd537>] __do_softirq+0xd7/0x289
[<
ffffffff810a6c95>] irq_exit+0xb5/0xc0
[<
ffffffff818dd372>] smp_apic_timer_interrupt+0x42/0x50
[<
ffffffff818dc682>] apic_timer_interrupt+0x82/0x90
<EOI> [16240.395776] [<
ffffffff818da156>] ? native_safe_halt+0x6/0x10
[<
ffffffff818d9e6e>] default_idle+0x1e/0xd0
[<
ffffffff8103797f>] arch_cpu_idle+0xf/0x20
[<
ffffffff818da2c5>] default_idle_call+0x35/0x40
[<
ffffffff810e3eb5>] cpu_startup_entry+0x185/0x210
[<
ffffffff81050433>] start_secondary+0x103/0x130
RIP [<
ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Erez Shitrit [Mon, 12 Jun 2017 07:45:21 +0000 (10:45 +0300)]
IB/ipoib: Let lower driver handle get_stats64 call
The driver checks if the lower level driver supports get_stats, and if
so calls it to get the updated statistics, otherwise takes from the
current netdevice stats object.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Majd Dibbiny [Tue, 30 May 2017 06:58:06 +0000 (09:58 +0300)]
IB/core: Add ordered workqueue for RoCE GID management
Currently the RoCE GID management uses the ib_wq to do add and delete new GIDs
according to the netdev events.
The ib_wq isn't an ordered workqueue and thus two work elements can be executed
concurrently which will result in unexpected behavior and inconsistency of the
GIDs cache content.
Example:
ifconfig eth1 11.11.11.11/16 up
This command will invoke the following netdev events in the following order:
1. NETDEV_UP
2. NETDEV_DOWN
3. NETDEV_UP
If (2) and (3) will be executed concurrently or in reverse order, instead of
having a new GID with 11.11.11.11 IP, we will end up without any new GIDs.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 30 May 2017 06:44:48 +0000 (09:44 +0300)]
IB/mlx5: Clean mr_cache debugfs in case of failure
The failure in creation of debugfs entries for mr_cache left entries,
which were already created.
It caused to mismatch and misguiding for the end users. The solution
is to clean mr_cache debugfs root, so no leftovers will be in the
system. In addition, let's document why the error is not needed to be
forwarded to user in case of failure.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:16 +0000 (14:38 +0300)]
IB/core: Remove NOIO QP create flag
There are no users for IB_QP_CREATE_USE_GFP_NOIO flag,
so let's remove it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:15 +0000 (14:38 +0300)]
{net, IB}/mlx4: Remove gfp flags argument
The caller to the driver marks GFP_NOIO allocations with help
of memalloc_noio-* calls now. This makes redundant to pass down
to the driver gfp flags, which can be GFP_KERNEL only.
The patch removes the gfp flags argument and updates all driver paths.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:14 +0000 (14:38 +0300)]
IB/{rdmavt, qib, hfi1}: Remove gfp flags argument
The caller to the driver marks GFP_NOIO allocations with help
of memalloc_noio-* calls now. This makes redundant to pass down
to the driver gfp flags, which can be GFP_KERNEL only.
The patch removes the gfp flags argument and updates all driver paths.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:13 +0000 (14:38 +0300)]
IB/IPoIB: Convert IPoIB to memalloc_noio_* calls
Commit
21caf2fc1931 ("mm: teach mm by current context info to not do I/O
during memory allocation") added the memalloc_noio_(save|restore) functions
to enable people to modify the MM behavior by disabling I/O during memory
allocation. This was further extended in Fixes:
934f3072c17c ("mm: clear
__GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent
allocation paths recursing back into the filesystem without explicitly
changing the flags for every allocation site.
However the IPoIB hasn't been keeping up with the changes and missed
completely these memalloc_noio_* calls. This led to update of
allocation site with special QP creation flag, see commit
09b93088d750
("IB: Add a QP creation flag to use GFP_NOIO allocations"), while this
flag is supported by small number of drivers in IB stack.
Let's change it by updating to memalloc_noio_* calls and allow
for every driver underneath enjoy NOIO allocations.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Erez Shitrit [Tue, 23 May 2017 08:42:52 +0000 (11:42 +0300)]
IB/IPoIB: Forward MTU change to driver below
This patch checks if there is a driver below that
needs to be updated on the new MTU and calls it
accordingly.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 08:29:42 +0000 (11:29 +0300)]
IB: Convert msleep below 20ms to usleep_range
The msleep(1) may do not sleep 1 ms as expected
and will sleep longer. The simple conversion from
msleep to usleep_range between 1ms and 2ms can solve an
issue.
The full and comprehensive explanation can be found at [1] and [2].
[1] https://lkml.org/lkml/2007/8/3/250
[2] Documentation/timers/timers-howto.txt
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Parav Pandit [Tue, 23 May 2017 08:26:09 +0000 (11:26 +0300)]
IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMAC
This patch makes use of IB core's ib_modify_qp_with_udata function that
also resolves the DMAC and handles udata.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Parav Pandit [Tue, 23 May 2017 08:26:08 +0000 (11:26 +0300)]
IB/core: Introduce modify QP operation with udata
This patch adds new function ib_modify_qp_with_udata so that
uverbs layer can avoid handling L2 mac address at verbs layer
and depend on the core layer to resolve the mac address consistently
for all required QPs.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Tue, 23 May 2017 07:48:45 +0000 (10:48 +0300)]
IB/core: Don't resolve IP address to the loopback device
When resolving an IP address that is on the host of the caller the
result from querying the routing table is the loopback device. This is
not a valid response, because it doesn't represent the RDMA device and
the port.
Therefore, callers need to check the resolved device and if it is a
loopback device find an alternative way to resolve it. To avoid this we
make sure that the response from rdma_resolve_ip() will not be the
loopback device.
While that, we fix an static checker warning about dereferencing an
unintitialized pointer using the same solution as in commit
abeffce90c7f
("net/mlx5e: Fix a -Wmaybe-uninitialized warning") as a reference.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Tue, 23 May 2017 07:48:44 +0000 (10:48 +0300)]
IB/core: Namespace is mandatory input for address resolution
In function addr_resolve() the namespace is a required input parameter
and not an output. It is passed later for searching the routing table
and device addresses. Also, it shouldn't be copied back to the caller.
Fixes:
565edd1d5555 ('IB/addr: Pass network namespace as a parameter')
Cc: <stable@vger.kernel.org> # v4.3+
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vladimir Neyelov [Sun, 21 May 2017 16:17:31 +0000 (19:17 +0300)]
IB/iser: Fix connection teardown race condition
Under heavy iser target(scst) start/stop stress during login/logout
on iser intitiator side happened trace call provided below.
The function iscsi_iser_slave_alloc iser_conn pointer could be NULL,
due to the fact that function iscsi_iser_conn_stop can be called before
and free iser connection. Let's protect that flow by introducing global mutex.
BUG: unable to handle kernel paging request at
0000000000001018
IP: [<
ffffffffc0426f7e>] iscsi_iser_slave_alloc+0x1e/0x50 [ib_iser]
Call Trace:
? scsi_alloc_sdev+0x242/0x300
scsi_probe_and_add_lun+0x9e1/0xea0
? kfree_const+0x21/0x30
? kobject_set_name_vargs+0x76/0x90
? __pm_runtime_resume+0x5b/0x70
__scsi_scan_target+0xf6/0x250
scsi_scan_target+0xea/0x100
iscsi_user_scan_session.part.13+0x101/0x130 [scsi_transport_iscsi]
? iscsi_user_scan_session.part.13+0x130/0x130 [scsi_transport_iscsi]
iscsi_user_scan_session+0x1e/0x30 [scsi_transport_iscsi]
device_for_each_child+0x50/0x90
iscsi_user_scan+0x44/0x60 [scsi_transport_iscsi]
store_scan+0xa8/0x100
? common_file_perm+0x5d/0x1c0
dev_attr_store+0x18/0x30
sysfs_kf_write+0x37/0x40
kernfs_fop_write+0x12c/0x1c0
__vfs_write+0x18/0x40
vfs_write+0xb5/0x1a0
SyS_write+0x55/0xc0
Fixes:
318d311e8f01 ("iser: Accept arbitrary sg lists mapping if the device supports it")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Vladimir Neyelov <vladimirn@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Gustavo A. R. Silva [Fri, 5 May 2017 01:38:20 +0000 (20:38 -0500)]
RDMA/core: Document confusing code
While looking into Coverity ID
1351047 I ran into the following
piece of code at
drivers/infiniband/core/verbs.c:496:
ret = rdma_addr_find_l2_eth_by_grh(&dgid, &sgid,
ah_attr->dmac,
wc->wc_flags & IB_WC_WITH_VLAN ?
NULL : &vlan_id,
&if_index, &hoplimit);
The issue here is that the position of arguments in the call to
rdma_addr_find_l2_eth_by_grh() function do not match the order of
the parameters:
&dgid is passed to sgid
&sgid is passed to dgid
This is the function prototype:
int rdma_addr_find_l2_eth_by_grh(const union ib_gid *sgid,
const union ib_gid *dgid,
u8 *dmac, u16 *vlan_id, int *if_index,
int *hoplimit)
My question here is if this is intentional?
Answer:
Yes. ib_init_ah_from_wc() creates ah from the incoming packet.
Incoming packet has dgid of the receiver node on which this code is
getting executed and sgid contains the GID of the sender.
When resolving mac address of destination, you use arrived dgid as
sgid and use sgid as dgid because sgid contains destinations GID whom to
respond to.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Mon, 24 Apr 2017 22:15:28 +0000 (15:15 -0700)]
mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
than what fits into a single MR. .map_mr_sg() must not attempt to
map more SG-list elements than what fits into a single MR.
Hence make sure that mlx5_ib_sg_to_klms() does not write outside
the MR klms[] array.
Fixes:
b005d3164713 ("mlx5: Add arbitrary sg list support")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: <stable@vger.kernel.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 30 May 2017 00:18:14 +0000 (17:18 -0700)]
IB/hfi1: Ensure dd->gi_mask can not be overflowed
As the code stands today the array access in remap_intr() is OK. To
future proof the code though we should explicitly check to ensure the
index value is not outside of the valid range. This is not a straight
forward calculation so err on the side of caution.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Mon, 17 Jul 2017 15:26:58 +0000 (11:26 -0400)]
Merge tag 'v4.13-rc1' into k.o/for-4.13-rc
Linux v4.13-rc1
Linus Torvalds [Sat, 15 Jul 2017 22:22:10 +0000 (15:22 -0700)]
Linux v4.13-rc1
Linus Torvalds [Sat, 15 Jul 2017 19:58:58 +0000 (12:58 -0700)]
Merge tag 'standardize-docs' of git://git.lwn.net/linux
Pull documentation format standardization from Jonathan Corbet:
"This series converts a number of top-level documents to the RST format
without incorporating them into the Sphinx tree. The hope is to bring
some uniformity to kernel documentation and, perhaps more importantly,
have our existing docs serve as an example of the desired formatting
for those that will be added later.
Mauro has gone through and fixed up a lot of top-level documentation
files to make them conform to the RST format, but without moving or
renaming them in any way. This will help when we incorporate the ones
we want to keep into the Sphinx doctree, but the real purpose is to
bring a bit of uniformity to our documentation and let the top-level
docs serve as examples for those writing new ones"
* tag 'standardize-docs' of git://git.lwn.net/linux: (84 commits)
docs: kprobes.txt: Fix whitespacing
tee.txt: standardize document format
cgroup-v2.txt: standardize document format
dell_rbu.txt: standardize document format
zorro.txt: standardize document format
xz.txt: standardize document format
xillybus.txt: standardize document format
vfio.txt: standardize document format
vfio-mediated-device.txt: standardize document format
unaligned-memory-access.txt: standardize document format
this_cpu_ops.txt: standardize document format
svga.txt: standardize document format
static-keys.txt: standardize document format
smsc_ece1099.txt: standardize document format
SM501.txt: standardize document format
siphash.txt: standardize document format
sgi-ioc4.txt: standardize document format
SAK.txt: standardize document format
rpmsg.txt: standardize document format
robust-futexes.txt: standardize document format
...
Linus Torvalds [Sat, 15 Jul 2017 19:44:02 +0000 (12:44 -0700)]
Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random
Pull random updates from Ted Ts'o:
"Add wait_for_random_bytes() and get_random_*_wait() functions so that
callers can more safely get random bytes if they can block until the
CRNG is initialized.
Also print a warning if get_random_*() is called before the CRNG is
initialized. By default, only one single-line warning will be printed
per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
warning will be printed for each function which tries to get random
bytes before the CRNG is initialized. This can get spammy for certain
architecture types, so it is not enabled by default"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: reorder READ_ONCE() in get_random_uXX
random: suppress spammy warnings about unseeded randomness
random: warn when kernel uses unseeded randomness
net/route: use get_random_int for random counter
net/neighbor: use get_random_u32 for 32-bit hash random
rhashtable: use get_random_u32 for hash_rnd
ceph: ensure RNG is seeded before using
iscsi: ensure RNG is seeded before use
cifs: use get_random_u32 for 32-bit lock random
random: add get_random_{bytes,u32,u64,int,long,once}_wait family
random: add wait_for_random_bytes() API
Linus Torvalds [Sat, 15 Jul 2017 19:00:42 +0000 (12:00 -0700)]
Merge branch 'work.mount' of git://git./linux/kernel/git/viro/vfs
Pull ->s_options removal from Al Viro:
"Preparations for fsmount/fsopen stuff (coming next cycle). Everything
gets moved to explicit ->show_options(), killing ->s_options off +
some cosmetic bits around fs/namespace.c and friends. Basically, the
stuff needed to work with fsmount series with minimum of conflicts
with other work.
It's not strictly required for this merge window, but it would reduce
the PITA during the coming cycle, so it would be nice to have those
bits and pieces out of the way"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
isofs: Fix isofs_show_options()
VFS: Kill off s_options and helpers
orangefs: Implement show_options
9p: Implement show_options
isofs: Implement show_options
afs: Implement show_options
affs: Implement show_options
befs: Implement show_options
spufs: Implement show_options
bpf: Implement show_options
ramfs: Implement show_options
pstore: Implement show_options
omfs: Implement show_options
hugetlbfs: Implement show_options
VFS: Don't use save/replace_mount_options if not using generic_show_options
VFS: Provide empty name qstr
VFS: Make get_filesystem() return the affected filesystem
VFS: Clean up whitespace in fs/namespace.c and fs/super.c
Provide a function to create a NUL-terminated string from unterminated data
Linus Torvalds [Sat, 15 Jul 2017 18:47:27 +0000 (11:47 -0700)]
Merge branch 'work.__copy_to_user' of git://git./linux/kernel/git/viro/vfs
Pull more __copy_.._user elimination from Al Viro.
* 'work.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
drm_dp_aux_dev: switch to read_iter/write_iter
Linus Torvalds [Sat, 15 Jul 2017 18:17:52 +0000 (11:17 -0700)]
Merge branch 'work.uaccess-unaligned' of git://git./linux/kernel/git/viro/vfs
Pull uacess-unaligned removal from Al Viro:
"That stuff had just one user, and an exotic one, at that - binfmt_flat
on arm and m68k"
* 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill {__,}{get,put}_user_unaligned()
binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
Linus Torvalds [Sat, 15 Jul 2017 18:06:17 +0000 (11:06 -0700)]
Merge branch 'misc.compat' of git://git./linux/kernel/git/viro/vfs
Pull network field-by-field copy-in updates from Al Viro:
"This part of the misc compat queue was held back for review from
networking folks and since davem has jus ACKed those..."
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get_compat_bpf_fprog(): don't copyin field-by-field
get_compat_msghdr(): get rid of field-by-field copyin
copy_msghdr_from_user(): get rid of field-by-field copyin
Linus Torvalds [Sat, 15 Jul 2017 17:59:54 +0000 (10:59 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"Boston platform support:
- Document DT bindings
- Add CLK driver for board clocks
CM:
- Avoid per-core locking with CM3 & higher
- WARN on attempt to lock invalid VP, not BUG
CPS:
- Select CONFIG_SYS_SUPPORTS_SCHED_SMT for MIPSr6
- Prevent multi-core with dcache aliasing
- Handle cores not powering down more gracefully
- Handle spurious VP starts more gracefully
DSP:
- Add lwx & lhx missaligned access support
eBPF:
- Add MIPS support along with many supporting change to add the
required infrastructure
Generic arch code:
- Misc sysmips MIPS_ATOMIC_SET fixes
- Drop duplicate HAVE_SYSCALL_TRACEPOINTS
- Negate error syscall return in trace
- Correct forced syscall errors
- Traced negative syscalls should return -ENOSYS
- Allow samples/bpf/tracex5 to access syscall arguments for sane
traces
- Cleanup from old Kconfig options in defconfigs
- Fix PREF instruction usage by memcpy for MIPS R6
- Fix various special cases in the FPU eulation
- Fix some special cases in MIPS16e2 support
- Fix MIPS I ISA /proc/cpuinfo reporting
- Sort MIPS Kconfig alphabetically
- Fix minimum alignment requirement of IRQ stack as required by
ABI / GCC
- Fix special cases in the module loader
- Perform post-DMA cache flushes on systems with MAARs
- Probe the I6500 CPU
- Cleanup cmpxchg and add support for 1 and 2 byte operations
- Use queued read/write locks (qrwlock)
- Use queued spinlocks (qspinlock)
- Add CPU shared FTLB feature detection
- Handle tlbex-tlbp race condition
- Allow storing pgd in C0_CONTEXT for MIPSr6
- Use current_cpu_type() in m4kc_tlbp_war()
- Support Boston in the generic kernel
Generic platform:
- yamon-dt: Pull YAMON DT shim code out of SEAD-3 board
- yamon-dt: Support > 256MB of RAM
- yamon-dt: Use serial* rather than uart* aliases
- Abstract FDT fixup application
- Set RTC_ALWAYS_BCD to 0
- Add a MAINTAINERS entry
core kernel:
- qspinlock.c: include linux/prefetch.h
Loongson 3:
- Add support
Perf:
- Add I6500 support
SEAD-3:
- Remove GIC timer from DT
- Set interrupt-parent per-device, not at root node
- Fix GIC interrupt specifiers
SMP:
- Skip IPI setup if we only have a single CPU
VDSO:
- Make comment match reality
- Improvements to time code in VDSO"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (86 commits)
locking/qspinlock: Include linux/prefetch.h
MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
MIPS: Fix minimum alignment requirement of IRQ stack
MIPS: generic: Support MIPS Boston development boards
MIPS: DTS: img: Don't attempt to build-in all .dtb files
clk: boston: Add a driver for MIPS Boston board clocks
dt-bindings: Document img,boston-clock binding
MIPS: Traced negative syscalls should return -ENOSYS
MIPS: Correct forced syscall errors
MIPS: Negate error syscall return in trace
MIPS: Drop duplicate HAVE_SYSCALL_TRACEPOINTS select
MIPS16e2: Provide feature overrides for non-MIPS16 systems
MIPS: MIPS16e2: Report ASE presence in /proc/cpuinfo
MIPS: MIPS16e2: Subdecode extended LWSP/SWSP instructions
MIPS: MIPS16e2: Identify ASE presence
MIPS: VDSO: Fix a mismatch between comment and preprocessor constant
MIPS: VDSO: Add implementation of gettimeofday() fallback
MIPS: VDSO: Add implementation of clock_gettime() fallback
MIPS: VDSO: Fix conversions in do_monotonic()/do_monotonic_coarse()
MIPS: Use current_cpu_type() in m4kc_tlbp_war()
...
Linus Torvalds [Sat, 15 Jul 2017 17:49:33 +0000 (10:49 -0700)]
Merge branch 'for-linus-4.13-rc1' of git://git./linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
"Mostly fixes for UML:
- First round of fixes for PTRACE_GETRESET/SETREGSET
- A printf vs printk cleanup
- Minor improvements"
* 'for-linus-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Correctly check for PTRACE_GETRESET/SETREGSET
um: v2: Use generic NOTES macro
um: Add kerneldoc for userspace_tramp() and start_userspace()
um: Add kerneldoc for segv_handler
um: stub-data.h: remove superfluous include
um: userspace - be more verbose in ptrace set regs error
um: add dummy ioremap and iounmap functions
um: Allow building and running on older hosts
um: Avoid longjmp/setjmp symbol clashes with libpthread.a
um: console: Ignore console= option
um: Use os_warn to print out pre-boot warning/error messages
um: Add os_warn() for pre-boot warning/error messages
um: Use os_info for the messages on normal path
um: Add os_info() for pre-boot information messages
um: Use printk instead of printf in make_uml_dir
Linus Torvalds [Sat, 15 Jul 2017 17:46:14 +0000 (10:46 -0700)]
Merge tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs
Pull UBIFS updates from Richard Weinberger:
- Updates and fixes for the file encryption mode
- Minor improvements
- Random fixes
* tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs:
ubifs: Set double hash cookie also for RENAME_EXCHANGE
ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs
ubifs: Don't leak kernel memory to the MTD
ubifs: Change gfp flags in page allocation for bulk read
ubifs: Fix oops when remounting with no_bulk_read.
ubifs: Fail commit if TNC is obviously inconsistent
ubifs: allow userspace to map mounts to volumes
ubifs: Wire-up statx() support
ubifs: Remove dead code from ubifs_get_link()
ubifs: Massage debug prints wrt. fscrypt
ubifs: Add assert to dent_key_init()
ubifs: Fix unlink code wrt. double hash lookups
ubifs: Fix data node size for truncating uncompressed nodes
ubifs: Don't encrypt special files on creation
ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_rename
ubifs: Fix inode data budget in ubifs_mknod
ubifs: Correctly evict xattr inodes
ubifs: Unexport ubifs_inode_slab
ubifs: don't bother checking for encryption key in ->mmap()
ubifs: require key for truncate(2) of encrypted file
Linus Torvalds [Sat, 15 Jul 2017 17:18:16 +0000 (10:18 -0700)]
Merge tag 'kvm-4.13-2' of git://git./virt/kvm/kvm
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM updates for v4.13
Common:
- add uevents for VM creation/destruction
- annotate and properly access RCU-protected objects
s390:
- rename IOCTL added in the first v4.13 merge
x86:
- emulate VMLOAD VMSAVE feature in SVM
- support paravirtual asynchronous page fault while nested
- add Hyper-V userspace interfaces for better migration
- improve master clock corner cases
- extend internal error reporting after EPT misconfig
- correct single-stepping of emulated instructions in SVM
- handle MCE during VM entry
- fix nVMX VM entry checks and nVMX VMCS shadowing"
* tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
kvm: x86: hyperv: make VP_INDEX managed by userspace
KVM: async_pf: Let guest support delivery of async_pf from guest mode
KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
KVM: x86: make backwards_tsc_observed a per-VM variable
KVM: trigger uevents when creating or destroying a VM
KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
KVM: SVM: Rename lbr_ctl field in the vmcb control area
KVM: SVM: Prepare for new bit definition in lbr_ctl
KVM: SVM: handle singlestep exception when skipping emulated instructions
KVM: x86: take slots_lock in kvm_free_pit
KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
kvm: vmx: Properly handle machine check during VM-entry
KVM: x86: update master clock before computing kvmclock_offset
kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
...
Sebastian Andrzej Siewior [Fri, 30 Jun 2017 14:37:13 +0000 (16:37 +0200)]
random: reorder READ_ONCE() in get_random_uXX
Avoid the READ_ONCE in commit
4a072c71f49b ("random: silence compiler
warnings and fix race") if we can leave the function after
arch_get_random_XXX().
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Thu, 8 Jun 2017 08:16:59 +0000 (04:16 -0400)]
random: suppress spammy warnings about unseeded randomness
Unfortunately, on some models of some architectures getting a fully
seeded CRNG is extremely difficult, and so this can result in dmesg
getting spammed for a surprisingly long time. This is really bad from
a security perspective, and so architecture maintainers really need to
do what they can to get the CRNG seeded sooner after the system is
booted. However, users can't do anything actionble to address this,
and spamming the kernel messages log will only just annoy people.
For developers who want to work on improving this situation,
CONFIG_WARN_UNSEEDED_RANDOM has been renamed to
CONFIG_WARN_ALL_UNSEEDED_RANDOM. By default the kernel will always
print the first use of unseeded randomness. This way, hopefully the
security obsessed will be happy that there is _some_ indication when
the kernel boots there may be a potential issue with that architecture
or subarchitecture. To see all uses of unseeded randomness,
developers can enable CONFIG_WARN_ALL_UNSEEDED_RANDOM.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 15 Jul 2017 05:57:32 +0000 (22:57 -0700)]
Merge tag 'xfs-4.13-merge-6' of git://git./fs/xfs/xfs-linux
Pull XFS fixes from Darrick Wong:
"Largely debugging and regression fixes.
- Add some locking assertions for the _ilock helpers.
- Revert the XFS_QMOPT_NOLOCK patch; after discussion with hch the
online fsck patch that would have needed it has been redesigned and
no longer needs it.
- Fix behavioral regression of SEEK_HOLE/DATA with negative offsets
to match 4.12-era XFS behavior"
* tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: in iomap seek_{hole,data}, return -ENXIO for negative offsets
Revert "xfs: grab dquots without taking the ilock"
xfs: assert locking precondition in xfs_readlink_bmap_ilocked
xfs: assert locking precondіtion in xfs_attr_list_int_ilocked
xfs: fixup xfs_attr_get_ilocked
Linus Torvalds [Sat, 15 Jul 2017 05:55:52 +0000 (22:55 -0700)]
Merge branch 'for-4.13-part2' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We've identified and fixed a silent corruption (introduced by code in
the first pull), a fixup after the blk_status_t merge and two fixes to
incremental send that Filipe has been hunting for some time"
* 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix unexpected return value of bio_readpage_error
btrfs: btrfs_create_repair_bio never fails, skip error handling
btrfs: cloned bios must not be iterated by bio_for_each_segment_all
Btrfs: fix write corruption due to bio cloning on raid5/6
Btrfs: incremental send, fix invalid memory access
Btrfs: incremental send, fix invalid path for link commands
Linus Torvalds [Sat, 15 Jul 2017 05:53:37 +0000 (22:53 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull a few more input updates from Dmitry Torokhov:
- multi-touch handling for Xen
- fix for long-standing bug causing crashes in i8042 on boot
- change to gpio_keys to better handle key presses during system state
transition
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - fix crash at boot time
Input: gpio_keys - handle the missing key press event in resume phase
Input: xen-kbdfront - add multi-touch support
Linus Torvalds [Sat, 15 Jul 2017 05:49:50 +0000 (22:49 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- fix new compiler warnings in cavium
- set post-op IV properly in caam (this fixes chaining)
- fix potential use-after-free in atmel in case of EBUSY
- fix sleeping in softirq path in chcr
- disable buggy sha1-avx2 driver (may overread and page fault)
- fix use-after-free on signals in caam
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: cavium - make several functions static
crypto: chcr - Avoid algo allocation in softirq.
crypto: caam - properly set IV after {en,de}crypt
crypto: atmel - only treat EBUSY as transient if backlog
crypto: af_alg - Avoid sock_graft call warning
crypto: caam - fix signals handling
crypto: sha1-ssse3 - Disable avx2
Linus Torvalds [Sat, 15 Jul 2017 05:39:35 +0000 (22:39 -0700)]
Merge tag 'devprop-fix-4.13-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull device properties framework fix from Rafael Wysocki:
"This fixes a problem with bool properties that could be seen as "true"
when the property was not present at all by adding a special helper
for bool properties with checks for all of the requisute conditions
(Sakari Ailus)"
* tag 'devprop-fix-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: Introduce fwnode_call_bool_op() for ops that return bool
Linus Torvalds [Sat, 15 Jul 2017 05:27:13 +0000 (22:27 -0700)]
Merge tag 'acpi-fixes-4.13-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix the return value of an IRQ mapping routine in the ACPI core,
fix an EC driver issue causing abnormal fan behavior after system
resume on some systems and add quirks for ACPI device objects that
need to be treated as "always present" to work around bogus
implementations of the _STA control method.
Specifics:
- Fix the return value of acpi_gsi_to_irq() to make the GSI to IRQ
mapping work on the Mustang (ARM64) platform (Mark Salter).
- Fix an EC driver issue that causes fans to behave abnormally after
system resume on some systems which turns out to be related to
switching over the EC into the polling mode during the noirq stages
of system suspend and resume (Lv Zheng).
- Add quirks for ACPI device objects that need to be treated as
"always present", because their _STA methods are designed to work
around Windows driver bugs and return garbage from our perspective
(Hans de Goede)"
* tag 'acpi-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / x86: Add KIOX000A accelerometer on GPD win to always_present_ids array
ACPI / x86: Add Dell Venue 11 Pro 7130 touchscreen to always_present_ids
ACPI / x86: Allow matching always_present_id array entries by DMI
Revert "ACPI / EC: Enable event freeze mode..." to fix a regression
ACPI / EC: Drop EC noirq hooks to fix a regression
ACPI / irq: Fix return code of acpi_gsi_to_irq()
Linus Torvalds [Sat, 15 Jul 2017 05:24:25 +0000 (22:24 -0700)]
Merge tag 'pm-fixes-4.13-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a recently exposed issue in the PCI device wakeup code and
one older problem related to PCI device wakeup that has been reported
recently, modify one more piece of computations in intel_pstate to get
rid of a rounding error, fix a possible race in the schedutil cpufreq
governor, fix the device PM QoS sysfs interface to correctly handle
invalid user input, fix return values of two probe routines in devfreq
drivers and constify an attribute_group structure in devfreq.
Specifics:
- Avoid clearing the PCI PME Enable bit for devices as a result of
config space restoration which confuses AML executed afterward and
causes wakeup events to be lost on some systems (Rafael Wysocki).
- Fix the native PCIe PME interrupts handling in the cases when the
PME IRQ is set up as a system wakeup one so that runtime PM remote
wakeup works as expected after system resume on systems where that
happens (Rafael Wysocki).
- Fix the device PM QoS sysfs interface to handle invalid user input
correctly instead of using an unititialized variable value as the
latency tolerance for the device at hand (Dan Carpenter).
- Get rid of one more rounding error from intel_pstate computations
(Srinivas Pandruvada).
- Fix the schedutil cpufreq governor to prevent it from possibly
accessing unititialized data structures from governor callbacks in
some cases on systems when multiple CPUs share a single cpufreq
policy object (Vikram Mulukutla).
- Fix the return values of probe routines in two devfreq drivers
(Gustavo Silva).
- Constify an attribute_group structure in devfreq (Arvind Yadav)"
* tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / PM: Fix native PME handling during system suspend/resume
PCI / PM: Restore PME Enable after config space restoration
cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race
PM / QoS: return -EINVAL for bogus strings
cpufreq: intel_pstate: Fix ratio setting for min_perf_pct
PM / devfreq: constify attribute_group structures.
PM / devfreq: tegra: fix error return code in tegra_devfreq_probe()
PM / devfreq: rk3399_dmc: fix error return code in rk3399_dmcfreq_probe()
Linus Torvalds [Sat, 15 Jul 2017 04:57:25 +0000 (21:57 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge even more updates from Andrew Morton:
- a few leftovers
- fault-injector rework
- add a module loader test driver
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kmod: throttle kmod thread limit
kmod: add test driver to stress test the module loader
MAINTAINERS: give kmod some maintainer love
xtensa: use generic fb.h
fault-inject: add /proc/<pid>/fail-nth
fault-inject: simplify access check for fail-nth
fault-inject: make fail-nth read/write interface symmetric
fault-inject: parse as natural 1-based value for fail-nth write interface
fault-inject: automatically detect the number base for fail-nth write interface
kernel/watchdog.c: use better pr_fmt prefix
MAINTAINERS: move the befs tree to kernel.org
lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
mm: fix overflow check in expand_upwards()
Daniel Micay [Fri, 14 Jul 2017 21:28:12 +0000 (17:28 -0400)]
replace incorrect strscpy use in FORTIFY_SOURCE
Using strscpy was wrong because FORTIFY_SOURCE is passing the maximum
possible size of the outermost object, but strscpy defines the count
parameter as the exact buffer size, so this could copy past the end of
the source. This would still be wrong with the planned usage of
__builtin_object_size(p, 1) for intra-object overflow checks since it's
the maximum possible size of the specified object with no guarantee of
it being that large.
Reuse of the fortified functions like this currently makes the runtime
error reporting less precise but that can be improved later on.
Noticed by Dave Jones and KASAN.
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 15 Jul 2017 04:50:50 +0000 (21:50 -0700)]
Merge git://git./linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile updates from Chris Metcalf:
"This adds support for an <arch/intreg.h> to help with removing
__need_xxx #defines from glibc, and removes some dead code in
arch/tile/mm/init.c"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
mm, tile: drop arch_{add,remove}_memory
tile: prefer <arch/intreg.h> to __need_int_reg_t
Linus Torvalds [Fri, 14 Jul 2017 22:33:15 +0000 (15:33 -0700)]
Merge tag 'powerpc-4.13-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Nothing that really stands out, just a bunch of fixes that have come
in in the last couple of weeks.
None of these are actually fixes for code that is new in 4.13. It's
roughly half older bugs, with fixes going to stable, and half
fixes/updates for Power9.
Thanks to: Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin
Herrenschmidt, Madhavan Srinivasan, Michael Neuling, Nicholas Piggin,
Oliver O'Halloran"
* tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix atomic64_inc_not_zero() to return an int
powerpc: Fix emulation of mfocrf in emulate_step()
powerpc: Fix emulation of mcrf in emulate_step()
powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9
powerpc/asm: Mark cr0 as clobbered in mftb()
powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
powerpc/mm/radix: Synchronize updates to the process table
powerpc/mm/radix: Properly clear process table entry
powerpc/powernv: Tell OPAL about our MMU mode on POWER9
powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:11 +0000 (14:50 -0700)]
kmod: throttle kmod thread limit
If we reach the limit of modprobe_limit threads running the next
request_module() call will fail. The original reason for adding a kill
was to do away with possible issues with in old circumstances which would
create a recursive series of request_module() calls.
We can do better than just be super aggressive and reject calls once we've
reached the limit by simply making pending callers wait until the
threshold has been reduced, and then throttling them in, one by one.
This throttling enables requests over the kmod concurrent limit to be
processed once a pending request completes. Only the first item queued up
to wait is woken up. The assumption here is once a task is woken it will
have no other option to also kick the queue to check if there are more
pending tasks -- regardless of whether or not it was successful.
By throttling and processing only max kmod concurrent tasks we ensure we
avoid unexpected fatal request_module() calls, and we keep memory
consumption on module loading to a minimum.
With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run
time to run both tests:
time ./kmod.sh -t 0008
real 0m16.366s
user 0m0.883s
sys 0m8.916s
time ./kmod.sh -t 0009
real 0m50.803s
user 0m0.791s
sys 0m9.852s
Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:08 +0000 (14:50 -0700)]
kmod: add test driver to stress test the module loader
This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests
early (refer to the force_init_test module parameter), however since a
lot of test can get a system out of memory fast we leave this disabled
for now.
Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
fast with this test driver.
The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since these API calls only
allow each one parameter a test driver for these is rather simple.
Other factors that can help out test driver though are the number of
calls we issue and knowing current limitations of each. This exposes
configuration as much as possible through userspace to be able to build
tests directly from userspace.
Since it allows multiple misc devices its will eventually (once we add a
knob to let us create new devices at will) also be possible to perform
more tests in parallel, provided you have enough memory.
We only enable tests we know work as of right now.
Demo screenshots:
# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
XXX: add test restult for 0007
Test completed
You can also request for specific tests:
# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed
Lastly, the current available number of tests:
# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
Valid tests: 0001-0009
0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
The following test cases currently fail, as such they are not currently
enabled by default:
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
To be sure to run them as intended please unload both of the modules:
o test_module
o xfs
And ensure they are not loaded on your system prior to testing them. If
you use these paritions for your rootfs you can change the default test
driver used for get_fs_type() by exporting it into your environment. For
example of other test defaults you can override refer to kmod.sh
allow_user_defaults().
Behind the scenes this is how we fine tune at a test case prior to
hitting a trigger to run it:
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
Finally to trigger:
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
The kmod.sh script uses the above constructs to build different test cases.
A bit of interpretation of the current failures follows, first two
premises:
a) When request_module() is used userspace figures out an optimized
version of module order for us. Once it finds the modules it needs, as
per depmod symbol dep map, it will finit_module() the respective
modules which are needed for the original request_module() request.
b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother userspace
as the module already is loaded. This is all handled by kernel/kmod.c.
A few things to consider to help identify root causes of issues:
0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.
1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
not for "xfs". The optimization in kernel described in b) fails to
catch if we have a lot of consecutive get_fs_type() calls. The reason
is the optimization in place does not look for aliases. This means two
consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.
This one explanation why test case 0009 fails at least once for
get_fs_type().
2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of
memory) we have a period of time during which module request for the
same name either with request_module() or get_fs_type() will *also*
fail to load even if the file for the module is ready.
This explains why *multiple* NULLs are possible on test 0009.
3) finit_module() consumes quite a bit of memory.
4) Filesystems typically also have more dependent modules than other
modules, its important to note though that even though a get_fs_type()
call does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.
Because of 3) and 4) we will easily run into out of memory failures with
certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM.
It panics a box after reaping all userspace processes and still not
having enough memory to reap.
[arnd@arndb.de: add dependencies for test module]
Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:05 +0000 (14:50 -0700)]
MAINTAINERS: give kmod some maintainer love
As suggested by Jessica, I've been actively working on kmod, so might as
well reflect its maintained status.
Changes are expected to go through akpm's tree.
Link: http://lkml.kernel.org/r/20170628223155.26472-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Suggested-by: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobias Klauser [Fri, 14 Jul 2017 21:50:03 +0000 (14:50 -0700)]
xtensa: use generic fb.h
The arch uses a verbatim copy of the asm-generic version and does not
add any own implementations to the header, so use asm-generic/fb.h
instead of duplicating code.
Link: http://lkml.kernel.org/r/20170517083545.2115-1-tklauser@distanz.ch
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:50:00 +0000 (14:50 -0700)]
fault-inject: add /proc/<pid>/fail-nth
fail-nth interface is only created in /proc/self/task/<current-tid>/.
This change also adds it in /proc/<pid>/.
This makes shell based tool a bit simpler.
$ bash -c "builtin echo 100 > /proc/self/fail-nth && exec ls /"
Link: http://lkml.kernel.org/r/1491490561-10485-6-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:57 +0000 (14:49 -0700)]
fault-inject: simplify access check for fail-nth
The fail-nth file is created with 0666 and the access is permitted if
and only if the task is current.
This file is owned by the currnet user. So we can create it with 0644
and allow the owner to write it. This enables to watch the status of
task->fail_nth from another processes.
[akinobu.mita@gmail.com: don't convert unsigned type value as signed int]
Link: http://lkml.kernel.org/r/1492444483-9239-1-git-send-email-akinobu.mita@gmail.com
[akinobu.mita@gmail.com: avoid unwanted data race to task->fail_nth]
Link: http://lkml.kernel.org/r/1499962492-8931-1-git-send-email-akinobu.mita@gmail.com
Link: http://lkml.kernel.org/r/1491490561-10485-5-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:54 +0000 (14:49 -0700)]
fault-inject: make fail-nth read/write interface symmetric
The read interface for fail-nth looks a bit odd. Read from this file
returns "NYYYY..." or "YYYYY..." (this makes me surprise when cat this
file). Because there is no EOF condition. The first character
indicates current->fail_nth is zero or not, and then current->fail_nth
is reset to zero.
Just returning task->fail_nth value is more natural to understand.
Link: http://lkml.kernel.org/r/1491490561-10485-4-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:52 +0000 (14:49 -0700)]
fault-inject: parse as natural 1-based value for fail-nth write interface
The value written to fail-nth file is parsed as 0-based. Parsing as
one-based is more natural to understand and it enables to cancel the
previous setup by simply writing '0'.
This change also converts task->fail_nth from signed to unsigned int.
Link: http://lkml.kernel.org/r/1491490561-10485-3-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:49 +0000 (14:49 -0700)]
fault-inject: automatically detect the number base for fail-nth write interface
Automatically detect the number base to use when writing to fail-nth
file instead of always parsing as a decimal number.
Link: http://lkml.kernel.org/r/1491490561-10485-2-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Fri, 14 Jul 2017 21:49:46 +0000 (14:49 -0700)]
kernel/watchdog.c: use better pr_fmt prefix
After commit
73ce0511c436 ("kernel/watchdog.c: move hardlockup
detector to separate file"), 'NMI watchdog' is inappropriate in
kernel/watchdog.c, using 'watchdog' only.
Link: http://lkml.kernel.org/r/1499928642-48983-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis de Bethencourt [Fri, 14 Jul 2017 21:49:44 +0000 (14:49 -0700)]
MAINTAINERS: move the befs tree to kernel.org
Update the location of the befs git tree and my email address.
Link: http://lkml.kernel.org/r/20170709110012.2991-1-luisbg@kernel.org
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Ellerman [Fri, 14 Jul 2017 21:49:41 +0000 (14:49 -0700)]
lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
atomic64_inc_not_zero() returns a "truth value" which in C is
traditionally an int. That means callers are likely to expect the
result will fit in an int.
If an implementation returns a "true" value which does not fit in an
int, then there's a possibility that callers will truncate it when they
store it in an int.
In fact this happened in practice, see commit
966d2b04e070
("percpu-refcount: fix reference leak during percpu-atomic transition").
So add a test that the result fits in an int, even when the input
doesn't. This catches the case where an implementation just passes the
non-zero input value out as the result.
Link: http://lkml.kernel.org/r/1499775133-1231-1-git-send-email-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helge Deller [Fri, 14 Jul 2017 21:49:38 +0000 (14:49 -0700)]
mm: fix overflow check in expand_upwards()
Jörn Engel noticed that the expand_upwards() function might not return
-ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and
if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE.
Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa
which all define TASK_SIZE as 0xffffffff, but since none of those have
an upwards-growing stack we currently have no actual issue.
Nevertheless let's fix this just in case any of the architectures with
an upward-growing stack (currently parisc, metag and partly ia64) define
TASK_SIZE similar.
Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box
Fixes:
bd726c90b6b8 ("Allow stack to grow up to address space limit")
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Jörn Engel <joern@purestorage.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Mon, 26 Jun 2017 11:49:04 +0000 (13:49 +0200)]
ubifs: Set double hash cookie also for RENAME_EXCHANGE
We developed RENAME_EXCHANGE and UBIFS_FLG_DOUBLE_HASH more or less in
parallel and this case was forgotten. :-(
Cc: stable@vger.kernel.org
Fixes:
d63d61c16972 ("ubifs: Implement UBIFS_FLG_DOUBLE_HASH")
Signed-off-by: Richard Weinberger <richard@nod.at>
Xiaolei Li [Fri, 23 Jun 2017 02:37:23 +0000 (10:37 +0800)]
ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs
The inode is not locked in init_xattrs when creating a new inode.
Without this patch, there will occurs assert when booting or creating
a new file, if the kernel config CONFIG_SECURITY_SMACK is enabled.
Log likes:
UBIFS assert failed in ubifs_xattr_set at 298 (pid 1156)
CPU: 1 PID: 1156 Comm: ldconfig Tainted: G S
4.12.0-rc1-207440-g1e70b02 #2
Hardware name: MediaTek MT2712 evaluation board (DT)
Call trace:
[<
ffff000008088538>] dump_backtrace+0x0/0x238
[<
ffff000008088834>] show_stack+0x14/0x20
[<
ffff0000083d98d4>] dump_stack+0x9c/0xc0
[<
ffff00000835d524>] ubifs_xattr_set+0x374/0x5e0
[<
ffff00000835d7ec>] init_xattrs+0x5c/0xb8
[<
ffff000008385788>] security_inode_init_security+0x110/0x190
[<
ffff00000835e058>] ubifs_init_security+0x30/0x68
[<
ffff00000833ada0>] ubifs_mkdir+0x100/0x200
[<
ffff00000820669c>] vfs_mkdir+0x11c/0x1b8
[<
ffff00000820b73c>] SyS_mkdirat+0x74/0xd0
[<
ffff000008082f8c>] __sys_trace_return+0x0/0x4
Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Fri, 16 Jun 2017 14:21:44 +0000 (16:21 +0200)]
ubifs: Don't leak kernel memory to the MTD
When UBIFS prepares data structures which will be written to the MTD it
ensues that their lengths are multiple of 8. Since it uses kmalloc() the
padded bytes are left uninitialized and we leak a few bytes of kernel
memory to the MTD.
To make sure that all bytes are initialized, let's switch to kzalloc().
Kzalloc() is fine in this case because the buffers are not huge and in
the IO path the performance bottleneck is anyway the MTD.
Cc: stable@vger.kernel.org
Fixes:
1e51764a3c2a ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Hyunchul Lee [Wed, 14 Jun 2017 00:31:49 +0000 (09:31 +0900)]
ubifs: Change gfp flags in page allocation for bulk read
In low memory situations, page allocations for bulk read
can kill applications for reclaiming memory, and print an
failure message when allocations are failed.
Because bulk read is just an optimization, we don't have
to do these and can stop page allocations.
Though this siutation happens rarely, add __GFP_NORETRY
to prevent from excessive memory reclaim and killing
applications, and __GFP_WARN to suppress this failure
message.
For this, Use readahead_gfp_mask for gfp flags when
allocating pages.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
karam.lee [Mon, 12 Jun 2017 01:46:31 +0000 (10:46 +0900)]
ubifs: Fix oops when remounting with no_bulk_read.
When remounting with the no_bulk_read option,
there is a problem accessing the "bulk_read buffer(bu.buf)"
which has already been freed.
If the bulk_read option is enabled,
ubifs_tnc_bulk_read uses the pre-allocated bu.buf.
While bu.buf is being used by ubifs_tnc_bulk_read,
remounting with no_bulk_read frees bu.buf.
So I added code to check the use of "bu.buf" to avoid this situation.
------
I tested as follows(kernel v3.18) :
Use the script to repeat "no_bulk_read <-> bulk_read"
remount.sh
#!/bin/sh
while true do;
mount -o remount,no_bulk_read ${MOUNT_POINT};
sleep 1;
mount -o remount,bulk_read ${MOUNT_POINT};
sleep 1;
done
Perform read operation
cat ${MOUNT_POINT}/* > /dev/null
The problem is reproduced immediately.
[ 234.256845][kernel.0]Internal error: Oops: 17 [#1] PREEMPT ARM
[ 234.258557][kernel.0]CPU: 0 PID: 2752 Comm: cat Tainted: G W O 3.18.31+ #51
[ 234.259531][kernel.0]task:
cbff8580 ti:
cbd66000 task.ti:
cbd66000
[ 234.260306][kernel.0]PC is at validate_data_node+0x10/0x264
[ 234.260994][kernel.0]LR is at ubifs_tnc_bulk_read+0x388/0x3ec
[ 234.261712][kernel.0]pc : [<
c01d98fc>] lr : [<
c01dc300>] psr:
80000013
[ 234.261712][kernel.0]sp :
cbd67ba0 ip :
00000001 fp :
00000000
[ 234.263337][kernel.0]r10:
cd3e0260 r9 :
c0df2008 r8 :
00000000
[ 234.264087][kernel.0]r7 :
cd3e0000 r6 :
00000000 r5 :
cd3e0278 r4 :
cd3e0000
[ 234.264999][kernel.0]r3 :
00000003 r2 :
cd3e0280 r1 :
00000000 r0 :
cd3e0000
[ 234.265910][kernel.0]Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 234.266896][kernel.0]Control:
10c53c7d Table:
8c40c059 DAC:
00000015
[ 234.267711][kernel.0]Process cat (pid: 2752, stack limit = 0xcbd66400)
[ 234.268525][kernel.0]Stack: (0xcbd67ba0 to 0xcbd68000)
[ 234.269169][kernel.0]7ba0:
cd7c3940 c03d8650 0001bfe0 00002ab2 00000000 cbd67c5c cbd67c58 0001bfe0
[ 234.270287][kernel.0]7bc0:
cd3e0000 00002ab2 0001bfe0 00000014 cbd66000 cd3e0260 00000000 c01d6660
[ 234.271403][kernel.0]7be0:
00002ab2 00000000 c82a5800 ffffffff cd3e0298 cd3e0278 00000000 cd3e0000
[ 234.272520][kernel.0]7c00:
00000000 00000000 cd3e0260 c01dc300 00002ab2 00000000 60000013 d663affa
[ 234.273639][kernel.0]7c20:
cd3e01f0 cd3e01f0 60000013 c09397ec 00000000 cd3e0278 00002ab2 00000000
[ 234.274755][kernel.0]7c40:
cd3e0000 c01dbf48 00000014 00000003 00000160 00000015 00000004 d663affa
[ 234.275874][kernel.0]7c60:
ccdaa978 cd3e0278 cd3e0000 cf32a5f4 ccdaa820 00000044 cbd66000 cd3e0260
[ 234.276992][kernel.0]7c80:
00000003 c01cec84 ccdaa8dc cbd67cc4 cbd67ec0 00000010 ccdaa978 00000000
[ 234.278108][kernel.0]7ca0:
0000015e ccdaa8dc 00000000 00000000 cf32a5d0 00000000 0000015f ccdaa8dc
[ 234.279228][kernel.0]7cc0:
00000000 c8488300 0009e5a4 0000000e cbd66000 0000015e cf32a5f4 c0113c04
[ 234.280346][kernel.0]7ce0:
0000009f 0000003c c00098c4 ffffffff 00001000 00000000 000000ad 00000010
[ 234.281463][kernel.0]7d00:
00000038 cd68f580 00000150 c8488360 00000000 cbd67d30 cbd67d70 0000000e
[ 234.282579][kernel.0]7d20:
00000010 00000000 c0951874 c0112a9c cf379b60 cf379b84 cf379890 cf3798b4
[ 234.283699][kernel.0]7d40:
cf379578 cf37959c cf379380 cf3793a4 cf3790b0 cf3790d4 cf378fd8 cf378ffc
[ 234.284814][kernel.0]7d60:
cf378f48 cf378f6c cf32a5f4 cf32a5d0 00000000 00001000 00000018 00000000
[ 234.285932][kernel.0]7d80:
00001000 c0050da4 00000000 00001000 cec04c00 00000000 00001000 c0e11328
[ 234.287049][kernel.0]7da0:
00000000 00001000 cbd66000 00000000 00001000 c0012a60 00000000 00001000
[ 234.288166][kernel.0]7dc0:
cbd67dd4 00000000 00001000 80000013 00000000 00001000 cd68f580 00000000
[ 234.289285][kernel.0]7de0:
00001000 c915d600 00000000 00001000 cbd67e48 00000000 00001000 00000018
[ 234.290402][kernel.0]7e00:
00000000 00001000 00000000 00000000 00001000 c915d768 c915d768 c0113550
[ 234.291522][kernel.0]7e20:
cd68f580 cbd67e48 cd68f580 cb6713c0 00010000 000ac5a4 00000000 001fc5a4
[ 234.292637][kernel.0]7e40:
00000000 c8488300 cbd67ec0 00eb0000 cd68f580 c0113ee4 00000000 cbd67ec0
[ 234.293754][kernel.0]7e60:
cd68f580 c8488300 cbd67ec0 00eb0000 cd68f580 00150000 c8488300 00eb0000
[ 234.294874][kernel.0]7e80:
00010000 c0112fd0 00000000 cbd67ec0 cd68f580 00150000 00000000 cd68f580
[ 234.295991][kernel.0]7ea0:
cbd67ef0 c011308c 00000000 00000002 cd768850 00010000 00000000 c01133fc
[ 234.297110][kernel.0]7ec0:
00150000 00000000 cbd67f50 00000000 00000000 cb6713c0 01000000 cbd67f48
[ 234.298226][kernel.0]7ee0:
cbd67f50 c8488300 00000000 c0113204 00010000 01000000 00000000 cb6713c0
[ 234.299342][kernel.0]7f00:
00150000 00000000 cbd67f50 00000000 00000000 00000000 00000000 00000000
[ 234.300462][kernel.0]7f20:
cbd67f50 01000000 01000000 cb6713c0 c8488300 c00ebba8 01000000 00000000
[ 234.301577][kernel.0]7f40:
c8488300 cb6713c0 00000000 00000000 00000000 00000000 ccdaa820 00000000
[ 234.302697][kernel.0]7f60:
00000000 01000000 00000003 00000001 cbd66000 00000000 00000001 c00ec678
[ 234.303813][kernel.0]7f80:
00000000 00000200 00000000 01000000 01000000 00000000 00000000 000000ef
[ 234.304933][kernel.0]7fa0:
c000e904 c000e780 01000000 00000000 00000001 00000003 00000000 01000000
[ 234.306049][kernel.0]7fc0:
01000000 00000000 00000000 000000ef 00000001 00000003 01000000 00000001
[ 234.307165][kernel.0]7fe0:
00000000 beafb78c 0000ad08 00128d1c 60000010 00000001 00000000 00000000
[ 234.308292][kernel.0][<
c01d98fc>] (validate_data_node) from [<
c01dc300>] (ubifs_tnc_bulk_read+0x388/0x3ec)
[ 234.309493][kernel.0][<
c01dc300>] (ubifs_tnc_bulk_read) from [<
c01cec84>] (ubifs_readpage+0x1dc/0x46c)
[ 234.310656][kernel.0][<
c01cec84>] (ubifs_readpage) from [<
c0113c04>] (__generic_file_splice_read+0x29c/0x4cc)
[ 234.311890][kernel.0][<
c0113c04>] (__generic_file_splice_read) from [<
c0113ee4>] (generic_file_splice_read+0xb0/0xf4)
[ 234.313214][kernel.0][<
c0113ee4>] (generic_file_splice_read) from [<
c0112fd0>] (do_splice_to+0x68/0x7c)
[ 234.314386][kernel.0][<
c0112fd0>] (do_splice_to) from [<
c011308c>] (splice_direct_to_actor+0xa8/0x190)
[ 234.315544][kernel.0][<
c011308c>] (splice_direct_to_actor) from [<
c0113204>] (do_splice_direct+0x90/0xb8)
[ 234.316741][kernel.0][<
c0113204>] (do_splice_direct) from [<
c00ebba8>] (do_sendfile+0x17c/0x2b8)
[ 234.317838][kernel.0][<
c00ebba8>] (do_sendfile) from [<
c00ec678>] (SyS_sendfile64+0xc4/0xcc)
[ 234.318890][kernel.0][<
c00ec678>] (SyS_sendfile64) from [<
c000e780>] (ret_fast_syscall+0x0/0x38)
[ 234.319983][kernel.0]Code:
e92d47f0 e24dd050 e59f9228 e1a04000 (
e5d18014)
Signed-off-by: karam.lee <karam.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 7 Jun 2017 21:33:35 +0000 (23:33 +0200)]
ubifs: Fail commit if TNC is obviously inconsistent
A reference to LEB 0 or with length 0 in the TNC
is never correct and could be caused by a memory corruption.
Don't write such a bad index node to the MTD.
Instead fail the commit which will turn UBIFS into read-only mode.
This is less painful than having the bad reference on the MTD
from where UBFIS has no chance to recover.
Signed-off-by: Richard Weinberger <richard@nod.at>
Rabin Vincent [Wed, 31 May 2017 09:40:27 +0000 (11:40 +0200)]
ubifs: allow userspace to map mounts to volumes
There currently appears to be no way for userspace to find out the
underlying volume number for a mounted ubifs file system, since ubifs
uses anonymous block devices. The volume name is present in
/proc/mounts but UBI volumes can be renamed after the volume has been
mounted.
To remedy this, show the UBI number and UBI volume number as part of the
options visible under /proc/mounts.
Also, accept and ignore the ubi= vol= options if they are used mounting
(patch from Richard Weinberger).
# mount -t ubifs ubi:baz x
# mount
ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2)
# ubirename /dev/ubi0 baz bazz
# mount
ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2)
# ubinfo -d 0 -n 2
Volume ID: 2 (on ubi0)
Type: dynamic
Alignment: 1
Size: 67 LEBs (
1063424 bytes, 1.0 MiB)
State: OK
Name: bazz
Character device major/minor: 254:3
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Sat, 20 May 2017 22:16:26 +0000 (00:16 +0200)]
ubifs: Wire-up statx() support
statx() can report what flags a file has, expose flags that UBIFS
supports. Especially STATX_ATTR_COMPRESSED and STATX_ATTR_ENCRYPTED
can be interesting for userspace.
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:49 +0000 (10:36 +0200)]
ubifs: Remove dead code from ubifs_get_link()
We check the length already, no need to check later
again for an empty string.
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:48 +0000 (10:36 +0200)]
ubifs: Massage debug prints wrt. fscrypt
If file names are encrypted we can no longer print them.
That's why we have to change these prints or remove them completely.
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:47 +0000 (10:36 +0200)]
ubifs: Add assert to dent_key_init()
...to make sure that we don't use it for double hashed lookups
instead of dent_key_init_hash().
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:46 +0000 (10:36 +0200)]
ubifs: Fix unlink code wrt. double hash lookups
When removing an encrypted file with a long name and without having
the key we have to be able to locate and remove the directory entry
via a double hash. This corner case was simply forgotten.
Fixes:
528e3d178f25 ("ubifs: Add full hash lookup support")
Reported-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
David Oberhollenzer [Wed, 17 May 2017 08:36:45 +0000 (10:36 +0200)]
ubifs: Fix data node size for truncating uncompressed nodes
Currently, the function truncate_data_node only updates the
destination data node size if compression is used. For
uncompressed nodes, the old length is incorrectly retained.
This patch makes sure that the length is correctly set when
compression is disabled.
Fixes:
7799953b34d1 ("ubifs: Implement encrypt/decrypt for all IO")
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
David Gstir [Wed, 17 May 2017 11:36:16 +0000 (13:36 +0200)]
ubifs: Don't encrypt special files on creation
When a new inode is created, we check if the containing folder has a encryption
policy set and inherit that. This should however only be done for regular
files, links and subdirectories. Not for sockes fifos etc.
Fixes:
d475a507457b ("ubifs: Add skeleton for fscrypto")
Cc: stable@vger.kernel.org
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Hyunchul Lee [Tue, 16 May 2017 23:58:02 +0000 (08:58 +0900)]
ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_rename
in RENAME_WHITEOUT error path, fscrypt_name should be freed.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Hyunchul Lee [Tue, 16 May 2017 23:57:18 +0000 (08:57 +0900)]
ubifs: Fix inode data budget in ubifs_mknod
Assign inode data budget to budget request correctly.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Tue, 16 May 2017 22:20:27 +0000 (00:20 +0200)]
ubifs: Correctly evict xattr inodes
UBIFS handles extended attributes just like files, as consequence of
that, they also have inodes.
Therefore UBIFS does all the inode machinery also for xattrs. Since new
inodes have i_nlink of 1, a file or xattr inode will be evicted
if i_nlink goes down to 0 after an unlink. UBIFS assumes this model also
for xattrs, which is not correct.
One can create a file "foo" with xattr "user.test". By reading
"user.test" an inode will be created, and by deleting "user.test" it
will get evicted later. The assumption breaks if the file "foo", which
hosts the xattrs, will be removed. VFS nor UBIFS does not remove each
xattr via ubifs_xattr_remove(), it just removes the host inode from
the TNC and all underlying xattr nodes too and the inode will remain
in the cache and wastes memory.
To solve this problem, remove xattr inodes from the VFS inode cache in
ubifs_xattr_remove() to make sure that they get evicted.
Fixes:
1e51764a3c2ac05a ("UBIFS: add new flash file system")
Cc: <stable@vger.kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>