Nayna Jain [Mon, 30 Jan 2017 09:59:41 +0000 (04:59 -0500)]
tpm: enhance TPM 2.0 PCR extend to support multiple banks
The current TPM 2.0 device driver extends only the SHA1 PCR bank
but the TCG Specification[1] recommends extending all active PCR
banks, to prevent malicious users from setting unused PCR banks with
fake measurements and quoting them.
The existing in-kernel interface(tpm_pcr_extend()) expects only a
SHA1 digest. To extend all active PCR banks with differing
digest sizes, the SHA1 digest is padded with trailing 0's as needed.
This patch reuses the defined digest sizes from the crypto subsystem,
adding a dependency on CRYPTO_HASH_INFO module.
[1] TPM 2.0 Specification referred here is "TCG PC Client Specific
Platform Firmware Profile for TPM 2.0"
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Kenneth Goldman <kgold@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Nayna Jain [Mon, 30 Jan 2017 09:59:40 +0000 (04:59 -0500)]
tpm: implement TPM 2.0 capability to get active PCR banks
This patch implements the TPM 2.0 capability TPM_CAP_PCRS to
retrieve the active PCR banks from the TPM. This is needed
to enable extending all active banks as recommended by TPM 2.0
TCG Specification.
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Kenneth Goldman <kgold@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jarkko Sakkinen [Wed, 25 Jan 2017 21:00:22 +0000 (23:00 +0200)]
tpm: fix RC value check in tpm2_seal_trusted
The error code handling is broken as any error code that has the same
bits set as TPM_RC_HASH passes. Implemented tpm2_rc_value() helper to
parse the error value from FMT0 and FMT1 error codes so that these types
of mistakes are prevented in the future.
Fixes:
5ca4c20cfd37 ("keys, trusted: select hash algorithm for TPM2 chips")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Maciej S. Szmigiero [Wed, 25 Jan 2017 20:07:36 +0000 (22:07 +0200)]
tpm_tis: fix iTPM probe via probe_itpm() function
probe_itpm() function is supposed to send command without an itpm flag set
and if this fails to repeat it, this time with the itpm flag set.
However, commit
41a5e1cf1fe15 ("tpm/tpm_tis: Split tpm_tis driver into a
core and TCG TIS compliant phy") moved the itpm flag from an "itpm"
variable to a TPM_TIS_ITPM_POSSIBLE chip flag, so setting the
(now function-local) itpm variable no longer had any effect.
Finally, this function-local itpm variable was removed by
commit
56af322156dbe9 ("tpm/tpm_tis: remove unused itpm variable")
Tested only on non-iTPM TIS TPM.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Tue, 24 Jan 2017 00:06:08 +0000 (17:06 -0700)]
tpm: Begin the process to deprecate user_read_timer
For a long time the cdev read/write interface had this strange
idea that userspace had to read the result within 60 seconds otherwise
it is discarded. Perhaps this made sense under some older locking regime,
but in the modern kernel it is not required and is just dangerous.
Since something may be relying on this, double the timeout and print a
warning. We can remove the code in a few years, but this should be
enough to prevent new users.
Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jarkko Sakkinen [Wed, 25 Jan 2017 14:47:39 +0000 (16:47 +0200)]
tpm: remove tpm_read_index and tpm_write_index from tpm.h
These are non-generic functions and do not belong to tpm.h.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Lans Zhang [Fri, 6 Jan 2017 04:38:11 +0000 (12:38 +0800)]
ima: allow to check MAY_APPEND
Otherwise some mask and inmask tokens with MAY_APPEND flag may not work
as expected.
Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Mimi Zohar [Tue, 17 Jan 2017 11:45:41 +0000 (06:45 -0500)]
ima: fix ima_d_path() possible race with rename
On failure to return a pathname from ima_d_path(), a pointer to
dname is returned, which is subsequently used in the IMA measurement
list, the IMA audit records, and other audit logging. Saving the
pointer to dname for later use has the potential to race with rename.
Intead of returning a pointer to dname on failure, this patch returns
a pointer to a copy of the filename.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
James Morris [Thu, 26 Jan 2017 22:23:21 +0000 (09:23 +1100)]
Merge branch 'smack-for-4.11' of git://github.com/cschaufler/smack-next into next
Stefan Berger [Thu, 19 Jan 2017 12:19:12 +0000 (07:19 -0500)]
tpm: Check size of response before accessing data
Make sure that we have not received less bytes than what is indicated
in the header of the TPM response. Also, check the number of bytes in
the response before accessing its data.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkine@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkine@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkine@linux.intel.com>
Maciej S. Szmigiero [Fri, 13 Jan 2017 21:37:00 +0000 (22:37 +0100)]
tpm_tis: use default timeout value if chip reports it as zero
Since commit
1107d065fdf1 ("tpm_tis: Introduce intermediate layer for
TPM access") Atmel 3203 TPM on ThinkPad X61S (TPM firmware version 13.9)
no longer works. The initialization proceeds fine until we get and
start using chip-reported timeouts - and the chip reports C and D
timeouts of zero.
It turns out that until commit
8e54caf407b98e ("tpm: Provide a generic
means to override the chip returned timeouts") we had actually let
default timeout values remain in this case, so let's bring back this
behavior to make chips like Atmel 3203 work again.
Use a common code that was introduced by that commit so a warning is
printed in this case and /sys/class/tpm/tpm*/timeouts correctly says the
timeouts aren't chip-original.
Fixes:
1107d065fdf1 ("tpm_tis: Introduce intermediate layer for TPM access")
Cc: stable@vger.kernel.org
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Mon, 21 Nov 2016 18:31:09 +0000 (11:31 -0700)]
tpm: Do not print an error message when doing TPM auto startup
This is a regression when this code was reworked and made the error
print unconditional. The original code deliberately suppressed printing
of the first error message so it could quietly sense
TPM_ERR_INVALID_POSTINIT.
Fixes:
a502feb67b47 ("tpm: Clean up reading of timeout and duration capabilities")
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jiandi An [Mon, 19 Dec 2016 04:20:53 +0000 (22:20 -0600)]
tpm, tpm_crb: Handle 64-bit resource in crb_check_resource()
crb_check_resource() in TPM CRB driver calls
acpi_dev_resource_memory() which only handles 32-bit resources.
Adding a call to acpi_dev_resource_address_space() in TPM CRB
driver which handles 64-bit resources.
Signed-off-by: Jiandi An <anjiandi@codeaurora.org>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Geliang Tang [Wed, 23 Nov 2016 15:18:53 +0000 (23:18 +0800)]
tpm/tpm_tis_spi: drop duplicate header module.h
Drop duplicate header module.h from tpm_tis_spi.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Corentin Labbe [Thu, 15 Dec 2016 17:10:17 +0000 (18:10 +0100)]
tpm/st33zp24: Remove unneeded linux/miscdevice.h include
tpm/st33zp24/st33zp24.c does not use any miscdevice so this patch remove
this unnecessary inclusion.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Winkler, Tomas [Wed, 23 Nov 2016 10:04:14 +0000 (12:04 +0200)]
tpm/vtpm: fix kdoc warnings
Use corret kdoc format for function description and eliminate warning
of type:
tpm_ibmvtpm.c:66: warning: No description found for parameter 'count'
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Winkler, Tomas [Wed, 23 Nov 2016 10:04:13 +0000 (12:04 +0200)]
tmp: use pdev for parent device in tpm_chip_alloc
The tpm stack uses pdev name convention for the parent device.
Fix that also in tpm_chip_alloc().
Fixes:
3897cd9c8d1d ("tpm: Split out the devm stuff from tpmm_chip_alloc")'
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Winkler, Tomas [Wed, 23 Nov 2016 10:04:12 +0000 (12:04 +0200)]
tpm/tpm2-chip: fix kdoc errors
Use correct kdoc format, describe correct parameters and return values.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Winkler, Tomas [Wed, 23 Nov 2016 10:04:11 +0000 (12:04 +0200)]
tpm: add kdoc for tpm_transmit and tpm_transmit_cmd
Functions tpm_transmit and transmit_cmd are referenced
from other functions kdoc hence deserve documentation.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Mike Frysinger [Fri, 20 Jan 2017 04:28:57 +0000 (22:28 -0600)]
seccomp: dump core when using SECCOMP_RET_KILL
The SECCOMP_RET_KILL mode is documented as immediately killing the
process as if a SIGSYS had been sent and not caught (similar to a
SIGKILL). However, a SIGSYS is documented as triggering a coredump
which does not happen today.
This has the advantage of being able to more easily debug a process
that fails a seccomp filter. Today, most apps need to recompile and
change their filter in order to get detailed info out, or manually run
things through strace, or enable detailed kernel auditing. Now we get
coredumps that fit into existing system-wide crash reporting setups.
From a security pov, this shouldn't be a problem. Unhandled signals
can already be sent externally which trigger a coredump independent of
the status of the seccomp filter. The act of dumping core itself does
not cause change in execution of the program.
URL: https://crbug.com/676357
Signed-off-by: Mike Frysinger <vapier@chromium.org>
Acked-by: Jorge Lucangeli Obes <jorgelo@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Casey Schaufler [Thu, 19 Jan 2017 01:09:05 +0000 (17:09 -0800)]
LSM: Add /sys/kernel/security/lsm
I am still tired of having to find indirect ways to determine
what security modules are active on a system. I have added
/sys/kernel/security/lsm, which contains a comma separated
list of the active security modules. No more groping around
in /proc/filesystems or other clever hacks.
Unchanged from previous versions except for being updated
to the latest security next branch.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
John Johansen [Mon, 16 Jan 2017 21:21:27 +0000 (13:21 -0800)]
apparmor: fix undefined reference to `aa_g_hash_policy'
The kernel build bot turned up a bad config combination when
CONFIG_SECURITY_APPARMOR is y and CONFIG_SECURITY_APPARMOR_HASH is n,
resulting in the build error
security/built-in.o: In function `aa_unpack':
(.text+0x841e2): undefined reference to `aa_g_hash_policy'
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:15 +0000 (00:43 -0800)]
apparmor: replace remaining BUG_ON() asserts with AA_BUG()
AA_BUG() uses WARN and won't break the kernel like BUG_ON().
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:14 +0000 (00:43 -0800)]
apparmor: fix restricted endian type warnings for policy unpack
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:13 +0000 (00:43 -0800)]
apparmor: fix restricted endian type warnings for dfa unpack
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:11 +0000 (00:43 -0800)]
apparmor: add check for apparmor enabled in module parameters missing it
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:10 +0000 (00:43 -0800)]
apparmor: add per cpu work buffers to avoid allocating buffers at every hook
Signed-off-by: John Johansen <john.johansen@canonical.com>
Tyler Hicks [Thu, 17 Mar 2016 00:19:10 +0000 (19:19 -0500)]
apparmor: sysctl to enable unprivileged user ns AppArmor policy loading
If this sysctl is set to non-zero and a process with CAP_MAC_ADMIN in
the root namespace has created an AppArmor policy namespace,
unprivileged processes will be able to change to a profile in the
newly created AppArmor policy namespace and, if the profile allows
CAP_MAC_ADMIN and appropriate file permissions, will be able to load
policy in the respective policy namespace.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
William Hua [Mon, 16 Jan 2017 00:49:28 +0000 (16:49 -0800)]
apparmor: support querying extended trusted helper extra data
Allow a profile to carry extra data that can be queried via userspace.
This provides a means to store extra data in a profile that a trusted
helper can extract and use from live policy.
Signed-off-by: William Hua <william.hua@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:08 +0000 (00:43 -0800)]
apparmor: update cap audit to check SECURITY_CAP_NOAUDIT
apparmor should be checking the SECURITY_CAP_NOAUDIT constant. Also
in complain mode make it so apparmor can elect to log a message,
informing of the check.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:07 +0000 (00:43 -0800)]
apparmor: make computing policy hashes conditional on kernel parameter
Allow turning off the computation of the policy hashes via the
apparmor.hash_policy kernel parameter.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:06 +0000 (00:43 -0800)]
apparmor: convert change_profile to use fqname later to give better control
Moving the use of fqname to later allows learning profiles to be based
on the fqname request instead of just the hname. It also allows cleaning
up some of the name parsing and lookup by allowing the use of
the fqlookupn_profile() lib fn.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:05 +0000 (00:43 -0800)]
apparmor: fix change_hat debug output
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:03 +0000 (00:43 -0800)]
apparmor: remove unused op parameter from simple_write_to_buffer()
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:02 +0000 (00:43 -0800)]
apparmor: change aad apparmor_audit_data macro to a fn macro
The aad macro can replace aad strings when it is not intended to. Switch
to a fn macro so it is only applied when intended.
Also at the same time cleanup audit_data initialization by putting
common boiler plate behind a macro, and dropping the gfp_t parameter
which will become useless.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:01 +0000 (00:43 -0800)]
apparmor: change op from int to const char *
Having ops be an integer that is an index into an op name table is
awkward and brittle. Every op change requires an edit for both the
op constant and a string in the table. Instead switch to using const
strings directly, eliminating the need for the table that needs to
be kept in sync.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:43:00 +0000 (00:43 -0800)]
apparmor: rename context abreviation cxt to the more standard ctx
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:59 +0000 (00:42 -0800)]
apparmor: fail task profile update if current_cred isn't real_cred
Trying to update the task cred while the task current cred is not the
real cred will result in an error at the cred layer. Avoid this by
failing early and delaying the update.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:58 +0000 (00:42 -0800)]
apparmor: add per policy ns .load, .replace, .remove interface files
Having per policy ns interface files helps with containers restoring
policy.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:57 +0000 (00:42 -0800)]
apparmor: pass the subject profile into profile replace/remove
This is just setup for new ns specific .load, .replace, .remove interface
files.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:56 +0000 (00:42 -0800)]
apparmor: audit policy ns specified in policy load
Verify that profiles in a load set specify the same policy ns and
audit the name of the policy ns that policy is being loaded for.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:55 +0000 (00:42 -0800)]
apparmor: allow introspecting the loaded policy pre internal transform
Store loaded policy and allow introspecting it through apparmorfs. This
has several uses from debugging, policy validation, and policy checkpoint
and restore for containers.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:54 +0000 (00:42 -0800)]
apparmor: add ns name to the audit data for policy loads
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:52 +0000 (00:42 -0800)]
apparmor: add profile and ns params to aa_may_manage_policy()
Policy management will be expanded beyond traditional unconfined root.
This will require knowning the profile of the task doing the management
and the ns view.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:51 +0000 (00:42 -0800)]
apparmor: add ns being viewed as a param to policy_admin_capable()
Prepare for a tighter pairing of user namespaces and apparmor policy
namespaces, by making the ns to be viewed available.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:50 +0000 (00:42 -0800)]
apparmor: add ns being viewed as a param to policy_view_capable()
Prepare for a tighter pairing of user namespaces and apparmor policy
namespaces, by making the ns to be viewed available and checking
that the user namespace level is the same as the policy ns level.
This strict pairing will be relaxed once true support of user namespaces
lands.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:49 +0000 (00:42 -0800)]
apparmor: allow specifying the profile doing the management
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:48 +0000 (00:42 -0800)]
apparmor: allow introspecting the policy namespace name
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:47 +0000 (00:42 -0800)]
apparmor: Make aa_remove_profile() callable from a different view
This is prep work for fs operations being able to remove namespaces.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:46 +0000 (00:42 -0800)]
apparmor: track ns level so it can be used to help in view checks
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:45 +0000 (00:42 -0800)]
apparmor: add special .null file used to "close" fds at exec
Borrow the special null device file from selinux to "close" fds that
don't have sufficient permissions at exec time.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:43 +0000 (00:42 -0800)]
apparmor: provide userspace flag indicating binfmt_elf_mmap change
Commit
9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm")
changed when the creds are installed by the binfmt_elf handler. This
affects which creds are used to mmap the executable into the address
space. Which can have an affect on apparmor policy.
Add a flag to apparmor at
/sys/kernel/security/apparmor/features/domain/fix_binfmt_elf_mmap
to make it possible to detect this semantic change so that the userspace
tools and the regression test suite can correctly deal with the change.
BugLink: http://bugs.launchpad.net/bugs/1630069
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:42 +0000 (00:42 -0800)]
apparmor: add a default null dfa
Instead of testing whether a given dfa exists in every code path, have
a default null dfa that is used when loaded policy doesn't provide a
dfa.
This will let us get rid of special casing and avoid dereference bugs
when special casing is missed.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:41 +0000 (00:42 -0800)]
apparmor: allow policydb to be used as the file dfa
Newer policy will combine the file and policydb dfas, allowing for
better optimizations. However to support older policy we need to
keep the ability to address the "file" dfa separately. So dup
the policydb as if it is the file dfa and set the appropriate start
state.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:40 +0000 (00:42 -0800)]
apparmor: add get_dfa() fn
The dfa is currently setup to be shared (has the basis of refcounting)
but currently can't be because the count can't be increased.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:39 +0000 (00:42 -0800)]
apparmor: prepare to support newer versions of policy
Newer policy encodes more than just version in the version tag,
so add masking to make sure the comparison remains correct.
Note: this is fully compatible with older policy as it will never set
the bits being masked out.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:38 +0000 (00:42 -0800)]
apparmor: add support for force complain flag to support learning mode
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:37 +0000 (00:42 -0800)]
apparmor: remove paranoid load switch
Policy should always under go a full paranoid verification.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:36 +0000 (00:42 -0800)]
apparmor: name null-XXX profiles after the executable
When possible its better to name a learning profile after the missing
profile in question. This allows for both more informative names and
for profile reuse.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:35 +0000 (00:42 -0800)]
apparmor: pass gfp_t parameter into profile allocation
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:34 +0000 (00:42 -0800)]
apparmor: refactor prepare_ns() and make usable from different views
prepare_ns() will need to be called from alternate views, and namespaces
will need to be created via different interfaces. So refactor and
allow specifying the view ns.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:32 +0000 (00:42 -0800)]
apparmor: update policy_destroy to use new debug asserts
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:31 +0000 (00:42 -0800)]
apparmor: pass gfp param into aa_policy_init()
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:30 +0000 (00:42 -0800)]
apparmor: constify policy name and hname
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:29 +0000 (00:42 -0800)]
apparmor: rename hname_tail to basename
Rename to the shorter and more familiar shell cmd name
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:28 +0000 (00:42 -0800)]
apparmor: rename mediated_filesystem() to path_mediated_fs()
Rename to indicate the test is only about whether path mediation is used,
not whether other types of mediation might be used.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:27 +0000 (00:42 -0800)]
apparmor: add debug assert AA_BUG and Kconfig to control debug info
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:26 +0000 (00:42 -0800)]
apparmor: add macro for bug asserts to check that a lock is held
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:25 +0000 (00:42 -0800)]
apparmor: allow ns visibility question to consider subnses
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:24 +0000 (00:42 -0800)]
apparmor: add fn to lookup profiles by fqname
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:23 +0000 (00:42 -0800)]
apparmor: add lib fn to find the "split" for fqnames
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:22 +0000 (00:42 -0800)]
apparmor: add strn version of aa_find_ns
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:21 +0000 (00:42 -0800)]
apparmor: add strn version of lookup_profile fn
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:19 +0000 (00:42 -0800)]
apparmor: rename replacedby to proxy
Proxy is shorter and a better fit than replaceby, so rename it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:18 +0000 (00:42 -0800)]
apparmor: rename PFLAG_INVALID to PFLAG_STALE
Invalid does not convey the meaning of the flag anymore so rename it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:17 +0000 (00:42 -0800)]
apparmor: rename sid to secid
Move to common terminology with other LSMs and kernel infrastucture
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:16 +0000 (00:42 -0800)]
apparmor: rename namespace to ns to improve code line lengths
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:15 +0000 (00:42 -0800)]
apparmor: split apparmor policy namespaces code into its own file
Policy namespaces will be diverging from profile management and
expanding so put it in its own file.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:14 +0000 (00:42 -0800)]
apparmor: split out shared policy_XXX fns to lib
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 16 Jan 2017 08:42:13 +0000 (00:42 -0800)]
apparmor: move lib definitions into separate lib include
Signed-off-by: John Johansen <john.johansen@canonical.com>
Kees Cook [Sat, 17 Dec 2016 01:04:13 +0000 (17:04 -0800)]
apparmor: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Tetsuo Handa [Mon, 14 Nov 2016 11:11:52 +0000 (20:11 +0900)]
AppArmor: Use GFP_KERNEL for __aa_kvmalloc().
Calling kmalloc(GFP_NOIO) with order == PAGE_ALLOC_COSTLY_ORDER is not
recommended because it might fall into infinite retry loop without
invoking the OOM killer.
Since aa_dfa_unpack() is the only caller of kvzalloc() and
aa_dfa_unpack() which is calling kvzalloc() via unpack_table() is
doing kzalloc(GFP_KERNEL), it is safe to use GFP_KERNEL from
__aa_kvmalloc().
Since aa_simple_write_to_buffer() is the only caller of kvmalloc()
and aa_simple_write_to_buffer() is calling copy_from_user() which
is GFP_KERNEL context (see memdup_user_nul()), it is safe to use
GFP_KERNEL from __aa_kvmalloc().
Therefore, replace GFP_NOIO with GFP_KERNEL. Also, since we have
vmalloc() fallback, add __GFP_NORETRY so that we don't invoke the OOM
killer by kmalloc(GFP_KERNEL) with order == PAGE_ALLOC_COSTLY_ORDER.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Seung-Woo Kim [Mon, 12 Dec 2016 08:35:26 +0000 (17:35 +0900)]
Smack: ignore private inode for file functions
The access to fd from anon_inode is always failed because there is
no set xattr operations. So this patch fixes to ignore private
inode including anon_inode for file functions.
It was only ignored for smack_file_receive() to share dma-buf fd,
but dma-buf has other functions like ioctl and mmap.
Reference: https://lkml.org/lkml/2015/4/17/16
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Rafal Krypa [Fri, 9 Dec 2016 13:03:04 +0000 (14:03 +0100)]
Smack: fix d_instantiate logic for sockfs and pipefs
Since
4b936885a (v2.6.32) all inodes on sockfs and pipefs are disconnected.
It caused filesystem specific code in smack_d_instantiate to be skipped,
because all inodes on those pseudo filesystems were treated as root inodes.
As a result all sockfs inodes had the Smack label set to floor.
In most cases access checks for sockets use socket_smack data so the inode
label is not important. But there are special cases that were broken.
One example would be calling fcntl with F_SETOWN command on a socket fd.
Now smack_d_instantiate expects all pipefs and sockfs inodes to be
disconnected and has the logic in appropriate place.
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Himanshu Shukla [Wed, 23 Nov 2016 06:29:45 +0000 (11:59 +0530)]
SMACK: Use smk_tskacc() instead of smk_access() for proper logging
smack_file_open() is first checking the capability of calling subject,
this check will skip the SMACK logging for success case. Use smk_tskacc()
for proper logging and SMACK access check.
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Vishal Goel [Wed, 23 Nov 2016 05:15:31 +0000 (10:45 +0530)]
Smack: Traverse the smack_known_list using list_for_each_entry_rcu macro
In smack_from_secattr function,"smack_known_list" is being traversed
using list_for_each_entry macro, although it is a rcu protected
structure. So it should be traversed using "list_for_each_entry_rcu"
macro to fetch the rcu protected entry.
Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Himanshu Shukla [Wed, 23 Nov 2016 06:29:19 +0000 (11:59 +0530)]
SMACK: Free the i_security blob in inode using RCU
There is race condition issue while freeing the i_security blob in SMACK
module. There is existing condition where i_security can be freed while
inode_permission is called from path lookup on second CPU. There has been
observed the page fault with such condition. VFS code and Selinux module
takes care of this condition by freeing the inode and i_security field
using RCU via call_rcu(). But in SMACK directly the i_secuirty blob is
being freed. Use call_rcu() to fix this race condition issue.
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Himanshu Shukla [Wed, 23 Nov 2016 06:28:48 +0000 (11:58 +0530)]
SMACK: Delete list_head repeated initialization
smk_copy_rules() and smk_copy_relabel() are initializing list_head though
they have been initialized already in new_task_smack() function. Delete
repeated initialization.
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Vishal Goel [Wed, 23 Nov 2016 05:16:57 +0000 (10:46 +0530)]
SMACK: Add new lock for adding entry in smack master list
"smk_set_access()" function adds a new rule entry in subject label specific
list(rule_list) and in global rule list(smack_rule_list) both. Mutex lock
(rule_lock) is used to avoid simultaneous updates. But this lock is subject
label specific lock. If 2 processes tries to add different rules(i.e with
different subject labels) simultaneously, then both the processes can take
the "rule_lock" respectively. So it will cause a problem while adding
entries in master rule list.
Now a new mutex lock(smack_master_list_lock) has been taken to add entry in
smack_rule_list to avoid simultaneous updates of different rules.
Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Vishal Goel [Wed, 23 Nov 2016 05:02:54 +0000 (10:32 +0530)]
Smack: Fix the issue of wrong SMACK label update in socket bind fail case
Fix the issue of wrong SMACK label (SMACK64IPIN) update when a second bind
call is made to same IP address & port, but with different SMACK label
(SMACK64IPIN) by second instance of server. In this case server returns
with "Bind:Address already in use" error but before returning, SMACK label
is updated in SMACK port-label mapping list inside smack_socket_bind() hook
To fix this issue a new check has been added in smk_ipv6_port_label()
function before updating the existing port entry. It checks whether the
socket for matching port entry is closed or not. If it is closed then it
means port is not bound and it is safe to update the existing port entry
else return if port is still getting used. For checking whether socket is
closed or not, one more field "smk_can_reuse" has been added in the
"smk_port_label" structure. This field will be set to '1' in
"smack_sk_free_security()" function which is called to free the socket
security blob when the socket is being closed. In this function, port entry
is searched in the SMACK port-label mapping list for the closing socket.
If entry is found then "smk_can_reuse" field is set to '1'.Initially
"smk_can_reuse" field is set to '0' in smk_ipv6_port_label() function after
creating a new entry in the list which indicates that socket is in use.
Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Vishal Goel [Wed, 23 Nov 2016 05:01:59 +0000 (10:31 +0530)]
Smack: Fix the issue of permission denied error in ipv6 hook
Permission denied error comes when 2 IPv6 servers are running and client
tries to connect one of them. Scenario is that both servers are using same
IP and port but different protocols(Udp and tcp). They are using different
SMACK64IPIN labels.Tcp server is using "test" and udp server is using
"test-in". When we try to run tcp client with SMACK64IPOUT label as "test",
then connection denied error comes. It should not happen since both tcp
server and client labels are same.This happens because there is no check
for protocol in smk_ipv6_port_label() function while searching for the
earlier port entry. It checks whether there is an existing port entry on
the basis of port only. So it updates the earlier port entry in the list.
Due to which smack label gets changed for earlier entry in the
"smk_ipv6_port_list" list and permission denied error comes.
Now a check is added for socket type also.Now if 2 processes use same
port but different protocols (tcp or udp), then 2 different port entries
will be added in the list. Similarly while checking smack access in
smk_ipv6_port_check() function, port entry is searched on the basis of
both port and protocol.
Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <Himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Vishal Goel [Wed, 23 Nov 2016 05:01:08 +0000 (10:31 +0530)]
SMACK: Add the rcu synchronization mechanism in ipv6 hooks
Add the rcu synchronization mechanism for accessing smk_ipv6_port_list
in smack IPv6 hooks. Access to the port list is vulnerable to a race
condition issue,it does not apply proper synchronization methods while
working on critical section. It is possible that when one thread is
reading the list, at the same time another thread is modifying the
same port list, which can cause the major problems.
To ensure proper synchronization between two threads, rcu mechanism
has been applied while accessing and modifying the port list. RCU will
also not affect the performance, as there are more accesses than
modification where RCU is most effective synchronization mechanism.
Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Mickaël Salaün [Wed, 21 Dec 2016 23:32:25 +0000 (00:32 +0100)]
security: Fix inode_getattr documentation
Replace arguments @mnt and @dentry with @path.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Mathias Svensson [Fri, 6 Jan 2017 21:32:39 +0000 (13:32 -0800)]
samples/seccomp: fix 64-bit comparison macros
There were some bugs in the JNE64 and JLT64 comparision macros. This fixes
them, improves comments, and cleans up the file while we are at it.
Reported-by: Stephen Röttger <sroettger@google.com>
Signed-off-by: Mathias Svensson <idolf@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Sun, 8 Jan 2017 22:18:17 +0000 (14:18 -0800)]
Linux 4.10-rc3
Linus Torvalds [Sun, 8 Jan 2017 19:42:04 +0000 (11:42 -0800)]
Merge tag 'usb-4.10-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a bunch of USB fixes for 4.10-rc3. Yeah, it's a lot, an
artifact of the holiday break I think.
Lots of gadget and the usual XHCI fixups for reported issues (one day
that driver will calm down...) Also included are a bunch of usb-serial
driver fixes, and for good measure, a number of much-reported MUSB
driver issues have finally been resolved.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (72 commits)
USB: fix problems with duplicate endpoint addresses
usb: ohci-at91: use descriptor-based gpio APIs correctly
usb: storage: unusual_uas: Add JMicron JMS56x to unusual device
usb: hub: Move hub_port_disable() to fix warning if PM is disabled
usb: musb: blackfin: add bfin_fifo_offset in bfin_ops
usb: musb: fix compilation warning on unused function
usb: musb: Fix trying to free already-free IRQ 4
usb: musb: dsps: implement clear_ep_rxintr() callback
usb: musb: core: add clear_ep_rxintr() to musb_platform_ops
USB: serial: ti_usb_3410_5052: fix NULL-deref at open
USB: serial: spcp8x5: fix NULL-deref at open
USB: serial: quatech2: fix sleep-while-atomic in close
USB: serial: pl2303: fix NULL-deref at open
USB: serial: oti6858: fix NULL-deref at open
USB: serial: omninet: fix NULL-derefs at open and disconnect
USB: serial: mos7840: fix misleading interrupt-URB comment
USB: serial: mos7840: remove unused write URB
USB: serial: mos7840: fix NULL-deref at open
USB: serial: mos7720: remove obsolete port initialisation
USB: serial: mos7720: fix parallel probe
...
Linus Torvalds [Sun, 8 Jan 2017 19:37:44 +0000 (11:37 -0800)]
Merge tag 'char-misc-4.10-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are a few small char/misc driver fixes for 4.10-rc3.
Two MEI driver fixes, and three NVMEM patches for reported issues, and
a new Hyper-V driver MAINTAINER update. Nothing major at all, all have
been in linux-next with no reported issues"
* tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
hyper-v: Add myself as additional MAINTAINER
nvmem: fix nvmem_cell_read() return type doc
nvmem: imx-ocotp: Fix wrong register size
nvmem: qfprom: Allow single byte accesses for read/write
mei: move write cb to completion on credentials failures
mei: bus: fix mei_cldev_enable KDoc
Linus Torvalds [Sun, 8 Jan 2017 19:22:00 +0000 (11:22 -0800)]
Merge tag 'staging-4.10-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are some staging and IIO driver fixes for 4.10-rc3.
Most of these are minor IIO fixes of reported issues, along with one
network driver fix to resolve an issue. And a MAINTAINERS update with
a new mailing list. All of these, except the MAINTAINERS file update,
have been in linux-next with no reported issues (the MAINTAINERS patch
happened on Friday...)"
* tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
MAINTAINERS: add greybus subsystem mailing list
staging: octeon: Call SET_NETDEV_DEV()
iio: accel: st_accel: fix LIS3LV02 reading and scaling
iio: common: st_sensors: fix channel data parsing
iio: max44000: correct value in illuminance_integration_time_available
iio: adc: TI_AM335X_ADC should depend on HAS_DMA
iio: bmi160: Fix time needed to sleep after command execution
iio: 104-quad-8: Fix active level mismatch for the preset enable option
iio: 104-quad-8: Fix off-by-one errors when addressing IOR
iio: 104-quad-8: Fix index control configuration
Johannes Weiner [Sat, 7 Jan 2017 00:21:43 +0000 (19:21 -0500)]
mm: workingset: fix use-after-free in shadow node shrinker
Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.
Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:
WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
Call Trace:
delete_node+0x1e4/0x200
__radix_tree_delete_node+0xd/0x10
shadow_lru_isolate+0xe6/0x220
__list_lru_walk_one.isra.4+0x9b/0x190
list_lru_walk_one+0x23/0x30
scan_shadow_nodes+0x2e/0x40
shrink_slab.part.44+0x23d/0x5d0
shrink_node+0x22c/0x330
kswapd+0x392/0x8f0
This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
inlined radix_tree_shrink().
The problem is with
14b468791fa9 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.
While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk. If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.
Consider this tree, where the s are shadow entries:
root->rnode
|
[0 n]
| |
[s ] [sssss]
Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:
root->rnode
|
[0 ]
|
[s ]
Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:
root->rnode
|
[s ]
The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.
root->rnode
|
s
Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.
Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.
Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.
Fixes:
14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <david@fromorbit.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Leech <cleech@redhat.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 7 Jan 2017 23:37:31 +0000 (15:37 -0800)]
mm: stop leaking PageTables
4.10-rc loadtest (even on x86, and even without THPCache) fails with
"fork: Cannot allocate memory" or some such; and /proc/meminfo shows
PageTables growing.
Commit
953c66c2b22a ("mm: THP page cache support for ppc64") that got
merged in rc1 removed the freeing of an unused preallocated pagetable
after do_fault_around() has called map_pages().
This is usually a good optimization, so that the followup doesn't have
to reallocate one; but it's not sufficient to shift the freeing into
alloc_set_pte(), since there are failure cases (most commonly
VM_FAULT_RETRY) which never reach finish_fault().
Check and free it at the outer level in do_fault(), then we don't need
to worry in alloc_set_pte(), and can restore that to how it was (I
cannot find any reason to pte_free() under lock as it was doing).
And fix a separate pagetable leak, or crash, introduced by the same
change, that could only show up on some ppc64: why does do_set_pmd()'s
failure case attempt to withdraw a pagetable when it never deposited
one, at the same time overwriting (so leaking) the vmf->prealloc_pte?
Residue of an earlier implementation, perhaps? Delete it.
Fixes:
953c66c2b22a ("mm: THP page cache support for ppc64")
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>