Linus Torvalds [Sat, 19 Mar 2016 03:25:49 +0000 (20:25 -0700)]
Merge branch 'for-4.6' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"cgroup changes for v4.6-rc1. No userland visible behavior changes in
this pull request. I'll send out a separate pull request for the
addition of cgroup namespace support.
- The biggest change is the revamping of cgroup core task migration
and controller handling logic. There are quite a few places where
controllers and tasks are manipulated. Previously, many of those
places implemented custom operations for each specific use case
assuming specific starting conditions. While this worked, it makes
the code fragile and difficult to follow.
The bulk of this pull request restructures these operations so that
most related operations are performed through common helpers which
implement recursive (subtrees are always processed consistently)
and idempotent (they make cgroup hierarchy converge to the target
state rather than performing operations assuming specific starting
conditions). This makes the code a lot easier to understand,
verify and extend.
- Implicit controller support is added. This is primarily for using
perf_event on the v2 hierarchy so that perf can match cgroup v2
path without requiring the user to do anything special. The kernel
portion of perf_event changes is acked but userland changes are
still pending review.
- cgroup_no_v1= boot parameter added to ease testing cgroup v2 in
certain environments.
- There is a regression introduced during v4.4 devel cycle where
attempts to migrate zombie tasks can mess up internal object
management. This was fixed earlier this week and included in this
pull request w/ stable cc'd.
- Misc non-critical fixes and improvements"
* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (44 commits)
cgroup: avoid false positive gcc-6 warning
cgroup: ignore css_sets associated with dead cgroups during migration
Documentation: cgroup v2: Trivial heading correction.
cgroup: implement cgroup_subsys->implicit_on_dfl
cgroup: use css_set->mg_dst_cgrp for the migration target cgroup
cgroup: make cgroup[_taskset]_migrate() take cgroup_root instead of cgroup
cgroup: move migration destination verification out of cgroup_migrate_prepare_dst()
cgroup: fix incorrect destination cgroup in cgroup_update_dfl_csses()
cgroup: Trivial correction to reflect controller.
cgroup: remove stale item in cgroup-v1 document INDEX file.
cgroup: update css iteration in cgroup_update_dfl_csses()
cgroup: allocate 2x cgrp_cset_links when setting up a new root
cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask
cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends
cgroup: use cgroup_apply_enable_control() in cgroup creation path
cgroup: combine cgroup_mutex locking and offline css draining
cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()
cgroup: introduce cgroup_{save|propagate|restore}_control()
cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive
cgroup: factor out cgroup_apply_control_enable() from cgroup_subtree_control_write()
...
Linus Torvalds [Sat, 19 Mar 2016 03:06:46 +0000 (20:06 -0700)]
Merge branch 'for-4.6' of git://git./linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
- ahci grew runtime power management support so that the controller can
be turned off if no devices are attached.
- sata_via isn't dead yet. It got hotplug support and more refined
workaround for certain WD drives.
- Misc cleanups. There's a merge from for-4.5-fixes to avoid confusing
conflicts in ahci PCI ID table.
* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci_xgene: dereferencing uninitialized pointer in probe
AHCI: Remove obsolete Intel Lewisburg SATA RAID device IDs
ata: sata_rcar: Use ARCH_RENESAS
sata_via: Implement hotplug for VT6421
sata_via: Apply WD workaround only when needed on VT6421
ahci: Add runtime PM support for the host controller
ahci: Add functions to manage runtime PM of AHCI ports
ahci: Convert driver to use modern PM hooks
ahci: Cache host controller version
scsi: Drop runtime PM usage count after host is added
scsi: Set request queue runtime PM status back to active on resume
block: Add blk_set_runtime_active()
ata: ahci_mvebu: add support for Armada 3700 variant
libata: fix unbalanced spin_lock_irqsave/spin_unlock_irq() in ata_scsi_park_show()
libata: support AHCI on OCTEON platform
Linus Torvalds [Sat, 19 Mar 2016 03:05:39 +0000 (20:05 -0700)]
Merge branch 'for-4.6' of git://git./linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
"Three trivial workqueue changes"
* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix comment for work_on_cpu()
sched/core: Get rid of 'cpu' argument in wq_worker_sleeping()
workqueue: Replace usage of init_name with dev_set_name()
Linus Torvalds [Sat, 19 Mar 2016 02:26:54 +0000 (19:26 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:
- a couple of hotfixes
- the rest of MM
- a new timer slack control in procfs
- a couple of procfs fixes
- a few misc things
- some printk tweaks
- lib/ updates, notably to radix-tree.
- add my and Nick Piggin's old userspace radix-tree test harness to
tools/testing/radix-tree/. Matthew said it was a godsend during the
radix-tree work he did.
- a few code-size improvements, switching to __always_inline where gcc
screwed up.
- partially implement character sets in sscanf
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
sscanf: implement basic character sets
lib/bug.c: use common WARN helper
param: convert some "on"/"off" users to strtobool
lib: add "on"/"off" support to kstrtobool
lib: update single-char callers of strtobool()
lib: move strtobool() to kstrtobool()
include/linux/unaligned: force inlining of byteswap operations
include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
usb: common: convert to use match_string() helper
ide: hpt366: convert to use match_string() helper
ata: hpt366: convert to use match_string() helper
power: ab8500: convert to use match_string() helper
power: charger_manager: convert to use match_string() helper
drm/edid: convert to use match_string() helper
pinctrl: convert to use match_string() helper
device property: convert to use match_string() helper
lib/string: introduce match_string() helper
radix-tree tests: add test for radix_tree_iter_next
radix-tree tests: add regression3 test
...
Linus Torvalds [Sat, 19 Mar 2016 00:13:31 +0000 (17:13 -0700)]
Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"This is the block driver pull request for this merge window. It sits
on top of for-4.6/core, that was just sent out.
This contains:
- A set of fixes for lightnvm. One from Alan, fixing an overflow,
and the rest from the usual suspects, Javier and Matias.
- A set of fixes for nbd from Markus and Dan, and a fixup from Arnd
for correct usage of the signed 64-bit divider.
- A set of bug fixes for the Micron mtip32xx, from Asai.
- A fix for the brd discard handling from Bart.
- Update the maintainers entry for cciss, since that hardware has
transferred ownership.
- Three bug fixes for bcache from Eric Wheeler.
- Set of fixes for xen-blk{back,front} from Jan and Konrad.
- Removal of the cpqarray driver. It has been disabled in Kconfig
since 2013, and we were initially scheduled to remove it in 3.15.
- Various updates and fixes for NVMe, with the most important being:
- Removal of the per-device NVMe thread, replacing that with a
watchdog timer instead. From Christoph.
- Exposing the namespace WWID through sysfs, from Keith.
- Set of cleanups from Ming Lin.
- Logging the controller device name instead of the underlying
PCI device name, from Sagi.
- And a bunch of fixes and optimizations from the usual suspects
in this area"
* 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits)
NVMe: Expose ns wwid through single sysfs entry
drivers:block: cpqarray clean up
brd: Fix discard request processing
cpqarray: remove it from the kernel
cciss: update MAINTAINERS
NVMe: Remove unused sq_head read in completion path
bcache: fix cache_set_flush() NULL pointer dereference on OOM
bcache: cleaned up error handling around register_cache()
bcache: fix race of writeback thread starting before complete initialization
NVMe: Create discard zero quirk white list
nbd: use correct div_s64 helper
mtip32xx: remove unneeded variable in mtip_cmd_timeout()
lightnvm: generalize rrpc ppa calculations
lightnvm: remove struct nvm_dev->total_blocks
lightnvm: rename ->nr_pages to ->nr_sects
lightnvm: update closed list outside of intr context
xen/blback: Fit the important information of the thread in 17 characters
lightnvm: fold get bb tbl when using dual/quad plane mode
lightnvm: fix up nonsensical configure overrun checking
xen-blkback: advertise indirect segment support earlier
...
Linus Torvalds [Fri, 18 Mar 2016 23:43:11 +0000 (16:43 -0700)]
Merge branch 'for-4.6/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
"Here are the core block changes for this merge window. Not a lot of
exciting stuff going on in this round, most of the changes have been
on the driver side of things. That pull request is coming next. This
pull request contains:
- A set of fixes for chained bio handling from Christoph.
- A tag bounds check for blk-mq from Hannes, ensuring that we don't
do something stupid if a device reports an invalid tag value.
- A set of fixes/updates for the CFQ IO scheduler from Jan Kara.
- A set of blk-mq fixes from Keith, adding support for dynamic
hardware queues, and fixing init of max_dev_sectors for stacking
devices.
- A fix for the dynamic hw context from Ming.
- Enabling of cgroup writeback support on a block device, from
Shaohua"
* 'for-4.6/core' of git://git.kernel.dk/linux-block:
blk-mq: add bounds check on tag-to-rq conversion
block: bio_remaining_done() isn't unlikely
block: cleanup bio_endio
block: factor out chained bio completion
block: don't unecessarily clobber bi_error for chained bios
block-dev: enable writeback cgroup support
blk-mq: Fix NULL pointer updating nr_requests
blk-mq: mark request queue as mq asap
block: Initialize max_dev_sectors to 0
blk-mq: dynamic h/w context count
cfq-iosched: Allow parent cgroup to preempt its child
cfq-iosched: Allow sync noidle workloads to preempt each other
cfq-iosched: Reorder checks in cfq_should_preempt()
cfq-iosched: Don't group_idle if cfqq has big thinktime
Linus Torvalds [Fri, 18 Mar 2016 23:19:15 +0000 (16:19 -0700)]
Merge tag 'usb-4.6-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here is a USB fix for the reported issue with commit
69bec7259853
("USB: core: let USB device know device node") as well as some other
issues that have been reported so far with this merge window"
* tag 'usb-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: uas: Reduce can_queue to MAX_CMNDS
USB: cdc-acm: more sanity checking
USB: usb_driver_claim_interface: add sanity checking
usb/core: usb_alloc_dev(): fix setting of ->portnum
USB: iowarrior: fix oops with malicious USB descriptors
Linus Torvalds [Fri, 18 Mar 2016 17:21:04 +0000 (10:21 -0700)]
Merge tag 'for-linus-4.6' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull IPMI updates from Corey Minyard:
"Just some minor fixes, nothing big"
* tag 'for-linus-4.6' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: do not probe ACPI devices if si_tryacpi is unset
ipmi_si: Avoid a wrong long timeout on transaction done
ipmi_si: Fix module parameter doc names
ipmi_ssif: Fix logic around alert handling
Linus Torvalds [Fri, 18 Mar 2016 17:15:11 +0000 (10:15 -0700)]
Merge tag 'mfd-for-linus-4.6' of git://git./linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Drivers:
- Freescale Touch Screen ADC
- X-Powers AXP PMIC with RSB
- TI TPS65086 Power Management IC (PMIC)
New Device Support:
- Supply device PCI IDs for Intel Broxton
Fix-ups:
- Move to clkdev_create() API; intel_quark_i2c_gpio
- Complete re-write of TI's TPS65912 Power Management IC (PMIC)
- Remove unnecessary function argument; axp20x
- Separate out bus related code; axp20x
- Coding Style changes; axp20x
- Allow more drivers to be compiled as modules
- Work around false positive 'used uninitialised' warning; db8500-prcmu
Bug Fixes:
- Remove do_div(); fsl-imx25-gcq
- Fix driver init when built-in; tps65010
- Fix clock-unregister leak; intel-lpss"
* tag 'mfd-for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (53 commits)
mfd: intel-lpss: Pass I2C configuration via properties on BXT
mfd: imx6sx: Add PCIe register definitions for iomuxc gpr
mfd: ipaq-micro: Use __maybe_unused to hide pm functions
mfd: max77686: Add max77802 to I2C device ID table
mfd: max77686: Export OF module alias information
mfd: max77686: Allow driver to be built as a module
mfd: stmpe: Add the proper PWM resources
mfd: tps65090: Set regmap config reg counts properly
mfd: syscon: Return ENOTSUPP instead of ENOSYS when disabled
mfd: as3711: Set regmap config reg counts properly
mfd: rc5t583: Set regmap config reg counts properly
gpio: tps65086: Add GPO driver for the TPS65086 PMIC
mfd: mt6397: Add platform device ID table
mfd: da9063: Fix missing volatile registers in the core regmap_range volatile lists
mfd: mt6397: Add MT6323 support to MT6397 driver
mfd: mt6397: Add support for different Slave types
mfd: mt6397: int_con and int_status may vary in location
dt-bindings: mfd: Add bindings for the MediaTek MT6323 PMIC
mfd: da9062: Fix missing volatile registers in the core regmap_range volatile lists
mfd: Add documentation for ACT8945A DT bindings
...
Linus Torvalds [Fri, 18 Mar 2016 17:05:46 +0000 (10:05 -0700)]
Merge tag 'sound-4.6-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"After a heavy storm by syzkaller in 4.5 cycle, we have relatively few
changes in the core at this time while a lot of changes are found in
the driver side, unsurprisingly. Below are some highlights:
ALSA core:
- A few more hardening in ALSA timer codes
- An extension of sequencer API for advertising the card / pid
- Small fixes in compress-offload and jack layers
HD-audio:
- Dynamic PCM assignment in HDMI/DP codec; preparation for upcoming
DP-MST support
- Lots of code refactoring for sharing with ASoC SKL driver
- Regression fixes for Intel HDMI/DP
- Fixups for CX20724 codec, Lenovo AiO
USB-audio:
- Add quirk_alias option to make quirk debugging easier
- Fixes for possible Oops by malformed firmware
Firewire:
- Add support for FW-1804 in tascam driver
- Improvements / changes in card registration, multi stream handling,
etc for DICE
- Lots of code refactoring
ASoC:
- Enhancements of still ongoing topology API
- Lots of commits for Intel Skylake support including HDMI support
- A few Intel Atom driver updates for recent devices
- Lots of improvements to the Renesas drivers
- Capture support for Qualcomm drivers
- Support for TI DaVinci DRA7xxx devices
- New machine drivers for Freescale systems with Cirrus CODECs,
Mediatek systems with RT5650 CODECs
- New CPU drivers for Allwinner S/PDIF controllers
- New CODEC drivers for Maxim MAX9867 and MAX98926 and Realtek RT5514"
* tag 'sound-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (291 commits)
ALSA: hda - Fix mutex deadlock at HDMI/DP hotplug
ALSA: ctl: change return value in compatibility layer so that it's the same value in core implementation
ALSA: mixart: silence an uninitialized variable warning
ALSA: usb-audio: Add sanity checks for endpoint accesses
ALSA: usb-audio: Minor code cleanup in create_fixed_stream_quirk()
ALSA: usb-audio: Fix NULL dereference in create_fixed_stream_quirk()
ALSA: hda - Limit i915 HDMI binding only for HSW and later
ALSA: hda - Fix unconditional GPIO toggle via automute
ALSA: mixart: silence unitialized variable warnings
ALSA: hda - Fixes double fault in nvhdmi_chmap_cea_alloc_validate_get_type
ALSA: intel8x0: Add clock quirk entry for
AD1981B on IBM ThinkPad X41.
ALSA: hda - Add new GPU codec ID 0x10de0082 to snd-hda
ASoC: rsnd: add simplified module explanation
ASoC: hdac_hdmi: Add broxton device ID
ASoC: Intel: Bxtn: Add Broxton PCI ID
ASoC: Intel: Skylake: Move Skylake dsp ops & loader ops
ASoC: Intel: add dmabuffer to common sst_dsp
ASoC: Intel: Skylake: Unstatify skl_dsp_enable_core
ASoC: Intel: Skylake: Fix whitepsace issues
ASoC: Intel: Skylake: Move module id defines
...
Linus Torvalds [Fri, 18 Mar 2016 16:39:22 +0000 (09:39 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"Initial roundup of 4.6 merge window patches.
This is the first of two pull requests. It is the smaller request,
but touches for more different things (this is everything but what is
in or going into staging). The pull request for the code in
staging/rdma is on hold until after we decide what to do on the
write/writev API issue and may be partially deferred until 4.7 as a
result.
Summary:
- cxgb4 updates
- nes updates
- unification of iwarp portmapper code to core
- add drain_cq API
- various ib_core updates
- minor ipoib updates
- minor mlx4 updates
- more significant mlx5 updates (including a minor merge conflict
with net-next tree...merge is simple to resolve and Stephen's
resolution was confirmed by Mellanox)
- trivial net/9p rdma conversion
- ocrdma RoCEv2 update
- srpt updates"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits)
iwpm: crash fix for large connections test
iw_cxgb3: support for iWARP port mapping
iw_cxgb4: remove port mapper related code
iw_nes: remove port mapper related code
iwcm: common code for port mapper
net/9p: convert to new CQ API
IB/mlx5: Add support for don't trap rules
net/mlx5_core: Introduce forward to next priority action
net/mlx5_core: Create anchor of last flow table
iser: Accept arbitrary sg lists mapping if the device supports it
mlx5: Add arbitrary sg list support
IB/core: Add arbitrary sg_list support
IB/mlx5: Expose correct max_fast_reg_page_list_len
IB/mlx5: Make coding style more consistent
IB/mlx5: Convert UMR CQ to new CQ API
IB/ocrdma: Skip using unneeded intermediate variable
IB/ocrdma: Skip using unneeded intermediate variable
IB/ocrdma: Delete unnecessary variable initialisations in 11 functions
IB/core: Documentation fix in the MAD header file
IB/core: trivial prink cleanup.
...
Hans de Goede [Mon, 7 Mar 2016 19:11:52 +0000 (20:11 +0100)]
USB: uas: Reduce can_queue to MAX_CMNDS
The uas driver can never queue more then MAX_CMNDS (- 1) tags and tags
are shared between luns, so there is no need to claim that we can_queue
some random large number.
Not claiming that we can_queue 65536 commands, fixes the uas driver
failing to initialize while allocating the tag map with a "Page allocation
failure (order 7)" error on systems which have been running for a while
and thus have fragmented memory.
Cc: stable@vger.kernel.org
Reported-and-tested-by: Yves-Alexis Perez <corsac@corsac.net>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Neukum [Tue, 15 Mar 2016 09:14:04 +0000 (10:14 +0100)]
USB: cdc-acm: more sanity checking
An attack has become available which pretends to be a quirky
device circumventing normal sanity checks and crashes the kernel
by an insufficient number of interfaces. This patch adds a check
to the code path for quirky devices.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Neukum [Wed, 16 Mar 2016 12:26:17 +0000 (13:26 +0100)]
USB: usb_driver_claim_interface: add sanity checking
Attacks that trick drivers into passing a NULL pointer
to usb_driver_claim_interface() using forged descriptors are
known. This thwarts them by sanity checking.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicolai Stange [Thu, 17 Mar 2016 22:53:02 +0000 (23:53 +0100)]
usb/core: usb_alloc_dev(): fix setting of ->portnum
With commit
69bec7259853 ("USB: core: let USB device know device node"),
the port1 argument of usb_alloc_dev() gets overwritten as follows:
... usb_alloc_dev(..., unsigned port1)
{
...
if (!parent->parent) {
port1 = usb_hcd_find_raw_port_number(..., port1);
}
...
}
Later on, this now overwritten port1 gets assigned to ->portnum:
dev->portnum = port1;
However, since xhci_find_raw_port_number() isn't idempotent, the
aforementioned commit causes a number of KASAN splats like the following:
BUG: KASAN: slab-out-of-bounds in xhci_find_raw_port_number+0x98/0x170
at addr
ffff8801d9311670
Read of size 8 by task kworker/2:1/87
[...]
Workqueue: usb_hub_wq hub_event
0000000000000188 000000005814b877 ffff8800cba17588 ffffffff8191447e
0000000041b58ab3 ffffffff82a03209 ffffffff819143a2 ffffffff82a252f4
ffff8801d93115e0 0000000000000188 ffff8801d9311628 ffff8800cba17588
Call Trace:
[<
ffffffff8191447e>] dump_stack+0xdc/0x15e
[<
ffffffff819143a2>] ? _atomic_dec_and_lock+0xa2/0xa2
[<
ffffffff814e2cd1>] ? print_section+0x61/0xb0
[<
ffffffff814e4939>] print_trailer+0x179/0x2c0
[<
ffffffff814f0d84>] object_err+0x34/0x40
[<
ffffffff814f4388>] kasan_report_error+0x2f8/0x8b0
[<
ffffffff814eb91e>] ? __slab_alloc+0x5e/0x90
[<
ffffffff812178c0>] ? __lock_is_held+0x90/0x130
[<
ffffffff814f5091>] kasan_report+0x71/0xa0
[<
ffffffff814ec082>] ? kmem_cache_alloc_trace+0x212/0x560
[<
ffffffff81d99468>] ? xhci_find_raw_port_number+0x98/0x170
[<
ffffffff814f33d4>] __asan_load8+0x64/0x70
[<
ffffffff81d99468>] xhci_find_raw_port_number+0x98/0x170
[<
ffffffff81db0105>] xhci_setup_addressable_virt_dev+0x235/0xa10
[<
ffffffff81d9ea51>] xhci_setup_device+0x3c1/0x1430
[<
ffffffff8121cddd>] ? trace_hardirqs_on+0xd/0x10
[<
ffffffff81d9fac0>] ? xhci_setup_device+0x1430/0x1430
[<
ffffffff81d9fad3>] xhci_address_device+0x13/0x20
[<
ffffffff81d2081a>] hub_port_init+0x55a/0x1550
[<
ffffffff81d28705>] hub_event+0xef5/0x24d0
[<
ffffffff81d27810>] ? hub_port_debounce+0x2f0/0x2f0
[<
ffffffff8195e1ee>] ? debug_object_deactivate+0x1be/0x270
[<
ffffffff81210203>] ? print_rt_rq+0x53/0x2d0
[<
ffffffff8121657d>] ? trace_hardirqs_off+0xd/0x10
[<
ffffffff8226acfb>] ? _raw_spin_unlock_irqrestore+0x5b/0x60
[<
ffffffff81250000>] ? irq_domain_set_hwirq_and_chip+0x30/0xb0
[<
ffffffff81256339>] ? debug_lockdep_rcu_enabled+0x39/0x40
[<
ffffffff812178c0>] ? __lock_is_held+0x90/0x130
[<
ffffffff81196877>] process_one_work+0x567/0xec0
[...]
Afterwards, xhci reports some functional errors:
xhci_hcd 0000:00:14.0: ERROR: unexpected setup address command completion
code 0x11.
xhci_hcd 0000:00:14.0: ERROR: unexpected setup address command completion
code 0x11.
usb 4-3: device not accepting address 2, error -22
Fix this by not overwriting the port1 argument in usb_alloc_dev(), but
storing the raw port number as required by OF in an additional variable,
raw_port.
Fixes:
69bec7259853 ("USB: core: let USB device know device node")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Josh Boyer [Mon, 14 Mar 2016 14:42:38 +0000 (10:42 -0400)]
USB: iowarrior: fix oops with malicious USB descriptors
The iowarrior driver expects at least one valid endpoint. If given
malicious descriptors that specify 0 for the number of endpoints,
it will crash in the probe function. Ensure there is at least
one endpoint on the interface before using it.
The full report of this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/87
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joe Lawrence [Thu, 18 Feb 2016 21:02:54 +0000 (16:02 -0500)]
ipmi: do not probe ACPI devices if si_tryacpi is unset
Extend the tryacpi module parameter to turn off acpi_ipmi_probe such
that hard-coded options (type, ports, address, etc.) have complete
control over the smi_info data structures setup by the driver.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Corey Minyard [Mon, 25 Jan 2016 22:11:20 +0000 (16:11 -0600)]
ipmi_si: Avoid a wrong long timeout on transaction done
Under some circumstances, the IPMI state machine could return
a call without delay option but the driver would still do a long
delay because the result wasn't checked. Instead of calling
the state machine after transaction done, just go back to the
top of the processing to start over.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Corey Minyard [Tue, 19 Jan 2016 20:51:53 +0000 (14:51 -0600)]
ipmi_si: Fix module parameter doc names
Several were tryacpi instead of their actual values.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Corey Minyard [Wed, 6 Jan 2016 15:32:06 +0000 (09:32 -0600)]
ipmi_ssif: Fix logic around alert handling
There was a mistake in the logic, if an alert came in very quickly
it would hang the driver.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Linus Torvalds [Fri, 18 Mar 2016 05:13:41 +0000 (22:13 -0700)]
Merge tag 'staging-4.6-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here is the big staging driver pull request for 4.6-rc1.
Lots of little things here, over 1600 patches or so. Notable is all
of the good Lustre work happening, those developers have finally woken
up and are cleaning up their code greatly. The Outreachy intern
application process is also happening, which brought in another 400 or
so patches. Full details are in the very long shortlog.
All of these have been in linux-next with no reported issues"
* tag 'staging-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1673 commits)
staging: lustre: fix aligments in lnet selftest
staging: lustre: report minimum of two buffers for LNet selftest load test
staging: lustre: test for proper errno code in lstcon_rpc_trans_abort
staging: lustre: filter remaining extra spacing for lnet selftest
staging: lustre: remove extra spacing when setting variable for lnet selftest
staging: lustre: remove extra spacing of variable declartions for lnet selftest
staging: lustre: fix spacing issues checkpatch reported in lnet selftest
staging: lustre: remove returns in void function for lnet selftest
staging: lustre: fix bogus lst errors for lnet selftest
staging: netlogic: Replacing pr_err with dev_err after the call to devm_kzalloc
staging: mt29f_spinand: Replacing pr_info with dev_info after the call to devm_kzalloc
staging: android: ion: fix up file mode
staging: ion: debugfs invalid gfp mask
staging: rts5208: Replace pci_enable_device with pcim_enable_device
Staging: ieee80211: Place constant on right side of the test.
staging: speakup: Replace del_timer with del_timer_sync
staging: lowmemorykiller: fix 2 checks that checkpatch complained
staging: mt29f_spinand: Drop void pointer cast
staging: rdma: hfi1: file_ops: Replace ALIGN with PAGE_ALIGN
staging: rdma: hfi1: driver: Replace IS_ALIGNED with PAGE_ALIGNED
...
Linus Torvalds [Fri, 18 Mar 2016 04:51:52 +0000 (21:51 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"The most notable item is addition of support for Synaptics RMI4
protocol which is native protocol for all current Synaptics devices
(touchscreens, touchpads). In later releases we'll switch devices
using HID and PS/2 protocol emulation to RMI4.
You will also get:
- BYD PS/2 touchpad protocol support for psmouse
- MELFAS MIP4 Touchscreen driver
- rotary encoder was moved away from legacy platform data and to
generic device properties API, devm_* API, and can now handle
encoders using more than 2 GPIOs
- Cypress touchpad driver was switched to devm_* API and device
properties
- other assorted driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits)
ARM: pxa/raumfeld: use PROPERTY_ENTRY_INTEGER to define props
Input: synaptics-rmi4 - using logical instead of bitwise AND
Input: powermate - fix oops with malicious USB descriptors
Input: snvs_pwrkey - fix returned value check of syscon_regmap_lookup_by_phandle()
MAINTAINERS: add devicetree bindings to Input Drivers section
Input: synaptics-rmi4 - add device tree support to the SPI transport driver
Input: synaptics-rmi4 - add SPI transport driver
Input: synaptics-rmi4 - add support for F30
Input: synaptics-rmi4 - add support for F12
Input: synaptics-rmi4 - add device tree support for 2d sensors and F11
Input: synaptics-rmi4 - add support for 2D sensors and F11
Input: synaptics-rmi4 - add device tree support for RMI4 I2C devices
Input: synaptics-rmi4 - add I2C transport driver
Input: synaptics-rmi4 - add support for Synaptics RMI4 devices
Input: ad7879 - add device tree support
Input: ad7879 - fix default x/y axis assignment
Input: ad7879 - move header to platform_data directory
Input: ts4800 - add hardware dependency
Input: cyapa - fix for losing events during device power transitions
Input: sh_keysc - remove dependency on SUPERH
...
Linus Torvalds [Fri, 18 Mar 2016 04:46:32 +0000 (21:46 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching
Pull livepatching update from Jiri Kosina:
- cleanup of module notifiers; this depends on a module.c cleanup which
has been acked by Rusty; from Jessica Yu
- small assorted fixes and MAINTAINERS update
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch/module: remove livepatch module notifier
modules: split part of complete_formation() into prepare_coming_module()
livepatch: Update maintainers
livepatch: Fix the error message about unresolvable ambiguity
klp: remove CONFIG_LIVEPATCH dependency from klp headers
klp: remove superfluous errors in asm/livepatch.h
Linus Torvalds [Fri, 18 Mar 2016 04:38:27 +0000 (21:38 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
drivers/rtc: broken link fix
drm/i915 Fix typos in i915_gem_fence.c
Docs: fix missing word in REPORTING-BUGS
lib+mm: fix few spelling mistakes
MAINTAINERS: add git URL for APM driver
treewide: Fix typo in printk
Linus Torvalds [Fri, 18 Mar 2016 04:32:20 +0000 (21:32 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
- functionally equivalent cleanups for wacom driver, making the code
more readable, from Benjamin Tissoires
- a bunch of improvements and fixes for thingm driver from Heiner
Kallweit
- bugfixes to out-of-bound access for generic parsing functions (which
have been there since ever) extract() and implement(), from Dmitry
Torokhov
- a lot of added / improved device support in sony, wacom, microsoft,
multitouch and logitech driver, from various people
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (44 commits)
HID: microsoft: Add ID for MS Wireless Comfort Keyboard
hid: thingm: reorder calls in thingm_probe
HID: i2c-hid: fix OOB write in i2c_hid_set_or_send_report()
HID: multitouch: Release all touch slots on reset_resume
HID: usbhid: enable NO_INIT_REPORTS quirk for Semico USB Keykoard2
HID: penmount: report only one button for PenMount 6000 USB touchscreen controller
HID: i2c-hid: Fix suspend/resume when already runtime suspended
HID: i2c-hid: Add hid-over-i2c name to i2c id table
HID: multitouch: force retrieving of Win8 signature blob
HID: Support for CMedia CM6533 HID audio jack controls
HID: thingm: improve locking
HID: thingm: switch to managed version of led_classdev_register
HID: thingm: remove workqueue
HID: corsair: fix mapping of non-keyboard usages
HID: wacom: close the wireless receiver on remove()
HID: wacom: cleanup input devices
HID: wacom: reuse wacom_parse_and_register() in wireless_work
HID: wacom: move down wireless_work()
HID: wacom: break out parsing of device and registering of input
HID: wacom: break out wacom_intuos_get_tool_type
...
Linus Torvalds [Fri, 18 Mar 2016 04:05:32 +0000 (21:05 -0700)]
Merge tag 'gpio-v4.6-1' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for kernel v4.6. There is quite a
lot of interesting stuff going on.
The patches to other subsystems and arch-wide are ACKed as far as
possible, though I consider things like per-arch <asm/gpio.h> as
essentially a part of the GPIO subsystem so it should not be needed.
Core changes:
- The gpio_chip is now a *real device*. Until now the gpio chips
were just piggybacking the parent device or (gasp) floating in
space outside of the device model.
We now finally make GPIO chips devices. The gpio_chip will create
a gpio_device which contains a struct device, and this gpio_device
struct is kept private. Anything that needs to be kept private
from the rest of the kernel will gradually be moved over to the
gpio_device.
- As a result of making the gpio_device a real device, we have added
resource management, so devm_gpiochip_add_data() will cut down on
overhead and reduce code lines. A huge slew of patches convert
almost all drivers in the subsystem to use this.
- Building on making the GPIO a real device, we add the first step of
a new userspace ABI: the GPIO character device. We take small
steps here, so we first add a pure *information* ABI and the tool
"lsgpio" that will list all GPIO devices on the system and all
lines on these devices.
We can now discover GPIOs properly from userspace. We still have
not come up with a way to actually *use* GPIOs from userspace.
- To encourage people to use the character device for the future, we
have it always-enabled when using GPIO. The old sysfs ABI is still
opt-in (and can be used in parallel), but is marked as deprecated.
We will keep it around for the foreseeable future, but it will not
be extended to cover ever more use cases.
Cleanup:
- Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
includes.
This dates back to when GPIO was an opt-in feature and no shared
library even existed: just a header file with proper prototypes was
provided and all semantics were up to the arch to implement. These
patches make the GPIO chip even more a proper device and cleans out
leftovers of the old in-kernel API here and there.
Still some cruft is left but it's very little now.
- There is still some clamping of return values for .get() going on,
but we now return sane values in the vast majority of drivers and
the errorpath is sanitized. Some patches for powerpc, blackfin and
unicore still drop in.
- We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
implementations to use gpiochip_add_data() and cut down on code
lines.
- MPC8xxx is converted to use the generic GPIO helpers.
- ATH79 is converted to use the generic GPIO helpers.
New drivers:
- WinSystems WS16C48
- Acces 104-DIO-48E
- F81866 (a F7188x variant)
- Qoric (a MPC8xxx variant)
- TS-4800
- SPI serializers (pisosr): simple 74xx shift registers connected to
SPI to obtain a dirt-cheap output-only GPIO expander.
- Texas Instruments TPIC2810
- Texas Instruments TPS65218
- Texas Instruments TPS65912
- X-Gene (ARM64) standby GPIO controller"
* tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (194 commits)
Revert "Share upstreaming patches"
gpio: mcp23s08: Fix clearing of interrupt.
gpiolib: Fix comment referring to gpio_*() in gpiod_*()
gpio: pca953x: Fix pca953x_gpio_set_multiple() on 64-bit
gpio: xgene: Fix kconfig for standby GIPO contoller
gpio: Add generic serializer DT binding
gpio: uapi: use 0xB4 as ioctl() major
gpio: tps65912: fix bad merge
Revert "gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free"
gpio: omap: drop dev field from gpio_bank structure
gpio: mpc8xxx: Slightly update the code for better readability
gpio: mpc8xxx: Remove *read_reg and *write_reg from struct mpc8xxx_gpio_chip
gpio: mpc8xxx: Fixup setting gpio direction output
gpio: mcp23s08: Add support for mcp23s18
dt-bindings: gpio: altera: Fix altr,interrupt-type property
gpio: add driver for MEN 16Z127 GPIO controller
gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free
gpio: timberdale: Switch to devm_ioremap_resource()
gpio: ts4800: Add IMX51 dependency
gpiolib: rewrite gpiodev_add_to_list
...
Linus Torvalds [Fri, 18 Mar 2016 03:19:19 +0000 (20:19 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
"The main change is the removal of the bit-rotten 68360 support. Also
a fix to always make the ethernet FEC platform info available"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: remove obsolete 68360 support
m68knommu: fix FEC platform device registration when driver is modular
Linus Torvalds [Fri, 18 Mar 2016 03:03:47 +0000 (20:03 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Here are the main arm64 updates for 4.6. There are some relatively
intrusive changes to support KASLR, the reworking of the kernel
virtual memory layout and initial page table creation.
Summary:
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture
requires break-before-make in such cases to avoid TLB conflicts but
that's not always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked
to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
of the vmalloc space, allowing the kernel to be loaded (nearly)
anywhere in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is
provided by UEFI (efi_get_random_bytes() patches merged via the
arm64 tree, acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
but actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this
allows uaccess functions (get_user etc.) to be implemented using
LDTR/STTR instructions. Such instructions, when run by the kernel,
perform unprivileged accesses adding an extra level of protection.
The set_fs() macro is used to "upgrade" such instruction to
privileged accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the
sigcontext information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Fix misspellings in comments.
arm64: efi: add missing frame pointer assignment
arm64: make mrs_s prefixing implicit in read_cpuid
arm64: enable CONFIG_DEBUG_RODATA by default
arm64: Rework valid_user_regs
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: KVM: Move kvm_call_hyp back to its original localtion
arm64: mm: treat memstart_addr as a signed quantity
arm64: mm: list kernel sections in order
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
arm64: kconfig: add submenu for 8.2 architectural features
arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
arm64: Add support for Half precision floating point
arm64: Remove fixmap include fragility
arm64: Add workaround for Cavium erratum 27456
arm64: mm: Mark .rodata as RO
...
Linus Torvalds [Fri, 18 Mar 2016 02:37:08 +0000 (19:37 -0700)]
Merge tag 'linux-kselftest-4.6-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
"This update for Kselftest adds:
- A new feature to create test-specific kconfig fragments. This
feature helps configure Kselftests to test specific Kernel
Configuration options as opposed to defconfig.
- A new test for Media Controller API
- A few fixes"
* tag 'linux-kselftest-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: media_dcevice_test fix usage information
selftests: media_dcevice_test fix to handle ioctl failure case
selftests: add missing .gitignore file or entry
Makefile: add kselftest-merge
selftests: create test-specific kconfig fragments
selftests: breakpoint: add step_after_suspend_test
selftests: add a new test for Media Controller API
Linus Torvalds [Thu, 17 Mar 2016 23:57:55 +0000 (16:57 -0700)]
Merge tag 'please-pull-pstore' of git://git./linux/kernel/git/aegl/linux
Pull pstore update from Tony Luck:
"Allow ram backend to be configured with addresses above 4GB"
* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
pstore: Add support for 64 Bit address space
Linus Torvalds [Thu, 17 Mar 2016 23:51:32 +0000 (16:51 -0700)]
Merge tag 'gfs2-merge-window' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Bob Peterson:
"We only have six patches ready for this merge window:
- Arnd Bergmann contributed a patch that fixes an uninitialized
variable warning.
- The second patch avoids a kernel panic due to referencing an iopen
glock that may not be held, in an error path.
- The third patch fixes a rounding error that caused xfs_tests direct
IO write "fsx" tests to fail on GFS2.
- The fourth patch tidies up the code path when glocks are being
reused to recreate a dinode that was recently deleted.
- The fifth reverts an ages-old patch that should no longer be
needed, and which interfered with the transition of dinodes from
unlinked to free.
- And lastly, a patch to eliminate a function parameter that's not
needed"
* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: Eliminate parameter non_block on gfs2_inode_lookup
GFS2: Don't filter out I_FREEING inodes anymore
GFS2: Prevent delete work from occurring on glocks used for create
GFS2: Fix direct IO write rounding error
gfs2: avoid uninitialized variable warning
GFS2: Check if iopen is held when deleting inode
Linus Torvalds [Thu, 17 Mar 2016 23:38:36 +0000 (16:38 -0700)]
Merge tag 'dlm-4.6' of git://git./linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"Previous changes introduced the use of socket error reporting for dlm
sockets. This set includes two fixes in how the socket error
callbacks are used"
* tag 'dlm-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
DLM: Save and restore socket callbacks properly
DLM: Replace nodeid_to_addr with kernel_getpeername
Linus Torvalds [Thu, 17 Mar 2016 23:31:18 +0000 (16:31 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Performance improvements in SEEK_DATA and xattr scalability
improvements, plus a lot of clean ups and bug fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (38 commits)
ext4: clean up error handling in the MMP support
jbd2: do not fail journal because of frozen_buffer allocation failure
ext4: use __GFP_NOFAIL in ext4_free_blocks()
ext4: fix compile error while opening the macro DOUBLE_CHECK
ext4: print ext4 mount option data_err=abort correctly
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry()
ext4: fix misspellings in comments.
jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
ext4: more efficient SEEK_DATA implementation
ext4: cleanup handling of bh->b_state in DAX mmap
ext4: return hole from ext4_map_blocks()
ext4: factor out determining of hole size
ext4: fix setting of referenced bit in ext4_es_lookup_extent()
ext4: remove i_ioend_count
ext4: simplify io_end handling for AIO DIO
ext4: move trans handling and completion deferal out of _ext4_get_block
ext4: rename and split get blocks functions
ext4: use i_mutex to serialize unaligned AIO DIO
ext4: pack ioend structure better
...
Linus Torvalds [Thu, 17 Mar 2016 23:25:46 +0000 (16:25 -0700)]
Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs
Pull configfs updates from Christoph Hellwig:
- A large patch from me to simplify setting up the list of default
groups by actually implementing it as a list instead of an array.
- a small Y2083 prep patch from Deepa Dinamani. Probably doesn't
matter on it's own, but it seems like he is trying to get rid of all
CURRENT_TIME uses in file systems, which is a worthwhile goal.
* tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs:
configfs: switch ->default groups to a linked list
configfs: Replace CURRENT_TIME by current_fs_time()
Jessica Yu [Thu, 17 Mar 2016 21:23:07 +0000 (14:23 -0700)]
sscanf: implement basic character sets
Implement basic character sets for the '%[' conversion specifier.
The '%[' conversion specifier matches a nonempty sequence of characters
from the specified set of accepted (or with '^', rejected) characters
between the brackets. The substring matched is to be made up of
characters in (or not in) the set. This is useful for matching
substrings that are delimited by something other than spaces.
This implementation differs from its glibc counterpart in the following ways:
(1) No support for character ranges (e.g., 'a-z' or '0-9')
(2) The hyphen '-' is not a special character
(3) The closing bracket ']' cannot be matched
(4) No support (yet) for discarding matching input ('%*[')
The bitmap code is largely based upon sample code which was provided by
Rasmus.
The motivation for adding character set support to sscanf originally
stemmed from the kernel livepatching project. An ongoing patchset
utilizes new livepatch Elf symbol and section names to store important
metadata livepatch needs to properly apply its patches. Such metadata
is stored in these section and symbol names as substrings delimited by
periods '.' and commas ','. For example, a livepatch symbol name might
look like this:
.klp.sym.vmlinux.printk,0
However, sscanf currently can only extract "substrings" delimited by
whitespace using the "%s" specifier. Thus for the above symbol name,
one cannot not use sscanf() to extract substrings "vmlinux" or
"printk", for example. A number of discussions on the livepatch
mailing list dealing with string parsing code for extracting these '.'
and ',' delimited substrings eventually led to the conclusion that such
code would be completely unnecessary if the kernel sscanf() supported
character sets. Thus only a single sscanf() call would be necessary to
extract these substrings. In addition, such an addition to sscanf()
could benefit other areas of the kernel that might have a similar need
in the future.
[akpm@linux-foundation.org: 80-col tweaks]
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Poimboeuf [Thu, 17 Mar 2016 21:23:04 +0000 (14:23 -0700)]
lib/bug.c: use common WARN helper
The traceoff_on_warning option doesn't have any effect on s390, powerpc,
arm64, parisc, and sh because there are two different types of WARN
implementations:
1) The above mentioned architectures treat WARN() as a special case of a
BUG() exception. They handle warnings in report_bug() in lib/bug.c.
2) All other architectures just call warn_slowpath_*() directly. Their
warnings are handled in warn_slowpath_common() in kernel/panic.c.
Support traceoff_on_warning on all architectures and prevent any future
divergence by using a single common function to emit the warning.
Also remove the '()' from '%pS()', because the parentheses look funky:
[ 45.607629] WARNING: at /root/warn_mod/warn_mod.c:17 .init_dummy+0x20/0x40 [warn_mod]()
Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Thu, 17 Mar 2016 21:23:00 +0000 (14:23 -0700)]
param: convert some "on"/"off" users to strtobool
This changes several users of manual "on"/"off" parsing to use
strtobool.
Some side-effects:
- these uses will now parse y/n/1/0 meaningfully too
- the early_param uses will now bubble up parse errors
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Perches <joe@perches.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Thu, 17 Mar 2016 21:22:57 +0000 (14:22 -0700)]
lib: add "on"/"off" support to kstrtobool
Add support for "on" and "off" when converting to boolean.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Thu, 17 Mar 2016 21:22:54 +0000 (14:22 -0700)]
lib: update single-char callers of strtobool()
Some callers of strtobool() were passing a pointer to unterminated
strings. In preparation of adding multi-character processing to
kstrtobool(), update the callers to not pass single-character pointers,
and switch to using the new kstrtobool_from_user() helper where
possible.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Steve French <sfrench@samba.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Thu, 17 Mar 2016 21:22:50 +0000 (14:22 -0700)]
lib: move strtobool() to kstrtobool()
Create the kstrtobool_from_user() helper and move strtobool() logic into
the new kstrtobool() (matching all the other kstrto* functions).
Provides an inline wrapper for existing strtobool() callers.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Steve French <sfrench@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Thu, 17 Mar 2016 21:22:47 +0000 (14:22 -0700)]
include/linux/unaligned: force inlining of byteswap operations
Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
the following functions get deinlined many times.
Examples of disassembly:
<get_unaligned_be16> (24 copies, 108 calls):
66 8b 07 mov (%rdi),%ax
55 push %rbp
48 89 e5 mov %rsp,%rbp
86 e0 xchg %ah,%al
5d pop %rbp
c3 retq
<get_unaligned_be32> (25 copies, 181 calls):
8b 07 mov (%rdi),%eax
55 push %rbp
48 89 e5 mov %rsp,%rbp
0f c8 bswap %eax
5d pop %rbp
c3 retq
<get_unaligned_be64> (23 copies, 94 calls):
48 8b 07 mov (%rdi),%rax
55 push %rbp
48 89 e5 mov %rsp,%rbp
48 0f c8 bswap %rax
5d pop %rbp
c3 retq
<put_unaligned_be16> (2 copies, 11 calls):
89 f8 mov %edi,%eax
55 push %rbp
c1 ef 08 shr $0x8,%edi
c1 e0 08 shl $0x8,%eax
09 c7 or %eax,%edi
48 89 e5 mov %rsp,%rbp
66 89 3e mov %di,(%rsi)
<put_unaligned_be32> (8 copies, 43 calls):
55 push %rbp
0f cf bswap %edi
89 3e mov %edi,(%rsi)
48 89 e5 mov %rsp,%rbp
5d pop %rbp
c3 retq
<put_unaligned_be64> (26 copies, 157 calls):
55 push %rbp
48 0f cf bswap %rdi
48 89 3e mov %rdi,(%rsi)
48 89 e5 mov %rsp,%rbp
5d pop %rbp
c3 retq
This patch fixes this via s/inline/__always_inline/.
It only affects arches with efficient unaligned access insns, such as x86.
(arched which lack such ops do not include linux/unaligned/access_ok.h)
Code size decrease after the patch is ~8.5k:
text data bss dec hex filename
92197848 20826112 36417536 149441496 8e84bd8 vmlinux
92189231 20826144 36417536 149432911 8e82a4f vmlinux6_unaligned_be_after
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Thu, 17 Mar 2016 21:22:44 +0000 (14:22 -0700)]
include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
the following functions get deinlined many times.
Examples of disassembly:
<get_unaligned_be16> (12 copies, 51 calls):
66 8b 07 mov (%rdi),%ax
55 push %rbp
48 89 e5 mov %rsp,%rbp
86 e0 xchg %ah,%al
5d pop %rbp
c3 retq
<get_unaligned_be32> (12 copies, 135 calls):
8b 07 mov (%rdi),%eax
55 push %rbp
48 89 e5 mov %rsp,%rbp
0f c8 bswap %eax
5d pop %rbp
c3 retq
<get_unaligned_be64> (2 copies, 20 calls):
48 8b 07 mov (%rdi),%rax
55 push %rbp
48 89 e5 mov %rsp,%rbp
48 0f c8 bswap %rax
5d pop %rbp
c3 retq
<__swab16p> (16 copies, 146 calls):
55 push %rbp
89 f8 mov %edi,%eax
86 e0 xchg %ah,%al
48 89 e5 mov %rsp,%rbp
5d pop %rbp
c3 retq
<__swab32p> (43 copies, ~560 calls):
55 push %rbp
89 f8 mov %edi,%eax
0f c8 bswap %eax
48 89 e5 mov %rsp,%rbp
5d pop %rbp
c3 retq
<__swab64p> (21 copies, 119 calls):
55 push %rbp
48 89 f8 mov %rdi,%rax
48 0f c8 bswap %rax
48 89 e5 mov %rsp,%rbp
5d pop %rbp
c3 retq
<__swab32s> (6 copies, 47 calls):
8b 07 mov (%rdi),%eax
55 push %rbp
48 89 e5 mov %rsp,%rbp
0f c8 bswap %eax
89 07 mov %eax,(%rdi)
5d pop %rbp
c3 retq
This patch fixes this via s/inline/__always_inline/.
Code size decrease after the patch is ~4.5k:
text data bss dec hex filename
92202377 20826112 36417536 149446025 8e85d89 vmlinux
92197848 20826112 36417536 149441496 8e84bd8 vmlinux5_swap_after
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Thu, 17 Mar 2016 21:22:41 +0000 (14:22 -0700)]
include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
atomic_long_inc(), atomic_long_dec() and atomic_long_add()
functions get deinlined about 40 times. Examples of disassembly:
<atomic_long_inc> (21 copies, 147 calls):
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 48 ff 07 lock incq (%rdi)
5d pop %rbp
c3 retq
<atomic_long_dec> (4 copies, 14 calls) is similar to inc.
<atomic_long_add> (11 copies, 41 calls):
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 48 01 3e lock add %rdi,(%rsi)
5d pop %rbp
c3 retq
This patch fixes this via s/inline/__always_inline/.
Code size decrease after the patch is ~1.3k:
text data bss dec hex filename
92203657 20826112 36417536 149447305 8e86289 vmlinux
92202377 20826112 36417536 149446025 8e85d89 vmlinux4_atomiclong_after
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heikki Krogerus [Thu, 17 Mar 2016 21:22:38 +0000 (14:22 -0700)]
usb: common: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:35 +0000 (14:22 -0700)]
ide: hpt366: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:32 +0000 (14:22 -0700)]
ata: hpt366: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:29 +0000 (14:22 -0700)]
power: ab8500: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:26 +0000 (14:22 -0700)]
power: charger_manager: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:23 +0000 (14:22 -0700)]
drm/edid: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:20 +0000 (14:22 -0700)]
pinctrl: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:17 +0000 (14:22 -0700)]
device property: convert to use match_string() helper
The new helper returns index of the mathing string in an array. We
would use it here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Thu, 17 Mar 2016 21:22:14 +0000 (14:22 -0700)]
lib/string: introduce match_string() helper
Occasionally we have to search for an occurrence of a string in an array
of strings. Make a simple helper for that purpose.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: David Airlie <airlied@linux.ie>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Thu, 17 Mar 2016 21:22:11 +0000 (14:22 -0700)]
radix-tree tests: add test for radix_tree_iter_next
Without fix test crashes inside tagged iteration.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Thu, 17 Mar 2016 21:22:08 +0000 (14:22 -0700)]
radix-tree tests: add regression3 test
After calling radix_tree_iter_retry(), 'slot' will be set to NULL. This
can cause radix_tree_next_slot() to dereference the NULL pointer. Add
Konstantin Khlebnikov's test to the regression framework.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reported-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:22:06 +0000 (14:22 -0700)]
radix-tree,shmem: introduce radix_tree_iter_next()
shmem likes to occasionally drop the lock, schedule, then reacqire the
lock and continue with the iteration from the last place it left off.
This is currently done with a pretty ugly goto. Introduce
radix_tree_iter_next() and use it throughout shmem.c.
[koct9i@gmail.com: fix bug in radix_tree_iter_next() for tagged iteration]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:22:03 +0000 (14:22 -0700)]
mm: use radix_tree_iter_retry()
Instead of a 'goto restart', we can now use radix_tree_iter_retry() to
restart from our current position. This will make a difference when
there are more ways to happen across an indirect pointer. And it
eliminates some confusing gotos.
[vbabka@suse.cz: remove now-obsolete-and-misleading comment]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:22:00 +0000 (14:22 -0700)]
btrfs: use radix_tree_iter_retry()
Even though this is a 'can't happen' situation, use the new
radix_tree_iter_retry() pattern to eliminate a goto.
[akpm@linux-foundation.org: fix btrfs build]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:21:57 +0000 (14:21 -0700)]
radix_tree: add radix_tree_dump
This is debug code which is #if 0 out.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:21:54 +0000 (14:21 -0700)]
radix_tree: add support for multi-order entries
With huge pages, it is convenient to have the radix tree be able to
return an entry that covers multiple indices. Previous attempts to deal
with the problem have involved inserting N duplicate entries, which is a
waste of memory and leads to problems trying to handle aliased tags, or
probing the tree multiple times to find alternative entries which might
cover the requested index.
This approach inserts one canonical entry into the tree for a given
range of indices, and may also insert other entries in order to ensure
that lookups find the canonical entry.
This solution only tolerates inserting powers of two that are greater
than the fanout of the tree. If we wish to expand the radix tree's
abilities to support large-ish pages that is less than the fanout at the
penultimate level of the tree, then we would need to add one more step
in lookup to ensure that any sibling nodes in the final level of the
tree are dereferenced and we return the canonical entry that they
reference.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:21:51 +0000 (14:21 -0700)]
radix_tree: loop based on shift count, not height
When we introduce entries that can cover multiple indices, we will need
to stop in __radix_tree_create based on the shift, not the height.
Split out for ease of bisect.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:21:48 +0000 (14:21 -0700)]
radix_tree: tag all internal tree nodes as indirect pointers
Set the 'indirect_ptr' bit on all the pointers to internal nodes, not
just on the root node. This enables the following patches to support
multi-order entries in the radix tree. This patch is split out for ease
of bisection.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:21:45 +0000 (14:21 -0700)]
radix tree test harness
This code is mostly from Andrew Morton and Nick Piggin; tarball downloaded
from http://ozlabs.org/~akpm/rtth.tar.gz with sha1sum
0ce679db9ec047296b5d1ff7a1dfaa03a7bef1bd
Some small modifications were necessary to the test harness to fix the
build with the current Linux source code.
I also made minor modifications to automatically test the radix-tree.c
and radix-tree.h files that are in the current source tree, as opposed
to a copied and slightly modified version. I am sure more could be done
to tidy up the harness, as well as adding more tests.
[koct9i@gmail.com: fix compilation]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Thu, 17 Mar 2016 21:21:42 +0000 (14:21 -0700)]
radix-tree: add an explicit include of bitops.h
The radix-tree header uses the __ffs() function, which is defined in
bitops.h. The current kernel headers implicitly include bitops.h, but
the userspace test harness does not.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Carstens [Thu, 17 Mar 2016 21:21:38 +0000 (14:21 -0700)]
lib/bug.c: make panic_on_warn available for all architectures
Christian Borntraeger reported that panic_on_warn doesn't have any
effect on s390.
The panic_on_warn feature was introduced with
9e3961a09798 ("kernel: add
panic_on_warn"). However it did care only for the case when
WANT_WARN_ON_SLOWPATH is defined. This is turn is only the case for
architectures which do not have an own __WARN_TAINT defined.
Other architectures which do have __WARN_TAINT defined call report_bug()
for warnings within lib/bug.c which does not call panic() in case
panic_on_warn is set.
Let's simply enable the panic_on_warn feature by adding the same code
like it was added to warn_slowpath_common() in panic.c.
This enables panic_on_warn also for arm64, parisc, powerpc, s390 and sh.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Gang [Thu, 17 Mar 2016 21:21:35 +0000 (14:21 -0700)]
include/linux/list_bl.h: use bool instead of int for boolean functions
hlist_bl_unhashed() and hlist_bl_empty() are all boolean functions, so
return bool instead of int.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Kershner [Thu, 17 Mar 2016 21:21:32 +0000 (14:21 -0700)]
MAINTAINERS: update s-Par driver maintainer list
Benjamin Romer is no longer a maintainer for the Unisys s-Par driver,
presently in drivers/staging/unisys/.
Signed-off-by: David Kershner <david.kershner@unisys.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Delalande [Thu, 17 Mar 2016 21:21:30 +0000 (14:21 -0700)]
printk: add clear_idx symbol to vmcoreinfo
This allows us to extract from the vmcore only the messages emitted
since the last time the ring buffer was cleared. We just have to make
sure its value is always up-to-date, when old messages are discarded to
free space in log_make_free_space() for example.
Signed-off-by: Zeyu Zhao <zzy8200@gmail.com>
Signed-off-by: Ivan Delalande <colona@arista.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Thu, 17 Mar 2016 21:21:27 +0000 (14:21 -0700)]
printk: check CON_ENABLED in have_callable_console()
have_callable_console() must also test CON_ENABLED bit, not just
CON_ANYTIME. We may have disabled CON_ANYTIME console so printk can
wrongly assume that it's safe to call_console_drivers().
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Thu, 17 Mar 2016 21:21:23 +0000 (14:21 -0700)]
printk: set may_schedule for some of console_trylock() callers
console_unlock() allows to cond_resched() if its caller has set
`console_may_schedule' to 1, since
8d91f8b15361 ("printk: do
cond_resched() between lines while outputting to consoles").
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
However, console_trylock() callers (among them is printk()) do not
always call printk() from atomic contexts, and some of them can
cond_resched() in console_unlock(), so console_trylock() can set
`console_may_schedule' to 1 for such processes.
For !CONFIG_PREEMPT_COUNT kernels, however, console_trylock() always
sets `console_may_schedule' to 0.
It's possible to drop explicit preempt_disable()/preempt_enable() in
vprintk_emit(), because console_unlock() and console_trylock() are now
smart enough:
a) console_unlock() does not cond_resched() when it's unsafe
(console_trylock() takes care of that)
b) console_unlock() does can_use_console() check.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Thu, 17 Mar 2016 21:21:20 +0000 (14:21 -0700)]
printk: move can_use_console() out of console_trylock_for_printk()
console_unlock() allows to cond_resched() if its caller has set
`console_may_schedule' to 1 (this functionality is present since
8d91f8b15361 ("printk: do cond_resched() between lines while outputting
to consoles").
The rules are:
-- console_lock() always sets `console_may_schedule' to 1
-- console_trylock() always sets `console_may_schedule' to 0
printk() calls console_unlock() with preemption desabled, which
basically can lead to RCU stalls, watchdog soft lockups, etc. if
something is simultaneously calling printk() frequent enough (IOW,
console_sem owner always has new data to send to console divers and
can't leave console_unlock() for a long time).
printk()->console_trylock() callers do not necessarily execute in atomic
contexts, and some of them can cond_resched() in console_unlock().
console_trylock() can set `console_may_schedule' to 1 (allow
cond_resched() later in consoe_unlock()) when it's safe.
This patch (of 3):
vprintk_emit() disables preemption around console_trylock_for_printk()
and console_unlock() calls for a strong reason -- can_use_console()
check. The thing is that vprintl_emit() can be called on a CPU that is
not fully brought up yet (!cpu_online()), which potentially can cause
problems if console driver wants to access per-cpu data. A console
driver can explicitly state that it's safe to call it from !online cpu
by setting CON_ANYTIME bit in console ->flags. That's why for
!cpu_online() can_use_console() iterates all the console to find out if
there is a CON_ANYTIME console, otherwise console_unlock() must be
avoided.
can_use_console() ensures that console_unlock() call is safe in
vprintk_emit() only; console_lock() and console_trylock() are not
covered by this check. Even though call_console_drivers(), invoked from
console_cont_flush() and console_unlock(), tests `!cpu_online() &&
CON_ANYTIME' for_each_console(), it may be too late, which can result in
messages loss.
Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
CON_ANYTIME consoles available.
CPU0 online CPU1 !online
console_trylock()
...
console_unlock()
console_cont_flush
spin_lock logbuf_lock
if (!cont.len) {
spin_unlock logbuf_lock
return
}
for (;;) {
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
spin_lock logbuf_lock
!console_trylock_for_printk msg_print_text
return console_idx = log_next()
console_seq++
console_prev = msg->flags
spin_unlock logbuf_lock
call_console_drivers()
for_each_console(con) {
if (!cpu_online() &&
!(con->flags & CON_ANYTIME))
continue;
}
/*
* no message printed, we lost it
*/
vprintk_emit
spin_lock logbuf_lock
log_store
spin_unlock logbuf_lock
!console_trylock_for_printk
return
/*
* go to the beginning of the loop,
* find out there are new messages,
* lose it
*/
}
console_trylock()/console_lock() call on CPU1 may come from cpu
notifiers registered on that CPU. Since notifiers are not getting
unregistered when CPU is going DOWN, all of the notifiers receive
notifications during CPU UP. For example, on my x86_64, I see around 50
notification sent from offline CPU to itself
[swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
[swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
while doing
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu2/online
So grabbing the console_sem lock while CPU is !online is possible,
in theory.
This patch moves can_use_console() check out of
console_trylock_for_printk(). Instead it calls it in console_unlock(),
so now console_lock()/console_unlock() are also 'protected' by
can_use_console(). This also means that console_trylock_for_printk() is
not really needed anymore and can be removed.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Thu, 17 Mar 2016 21:21:17 +0000 (14:21 -0700)]
include/uapi/linux/elf-em.h: remove v850
The v850 port was removed by commits
f606ddf42fd4 and
07a887d399b8 in
2008. These #defines are not used in the current kernel.
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Thu, 17 Mar 2016 21:21:15 +0000 (14:21 -0700)]
fix Christoph's email addresses
There are various email addresses for me throughout the kernel. Use the
one that will always be valid.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Thu, 17 Mar 2016 21:21:12 +0000 (14:21 -0700)]
bug: set warn variable before calling WARN()
This has hit me a couple of times already. I would be debugging code
and the system would simply hang and then reboot. Finally, I found that
the problem was caused by WARN_ON_ONCE() and friends.
The macro WARN_ON_ONCE(condition) is defined as:
static bool __section(.data.unlikely) __warned;
int __ret_warn_once = !!(condition);
if (unlikely(__ret_warn_once))
if (WARN_ON(!__warned))
__warned = true;
unlikely(__ret_warn_once);
Which looks great and all. But what I have hit, is an issue when
WARN_ON() itself hits the same WARN_ON_ONCE() code. Because, the
variable __warned is not yet set. Then it too calls WARN_ON() and that
triggers the warning again. It keeps doing this until the stack is
overflowed and the system crashes.
By setting __warned first before calling WARN_ON() makes the original
WARN_ON_ONCE() really only warn once, and not an infinite amount of
times if the WARN_ON() also triggers the warning.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 17 Mar 2016 21:21:09 +0000 (14:21 -0700)]
arch/mn10300/kernel/fpu-nofpu.c: needs asm/elf.h
arch/mn10300/kernel/fpu-nofpu.c:27:36: error: unknown type name 'elf_fpregset_t'
int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpreg)
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 17 Mar 2016 21:21:06 +0000 (14:21 -0700)]
mn10300, c6x: CONFIG_GENERIC_BUG must depend on CONFIG_BUG
CONFIG_BUG=n && CONFIG_GENERIC_BUG=y make no sense and things break:
In file included from include/linux/page-flags.h:9:0,
from kernel/bounds.c:9:
include/linux/bug.h:91:47: warning: 'struct bug_entry' declared inside parameter list
static inline int is_warning_bug(const struct bug_entry *bug)
^
include/linux/bug.h:91:47: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/bug.h: In function 'is_warning_bug':
>> include/linux/bug.h:93:12: error: dereferencing pointer to incomplete type
return bug->flags & BUGFLAG_WARNING;
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Thu, 17 Mar 2016 21:21:03 +0000 (14:21 -0700)]
proc-vmcore: wrong data type casting fix
On i686 PAE enabled machine the contiguous physical area could be large
and it can cause trimming down variables in below calculation in
read_vmcore() and mmap_vmcore():
tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);
That is, the types being used is like below on i686:
m->offset: unsigned long long int
m->size: unsigned long long int
*fpos: loff_t (long long int)
buflen: size_t (unsigned int)
So casting (m->offset + m->size - *fpos) by size_t means truncating a
given value by 4GB.
Suppose (m->offset + m->size - *fpos) being truncated to 0, buflen >0
then we will get tsz = 0. It is of course not an expected result.
Similarly we could also get other truncated values less than buflen.
Then the real size passed down is not correct any more.
If (m->offset + m->size - *fpos) is above 4GB, read_vmcore or
mmap_vmcore use the min_t result with truncated values being compared to
buflen. Then, fpos proceeds with the wrong value so that we reach below
bugs:
1) read_vmcore will refuse to continue so makedumpfile fails.
2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range().
Use unsigned long long in min_t instead so that the variables in are not
truncated.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jianyu Zhan <nasa4836@gmail.com>
Cc: Minfei Huang <mhuang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minfei Huang [Thu, 17 Mar 2016 21:21:00 +0000 (14:21 -0700)]
proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan"
It is not elegant that prompt shell does not start from new line after
executing "cat /proc/$pid/wchan". Make prompt shell start from new
line.
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Engestrom [Thu, 17 Mar 2016 21:20:57 +0000 (14:20 -0700)]
procfs: add conditional compilation check
`proc_timers_operations` is only used when CONFIG_CHECKPOINT_RESTORE is
enabled.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Stultz [Thu, 17 Mar 2016 21:20:54 +0000 (14:20 -0700)]
proc: add /proc/<pid>/timerslack_ns interface
This patch provides a proc/PID/timerslack_ns interface which exposes a
task's timerslack value in nanoseconds and allows it to be changed.
This allows power/performance management software to set timer slack for
other threads according to its policy for the thread (such as when the
thread is designated foreground vs. background activity)
If the value written is non-zero, slack is set to that value. Otherwise
sets it to the default for the thread.
This interface checks that the calling task has permissions to to use
PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure
arbitrary apps do not change the timer slack for other apps.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Stultz [Thu, 17 Mar 2016 21:20:51 +0000 (14:20 -0700)]
timer: convert timer_slack_ns from unsigned long to u64
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).
The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems. It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.
The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.
With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:
$ time sleep 1
real 0m10.747s
user 0m0.001s
sys 0m0.005s
The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s. Let me
know if it makes sense to break that up more or not.
Other than that things are fairly straightforward.
This patch (of 2):
The timer_slack_ns value in the task struct is currently a unsigned
long. This means that on 32bit applications, the maximum slack is just
over 4 seconds. However, on 64bit machines, its much much larger (~500
years).
This disparity could make application development a little (as well as
the default_slack) to a u64. This means both 32bit and 64bit systems
have the same effective internal slack range.
Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.
This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Thu, 17 Mar 2016 21:20:48 +0000 (14:20 -0700)]
mm,oom: do not loop !__GFP_FS allocation if the OOM killer is disabled
After the OOM killer is disabled during suspend operation, any
!__GFP_NOFAIL && __GFP_FS allocations are forced to fail. Thus, any
!__GFP_NOFAIL && !__GFP_FS allocations should be forced to fail as well.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Thu, 17 Mar 2016 21:20:45 +0000 (14:20 -0700)]
mm,oom: make oom_killer_disable() killable
While oom_killer_disable() is called by freeze_processes() after all
user threads except the current thread are frozen, it is possible that
kernel threads invoke the OOM killer and sends SIGKILL to the current
thread due to sharing the thawed victim's memory. Therefore, checking
for SIGKILL is preferable than TIF_MEMDIE.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Thu, 17 Mar 2016 21:20:42 +0000 (14:20 -0700)]
mm/zsmalloc: add `freeable' column to pool stat
Add a new column to pool stats, which will tell how many pages ideally
can be freed by class compaction, so it will be easier to analyze
zsmalloc fragmentation.
At the moment, we have only numbers of FULL and ALMOST_EMPTY classes,
but they don't tell us how badly the class is fragmented internally.
The new /sys/kernel/debug/zsmalloc/zramX/classes output look as follows:
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
[..]
12 224 0 2 146 5 8 4 4
13 240 0 0 0 0 0 1 0
14 256 1 13 1840 1672 115 1 10
15 272 0 0 0 0 0 1 0
[..]
49 816 0 3 745 735 149 1 2
51 848 3 4 361 306 76 4 8
52 864 12 14 378 268 81 3 21
54 896 1 12 117 57 26 2 12
57 944 0 0 0 0 0 3 0
[..]
Total 26 131 12709 10994 1071 134
For example, from this particular output we can easily conclude that
class-896 is heavily fragmented -- it occupies 26 pages, 12 can be freed
by compaction.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YiPing Xu [Thu, 17 Mar 2016 21:20:39 +0000 (14:20 -0700)]
zsmalloc: drop unused member 'mapping_area->huge'
When unmapping a huge class page in zs_unmap_object, the page will be
unmapped by kmap_atomic. the "!area->huge" branch in __zs_unmap_object
is alway true, and no code set "area->huge" now, so we can drop it.
Signed-off-by: YiPing Xu <xuyiping@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shawn Lin [Thu, 17 Mar 2016 21:20:37 +0000 (14:20 -0700)]
mm/vmalloc: use PAGE_ALIGNED() to check PAGE_SIZE alignment
We have PAGE_ALIGNED() in mm.h, so let's use it instead of IS_ALIGNED()
for checking PAGE_SIZE aligned case.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Thu, 17 Mar 2016 21:20:34 +0000 (14:20 -0700)]
mm: memcontrol: zap oom_info_lock
mem_cgroup_print_oom_info is always called under oom_lock, so
oom_info_lock is redundant.
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 17 Mar 2016 21:20:31 +0000 (14:20 -0700)]
mm: memcontrol: clarify the uncharge_list() loop
uncharge_list() does an unusual list walk because the function can take
regular lists with dedicated list_heads as well as singleton lists where
a single page is passed via the page->lru list node.
This can sometimes lead to confusion as well as suggestions to replace
the loop with a list_for_each_entry(), which wouldn't work.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 17 Mar 2016 21:20:28 +0000 (14:20 -0700)]
mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage
Setting the original memory.limit_in_bytes hardlimit is subject to a
race condition when the desired value is below the current usage. The
code tries a few times to first reclaim and then see if the usage has
dropped to where we would like it to be, but there is no locking, and
the workload is free to continue making new charges up to the old limit.
Thus, attempting to shrink a workload relies on pure luck and hope that
the workload happens to cooperate.
To fix this in the cgroup2 memory.max knob, do it the other way round:
set the limit first, then try enforcement. And if reclaim is not able
to succeed, trigger OOM kills in the group. Keep going until the new
limit is met, we run out of OOM victims and there's only unreclaimable
memory left, or the task writing to memory.max is killed. This allows
users to shrink groups reliably, and the behavior is consistent with
what happens when new charges are attempted in excess of memory.max.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 17 Mar 2016 21:20:25 +0000 (14:20 -0700)]
mm: memcontrol: reclaim when shrinking memory.high below usage
When setting memory.high below usage, nothing happens until the next
charge comes along, and then it will only reclaim its own charge and not
the now potentially huge excess of the new memory.high. This can cause
groups to stay in excess of their memory.high indefinitely.
To fix that, when shrinking memory.high, kick off a reclaim cycle that
goes after the delta.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Thu, 17 Mar 2016 21:20:22 +0000 (14:20 -0700)]
tools/vm/page-types.c: avoid memset() in walk_pfn() when count == 1
I found that page-types is very slow and my testing shows many timeout
errors. Here's an example with a simple program allocating 1000 thps.
$ time ./page-types -p $(pgrep -f test_alloc)
...
real 0m17.201s
user 0m16.889s
sys 0m0.312s
Most of time is spent in memset(). Currently memset() clears over whole
buffer for every walk_pfn() call, which is inefficient when walk_pfn()
is called from walk_vma(), because in that case walk_pfn() is called for
each pfn. So this patch limits the zero initialization only for the
first element.
$ time ./page-types.patched -p $(pgrep -f test_alloc)
...
real 0m0.182s
user 0m0.046s
sys 0m0.135s
Fixes:
954e95584579 ("tools/vm/page-types.c: add memory cgroup dumping and filtering")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Suggested-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zhang [Thu, 17 Mar 2016 21:20:19 +0000 (14:20 -0700)]
powerpc/mm: enable page parallel initialisation
Parallel initialisation has been enabled for X86, boot time is improved
greatly. On Power8, it is improved greatly for small memory. Here is
the result from my test on Power8 platform:
For 4GB of memory, boot time is improved by 59%, from 24.5s to 10s.
For 50GB memory, boot time is improved by 22%, from 56.8s to 43.8s.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zhang [Thu, 17 Mar 2016 21:20:16 +0000 (14:20 -0700)]
mm: meminit: initialise more memory for inode/dentry hash tables in early boot
Upstream has supported page parallel initialisation for X86 and the boot
time is improved greately. Some tests have been done for Power.
Here is the result I have done with different memory size.
* 4GB memory:
boot time is as the following:
with patch vs without patch: 10.4s vs 24.5s
boot time is improved 57%
* 200GB memory:
boot time looks the same with and without patches.
boot time is about 38s
* 32TB memory:
boot time looks the same with and without patches
boot time is about 160s.
The boot time is much shorter than X86 with 24TB memory.
From community discussion, it costs about 694s for X86 24T system.
Parallel initialisation improves the performance by deferring memory
initilisation to kswap with N kthreads, it should improve the performance
therotically.
In testing on X86, performance is improved greatly with huge memory. But
on Power platform, it is improved greatly with less than 100GB memory.
For huge memory, it is not improved greatly. But it saves the time with
several threads at least, as the following information shows(32TB system
log):
[ 22.648169] node 9 initialised,
16607461 pages in 280ms
[ 22.783772] node 3 initialised,
23937243 pages in 410ms
[ 22.858877] node 6 initialised,
29179347 pages in 490ms
[ 22.863252] node 2 initialised,
29179347 pages in 490ms
[ 22.907545] node 0 initialised,
32049614 pages in 540ms
[ 22.920891] node 15 initialised,
32212280 pages in 550ms
[ 22.923236] node 4 initialised,
32306127 pages in 550ms
[ 22.923384] node 12 initialised,
32314319 pages in 550ms
[ 22.924754] node 8 initialised,
32314319 pages in 550ms
[ 22.940780] node 13 initialised,
33353677 pages in 570ms
[ 22.940796] node 11 initialised,
33353677 pages in 570ms
[ 22.941700] node 5 initialised,
33353677 pages in 570ms
[ 22.941721] node 10 initialised,
33353677 pages in 570ms
[ 22.941876] node 7 initialised,
33353677 pages in 570ms
[ 22.944946] node 14 initialised,
33353677 pages in 570ms
[ 22.946063] node 1 initialised,
33345485 pages in 580ms
It saves the time about 550*16 ms at least, although it can be ignore to
compare the boot time about 160 seconds. What's more, the boot time is
much shorter on Power even without patches than x86 for huge memory
machine.
So this patchset is still necessary to be enabled for Power.
This patch (of 2):
This patch is based on Mel Gorman's old patch in the mailing list,
https://lkml.org/lkml/2015/5/5/280 which is discussed but it is fixed with
a completion to wait for all memory initialised in page_alloc_init_late().
It is to fix the OOM problem on X86 with 24TB memory which allocates
memory in late initialisation. But for Power platform with 32TB memory,
it causes a call trace in vfs_caches_init->inode_init() and inode hash
table needs more memory. So this patch allocates 1GB for 0.25TB/node for
large system as it is mentioned in https://lkml.org/lkml/2015/5/1/627
This call trace is found on Power with 32TB memory, 1024CPUs, 16nodes.
Currently, it only allocates 2GB*16=32GB for early initialisation. But
Dentry cache hash table needes 16GB and Inode cache hash table needs 16GB.
So the system have no enough memory for it. The log from dmesg as the
following:
Dentry cache hash table entries:
2147483648 (order: 18,
17179869184 bytes)
vmalloc: allocation failure, allocated
16021913600 of
17179934720 bytes
swapper/0: page allocation failure: order:0,mode:0x2080020
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-0-ppc64
Call Trace:
.dump_stack+0xb4/0xb664 (unreliable)
.warn_alloc_failed+0x114/0x160
.__vmalloc_area_node+0x1a4/0x2b0
.__vmalloc_node_range+0xe4/0x110
.__vmalloc_node+0x40/0x50
.alloc_large_system_hash+0x134/0x2a4
.inode_init+0xa4/0xf0
.vfs_caches_init+0x80/0x144
.start_kernel+0x40c/0x4e0
start_here_common+0x20/0x4a4
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 17 Mar 2016 21:20:13 +0000 (14:20 -0700)]
thp: fix deadlock in split_huge_pmd()
split_huge_pmd() tries to munlock page with munlock_vma_page(). That
requires the page to locked.
If the is locked by caller, we would get a deadlock:
Unable to find swap-space signature
INFO: task trinity-c85:1907 blocked for more than 120 seconds.
Not tainted
4.4.0-00032-gf19d0bdced41-dirty #1606
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
trinity-c85 D
ffff88084d997608 0 1907 309 0x00000000
Call Trace:
schedule+0x9f/0x1c0
schedule_timeout+0x48e/0x600
io_schedule_timeout+0x1c3/0x390
bit_wait_io+0x29/0xd0
__wait_on_bit_lock+0x94/0x140
__lock_page+0x1d4/0x280
__split_huge_pmd+0x5a8/0x10f0
split_huge_pmd_address+0x1d9/0x230
try_to_unmap_one+0x540/0xc70
rmap_walk_anon+0x284/0x810
rmap_walk_locked+0x11e/0x190
try_to_unmap+0x1b1/0x4b0
split_huge_page_to_list+0x49d/0x18a0
follow_page_mask+0xa36/0xea0
SyS_move_pages+0xaf3/0x1570
entry_SYSCALL_64_fastpath+0x12/0x6b
2 locks held by trinity-c85/1907:
#0: (&mm->mmap_sem){++++++}, at: SyS_move_pages+0x933/0x1570
#1: (&anon_vma->rwsem){++++..}, at: split_huge_page_to_list+0x402/0x18a0
I don't think the deadlock is triggerable without split_huge_page()
simplifilcation patchset.
But munlock_vma_page() here is wrong: we want to munlock the page
unconditionally, no need in rmap lookup, that munlock_vma_page() does.
Let's use clear_page_mlock() instead. It can be called under ptl.
Fixes:
e90309c9f772 ("thp: allow mlocked THP again")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 17 Mar 2016 21:20:10 +0000 (14:20 -0700)]
thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
freeze_page() and unfreeze_page() helpers evolved in rather complex
beasts. It would be nice to cut complexity of this code.
This patch rewrites freeze_page() using standard try_to_unmap().
unfreeze_page() is rewritten with remove_migration_ptes().
The result is much simpler.
But the new variant is somewhat slower for PTE-mapped THPs. Current
helpers iterates over VMAs the compound page is mapped to, and then over
ptes within this VMA. New helpers iterates over small page, then over
VMA the small page mapped to, and only then find relevant pte.
We have short cut for PMD-mapped THP: we directly install migration
entries on PMD split.
I don't think the slowdown is critical, considering how much simpler
result is and that split_huge_page() is quite rare nowadays. It only
happens due memory pressure or migration.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 17 Mar 2016 21:20:07 +0000 (14:20 -0700)]
mm: make remove_migration_ptes() beyond mm/migration.c
Make remove_migration_ptes() available to be used in split_huge_page().
New parameter 'locked' added: as with try_to_umap() we need a way to
indicate that caller holds rmap lock.
We also shouldn't try to mlock() pte-mapped huge pages: pte-mapeed THP
pages are never mlocked.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 17 Mar 2016 21:20:04 +0000 (14:20 -0700)]
rmap: extend try_to_unmap() to be usable by split_huge_page()
Add support for two ttu_flags:
- TTU_SPLIT_HUGE_PMD would split PMD if it's there, before trying to
unmap page;
- TTU_RMAP_LOCKED indicates that caller holds relevant rmap lock;
Also, change rwc->done to !page_mapcount() instead of !page_mapped().
try_to_unmap() works on pte level, so we are really interested in the
mappedness of this small page rather than of the compound page it's a
part of.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 17 Mar 2016 21:20:01 +0000 (14:20 -0700)]
rmap: introduce rmap_walk_locked()
This patchset rewrites freeze_page() and unfreeze_page() using
try_to_unmap() and remove_migration_ptes(). Result is much simpler, but
somewhat slower.
Migration 8GiB worth of PMD-mapped THP:
Baseline 20.21 +/- 0.393
Patched 20.73 +/- 0.082
Slowdown 1.03x
It's 3% slower, comparing to 14% in v1. I don't it should be a stopper.
Splitting of PTE-mapped pages slowed more. But this is not a common
case.
Migration 8GiB worth of PMD-mapped THP:
Baseline 20.39 +/- 0.225
Patched 22.43 +/- 0.496
Slowdown 1.10x
rmap_walk_locked() is the same as rmap_walk(), but the caller takes care
of the relevant rmap lock.
This is preparation for switching THP splitting from custom rmap walk in
freeze_page()/unfreeze_page() to the generic one.
There is no support for KSM pages for now: not clear which lock is
implied.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 17 Mar 2016 21:19:58 +0000 (14:19 -0700)]
mm: ZONE_DEVICE depends on SPARSEMEM_VMEMMAP
The primary use case for devm_memremap_pages() is to allocate an memmap
array from persistent memory. That capabilty requires vmem_altmap which
requires SPARSEMEM_VMEMMAP.
Also, without SPARSEMEM_VMEMMAP the addition of ZONE_DEVICE expands
ZONES_WIDTH and triggers the:
"Unfortunate NUMA and NUMA Balancing config, growing page-frame for
last_cpupid."
...warning in mm/memory.c. SPARSEMEM_VMEMMAP=n && ZONE_DEVICE=y is not
a configuration we should worry about supporting.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 17 Mar 2016 21:19:55 +0000 (14:19 -0700)]
mm: remove VM_FAULT_MINOR
The define has a comment from Nick Piggin from 2007:
/* For backwards compat. Remove me quickly. */
I guess 9 years should not be too hurried sense of 'quickly' even for
kernel measures.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Thu, 17 Mar 2016 21:19:53 +0000 (14:19 -0700)]
mm: percpu: use pr_fmt to prefix output
Use the normal mechanism to make the logging output consistently
"percpu:" instead of a mix of "PERCPU:" and "percpu:"
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>