GitHub/moto-9609/android_kernel_motorola_exynos9610.git
11 years agoregulatory: pass new regdomain to reset function
Johannes Berg [Thu, 6 Dec 2012 14:44:07 +0000 (15:44 +0100)]
regulatory: pass new regdomain to reset function

Instead of assigning after calling the function do
it inside the function. This will later avoid a
period of time where the pointer is NULL.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: remove handling of channel bandwidth
Johannes Berg [Tue, 4 Dec 2012 14:07:34 +0000 (15:07 +0100)]
regulatory: remove handling of channel bandwidth

The channel bandwidth handling isn't really quite right,
it assumes that a 40 MHz channel is really two 20 MHz
channels, which isn't strictly true. This is the way the
regulatory database handling is defined right now though
so remove the logic to handle other channel widths.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: fix reg_is_valid_request handling
Johannes Berg [Mon, 3 Dec 2012 23:48:59 +0000 (00:48 +0100)]
regulatory: fix reg_is_valid_request handling

There's a bug with the world regulatory domain, it
can be updated any time which is different from all
other regdomains that can only be updated once after
a request for them. Fix this by adding a check for
"processed" to the reg_is_valid_request() function
and clear that when doing a request.

While looking at this I also found another locking
bug, last_request is protected by the reg_mutex not
the cfg80211_mutex so the code in nl80211 is racy.
Remove that code as it only tries to prevent an
allocation in an error case, which isn't necessary.
Then the function can also become static and locking
in nl80211 can have a smaller scope.

Also change __set_regdom() to do the checks earlier
and not different for world/other regdomains.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: remove locking from wiphy_apply_custom_regulatory
Johannes Berg [Mon, 3 Dec 2012 23:19:24 +0000 (00:19 +0100)]
regulatory: remove locking from wiphy_apply_custom_regulatory

wiphy_apply_custom_regulatory() doesn't have to hold
the regulatory mutex as it only modifies the given
wiphy with the given regulatory domain, it doesn't
access any global regulatory data.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: clarify locking rules and assertions
Johannes Berg [Mon, 3 Dec 2012 22:00:08 +0000 (23:00 +0100)]
regulatory: clarify locking rules and assertions

Many places that currently check that cfg80211_mutex
is held don't actually use any data protected by it.
The functions that need to hold the cfg80211_mutex
are the ones using the cfg80211_regdomain variable,
so add the lock assertion to those and clarify this
in the comments.

The reason for this is that nl80211 uses the regdom
without being able to hold reg_mutex.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: simplify freq_reg_info_regd
Johannes Berg [Mon, 3 Dec 2012 23:14:17 +0000 (00:14 +0100)]
regulatory: simplify freq_reg_info_regd

The function itself has dual-purpose: it can
retrieve from a given regdomain or from the
globally installed one. Change it to have a
single purpose only: to look up from a given
regdomain. Pass the correct regdomain in the
freq_reg_info() function instead.

This also changes the locking rules for it,
no locking is required any more.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: remove useless warning
Johannes Berg [Mon, 3 Dec 2012 18:12:02 +0000 (19:12 +0100)]
regulatory: remove useless warning

Even if it never happens and is hidden behind the
debug config option, it's completely useless: the
calltrace will only show module loading.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: remove redundant isalpha() check
Johannes Berg [Mon, 3 Dec 2012 17:59:58 +0000 (18:59 +0100)]
regulatory: remove redundant isalpha() check

toupper() only modifies lower-case letters, so
the isalpha() check is redundant; remove it.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: simplify restore_regulatory_settings
Johannes Berg [Mon, 3 Dec 2012 17:56:41 +0000 (18:56 +0100)]
regulatory: simplify restore_regulatory_settings

Use list_splice_tail_init() and also simplify the locking.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: remove BUG_ON
Johannes Berg [Mon, 3 Dec 2012 17:36:09 +0000 (18:36 +0100)]
regulatory: remove BUG_ON

This code is a bit too BUG_ON happy, remove all
instances and while doing so make some code a bit
smarter by passing the right pointer instead of
indices into arrays.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agocfg80211: remove wiphy_idx_valid
Johannes Berg [Mon, 3 Dec 2012 17:23:37 +0000 (18:23 +0100)]
cfg80211: remove wiphy_idx_valid

This is pretty much useless since get_wiphy_idx()
always returns true since it's always called with
a valid wiphy pointer.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: use proper enum for return values
Johannes Berg [Mon, 3 Dec 2012 16:54:55 +0000 (17:54 +0100)]
regulatory: use proper enum for return values

Instead of treating special error codes specially,
like -EALREADY, introduce a real enum for all the
needed possibilities and use it.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: remove useless locking on exit
Johannes Berg [Mon, 3 Dec 2012 16:59:24 +0000 (17:59 +0100)]
regulatory: remove useless locking on exit

It would be a major problem if anything were to run
concurrently while the module is being unloaded so
remove the locking that doesn't help anything.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: code cleanup
Johannes Berg [Mon, 3 Dec 2012 16:21:11 +0000 (17:21 +0100)]
regulatory: code cleanup

Clean up various things like indentation, extra
parentheses, too many/few line breaks, etc.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: simplify regulatory_hint_11d
Johannes Berg [Mon, 3 Dec 2012 16:21:26 +0000 (17:21 +0100)]
regulatory: simplify regulatory_hint_11d

There's no need to unlock before calling
queue_regulatory_request(), so simplify
the function.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: don't test list before iterating
Johannes Berg [Mon, 3 Dec 2012 16:32:01 +0000 (17:32 +0100)]
regulatory: don't test list before iterating

There's no need to test whether a list is
empty or not before iterating.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: clean up reg_copy_regd()
Johannes Berg [Mon, 3 Dec 2012 15:59:46 +0000 (16:59 +0100)]
regulatory: clean up reg_copy_regd()

Use ERR_PTR/IS_ERR to return the result or errors,
also do some code cleanups.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: clean up regdom_intersect
Johannes Berg [Thu, 6 Dec 2012 16:26:17 +0000 (17:26 +0100)]
regulatory: clean up regdom_intersect

As the dummy_rule (also renamed from irule) is only
used for output by the reg_rules_intersect() function
there's no need to clear it at all, remove that.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: don't allocate too much memory
Johannes Berg [Tue, 4 Dec 2012 11:49:16 +0000 (12:49 +0100)]
regulatory: don't allocate too much memory

There's no need to allocate one reg rule more
than will be used, reduce the allocations. The
allocation in nl80211 already doesn't allocate
too much space.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoregulatory: don't write past array when intersecting rules
Johannes Berg [Thu, 6 Dec 2012 16:03:17 +0000 (17:03 +0100)]
regulatory: don't write past array when intersecting rules

When intersecting rules, we count first to know how many
rules need to be allocated, and then do the intersection
into the allocated array. However, the code doing this
writes past the end of the array because it attempts to
do all intersections. Make it stop when the right number
of rules has been reached.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: remove a bit of dead mesh code
Johannes Berg [Sat, 15 Dec 2012 08:45:50 +0000 (09:45 +0100)]
mac80211: remove a bit of dead mesh code

In a file that's only built when CONFIG_MAC80211_MESH
is defined, having an #ifdef on the same is entirely
pointless, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: optimise roaming time again
Johannes Berg [Thu, 13 Dec 2012 22:49:02 +0000 (23:49 +0100)]
mac80211: optimise roaming time again

The last fixes re-added the RCU synchronize penalty
on roaming to fix the races. Split up sta_info_flush()
now to get rid of that again, and let managed mode
(and only it) delay the actual destruction.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: warn if unexpectedly removing stations
Johannes Berg [Thu, 13 Dec 2012 22:26:57 +0000 (23:26 +0100)]
mac80211: warn if unexpectedly removing stations

When an interface is brought down it must have been
disconnected (or similar) in all modes other than WDS,
so warn if any stations were removed in other modes.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: remove final sta_info_flush()
Johannes Berg [Thu, 13 Dec 2012 22:07:46 +0000 (23:07 +0100)]
mac80211: remove final sta_info_flush()

When all interfaces have been removed, there can't
be any stations left over, so there's no need to
flush again. Remove this, and all code associated
with it, which also simplifies the function.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agowireless: more 'capability info' bits
Vladimir Kondratiev [Tue, 18 Dec 2012 07:55:33 +0000 (09:55 +0200)]
wireless: more 'capability info' bits

define bits for 'capability info', as in recent spec edition
IEEE802.11-2012

Also, add mask for 2-bit field 'bss type', as it is in 802.11ad

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211_hwsim: allow testing paged RX
Johannes Berg [Mon, 10 Dec 2012 10:57:42 +0000 (11:57 +0100)]
mac80211_hwsim: allow testing paged RX

Paged RX, i.e. SKBs with (some of) the data in pages instead
of the SKB header data (skb->data) can behave differently in
the stack and cause other bugs. To make debugging easier add
an option to hwsim to test with such SKBs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: use short slot time in mesh for 5GHz
Chun-Yeow Yeoh [Thu, 13 Dec 2012 10:59:57 +0000 (18:59 +0800)]
mac80211: use short slot time in mesh for 5GHz

Use short slot time in 5GHz for mesh. The performance is
increased from 16.4Mbps to 23.4Mbps for two directly
connected mesh STAs operating in legacy rate using iperf
measurement. Almost similar to the results claimed in IBSS
mode.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
[call ieee80211_get_sdata_band() only once]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: Allow disabling SGI-20
Ben Greear [Thu, 13 Dec 2012 00:56:20 +0000 (16:56 -0800)]
mac80211: Allow disabling SGI-20

This allows user-space (wpa_supplicant) to disable
short guard interval (SGI) for 20Mhz.  The SGI-40
disable option is already handled.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agoMerge remote-tracking branch 'mac80211/master' into HEAD
Johannes Berg [Thu, 3 Jan 2013 12:01:11 +0000 (13:01 +0100)]
Merge remote-tracking branch 'mac80211/master' into HEAD

11 years agomac80211: fix maximum MTU
Chaitanya [Fri, 21 Dec 2012 11:45:17 +0000 (17:15 +0530)]
mac80211: fix maximum MTU

The maximum MTU shouldn't take the headers into account,
the maximum MSDU size is exactly the maximum MTU.

Signed-off-by: T Krishna Chaitanya <chaitanyatk@posedge.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: fix dtim_period in hidden SSID AP association
Johannes Berg [Mon, 10 Dec 2012 14:38:14 +0000 (16:38 +0200)]
mac80211: fix dtim_period in hidden SSID AP association

When AP's SSID is hidden the BSS can appear several times in
cfg80211's BSS list: once with a zero-length SSID that comes
from the beacon, and once for each SSID from probe reponses.

Since the mac80211 stores its data in ieee80211_bss which
is embedded into cfg80211_bss, mac80211's data will be
duplicated too.

This becomes a problem when a driver needs the dtim_period
since this data exists only in the beacon's instance in
cfg80211 bss table which isn't the instance that is used
when associating.

Remove the DTIM period from the BSS table and track it
explicitly to avoid this problem.

Cc: stable@vger.kernel.org
Tested-by: Efi Tubul <efi.tubul@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: use del_timer_sync for final sta cleanup timer deletion
Johannes Berg [Thu, 13 Dec 2012 22:08:52 +0000 (23:08 +0100)]
mac80211: use del_timer_sync for final sta cleanup timer deletion

This is a very old bug, but there's nothing that prevents the
timer from running while the module is being removed when we
only do del_timer() instead of del_timer_sync().

The timer should normally not be running at this point, but
it's not clearly impossible (or we could just remove this.)

Cc: stable@vger.kernel.org
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: fix station destruction in AP/mesh modes
Johannes Berg [Thu, 13 Dec 2012 21:54:58 +0000 (22:54 +0100)]
mac80211: fix station destruction in AP/mesh modes

Unfortunately, commit b22cfcfcae5b, intended to speed up roaming
by avoiding the synchronize_rcu() broke AP/mesh modes as it moved
some code into that work item that will still call into the driver
at a time where it's no longer expected to handle this: after the
AP or mesh has been stopped.

To fix this problem remove the per-station work struct, maintain a
station cleanup list instead and flush this list when stations are
flushed. To keep this patch smaller for stable, do this when the
stations are flushed (sta_info_flush()). This unfortunately brings
back the original roaming delay; I'll fix that again in a separate
patch.

Also, Ben reported that the original commit could sometimes (with
many interfaces) cause long delays when an interface is set down,
due to blocking on flush_workqueue(). Since we now maintain the
cleanup list, this particular change of the original patch can be
reverted.

Cc: stable@vger.kernel.org [3.7]
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: RMC buckets are just list heads
Thomas Pedersen [Tue, 18 Dec 2012 02:41:57 +0000 (18:41 -0800)]
mac80211: RMC buckets are just list heads

The array of rmc_entrys is redundant since only the
list_head is used. Make this an array of list_heads
instead and save ~6k per vif at runtime :D

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: assign VLAN channel contexts
Johannes Berg [Tue, 11 Dec 2012 19:38:41 +0000 (20:38 +0100)]
mac80211: assign VLAN channel contexts

Make AP_VLAN type interfaces track the AP master channel
context so they have one assigned for the various lookups.
Don't give them their own refcount etc. since they're just
slaves to the AP master.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: flush AP_VLAN stations when tearing down the BSS AP
Felix Fietkau [Mon, 10 Dec 2012 19:02:34 +0000 (20:02 +0100)]
mac80211: flush AP_VLAN stations when tearing down the BSS AP

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[change to flush stations with AP flush in second loop]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agomac80211: fix ibss scanning
Stanislaw Gruszka [Tue, 11 Dec 2012 09:48:23 +0000 (10:48 +0100)]
mac80211: fix ibss scanning

Do not scan on no-IBSS and disabled channels in IBSS mode. Doing this
can trigger Microcode errors on iwlwifi and iwlegacy drivers.

Also rename ieee80211_request_internal_scan() function since it is only
used in IBSS mode and simplify calling it from ieee80211_sta_find_ibss().

This patch should address:
https://bugzilla.redhat.com/show_bug.cgi?id=883414
https://bugzilla.kernel.org/show_bug.cgi?id=49411

Reported-by: Jesse Kahtava <jesse_kahtava@f-m.fm>
Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
12 years agoLinux 3.8-rc1
Linus Torvalds [Sat, 22 Dec 2012 01:19:00 +0000 (17:19 -0800)]
Linux 3.8-rc1

12 years agoMerge git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds [Sat, 22 Dec 2012 01:10:29 +0000 (17:10 -0800)]
Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:
 "This includes some fixes and code improvements (like
  clk_prepare_enable and clk_disable_unprepare), conversion from the
  omap_wdt and twl4030_wdt drivers to the watchdog framework, addition
  of the SB8x0 chipset support and the DA9055 Watchdog driver and some
  OF support for the davinci_wdt driver."

* git://www.linux-watchdog.org/linux-watchdog: (22 commits)
  watchdog: mei: avoid oops in watchdog unregister code path
  watchdog: Orion: Fix possible null-deference in orion_wdt_probe
  watchdog: sp5100_tco: Add SB8x0 chipset support
  watchdog: davinci_wdt: add OF support
  watchdog: da9052: Fix invalid free of devm_ allocated data
  watchdog: twl4030_wdt: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER
  watchdog: remove depends on CONFIG_EXPERIMENTAL
  watchdog: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
  watchdog: DA9055 Watchdog driver
  watchdog: omap_wdt: eliminate goto
  watchdog: omap_wdt: delete redundant platform_set_drvdata() calls
  watchdog: omap_wdt: convert to devm_ functions
  watchdog: omap_wdt: convert to new watchdog core
  watchdog: WatchDog Timer Driver Core: fix comment
  watchdog: s3c2410_wdt: use clk_prepare_enable and clk_disable_unprepare
  watchdog: imx2_wdt: Select the driver via ARCH_MXC
  watchdog: cpu5wdt.c: add missing del_timer call
  watchdog: hpwdt.c: Increase version string
  watchdog: Convert twl4030_wdt to watchdog core
  davinci_wdt: preparation for switch to common clock framework
  ...

12 years agoMerge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 22 Dec 2012 01:09:07 +0000 (17:09 -0800)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
 "Misc small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: eliminate cifsERROR variable
  cifs: don't compare uniqueids in cifs_prime_dcache unless server inode numbers are in use
  cifs: fix double-free of "string" in cifs_parse_mount_options

12 years agoMerge tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Linus Torvalds [Sat, 22 Dec 2012 01:08:06 +0000 (17:08 -0800)]
Merge tag 'dm-3.8-fixes' of git://git./linux/kernel/git/agk/linux-dm

Pull dm update from Alasdair G Kergon:
 "Miscellaneous device-mapper fixes, cleanups and performance
  improvements.

  Of particular note:
   - Disable broken WRITE SAME support in all targets except linear and
     striped.  Use it when kcopyd is zeroing blocks.
   - Remove several mempools from targets by moving the data into the
     bio's new front_pad area(which dm calls 'per_bio_data').
   - Fix a race in thin provisioning if discards are misused.
   - Prevent userspace from interfering with the ioctl parameters and
     use kmalloc for the data buffer if it's small instead of vmalloc.
   - Throttle some annoying error messages when I/O fails."

* tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits)
  dm stripe: add WRITE SAME support
  dm: remove map_info
  dm snapshot: do not use map_context
  dm thin: dont use map_context
  dm raid1: dont use map_context
  dm flakey: dont use map_context
  dm raid1: rename read_record to bio_record
  dm: move target request nr to dm_target_io
  dm snapshot: use per_bio_data
  dm verity: use per_bio_data
  dm raid1: use per_bio_data
  dm: introduce per_bio_data
  dm kcopyd: add WRITE SAME support to dm_kcopyd_zero
  dm linear: add WRITE SAME support
  dm: add WRITE SAME support
  dm: prepare to support WRITE SAME
  dm ioctl: use kmalloc if possible
  dm ioctl: remove PF_MEMALLOC
  dm persistent data: improve improve space map block alloc failure message
  dm thin: use DMERR_LIMIT for errors
  ...

12 years agoRevert "nfsd: warn on odd reply state in nfsd_vfs_read"
J. Bruce Fields [Sat, 22 Dec 2012 00:48:59 +0000 (19:48 -0500)]
Revert "nfsd: warn on odd reply state in nfsd_vfs_read"

This reverts commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e.

This is obviously wrong, and I have no idea how I missed seeing the
warning in testing: I must just not have looked at the right logs.  The
caller bumps rq_resused/rq_next_page, so it will always be hit on a
large enough read.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland...
Linus Torvalds [Sat, 22 Dec 2012 00:40:26 +0000 (16:40 -0800)]
Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband

Pull more infiniband changes from Roland Dreier:
 "Second batch of InfiniBand/RDMA changes for 3.8:
   - cxgb4 changes to fix lookup engine hash collisions
   - mlx4 changes to make flow steering usable
   - fix to IPoIB to avoid pinning dst reference for too long"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cxgb4: Fix bug for active and passive LE hash collision path
  RDMA/cxgb4: Fix LE hash collision bug for passive open connection
  RDMA/cxgb4: Fix LE hash collision bug for active open connection
  mlx4_core: Allow choosing flow steering mode
  mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
  mlx4_core: Fix error flow in the flow steering wrapper
  mlx4_core: Add QPN enforcement for flow steering rules set by VFs
  cxgb4: Add LE hash collision bug fix path in LLD driver
  cxgb4: Add T4 filter support
  IPoIB: Call skb_dst_drop() once skb is enqueued for sending

12 years agoMerge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm...
Linus Torvalds [Sat, 22 Dec 2012 00:39:08 +0000 (16:39 -0800)]
Merge tag 'asm-generic' of git://git./linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanup from Arnd Bergmann:
 "These are a few cleanups for asm-generic:

   - a set of patches from Lars-Peter Clausen to generalize asm/mmu.h
     and use it in the architectures that don't need any special
     handling.
   - A patch from Will Deacon to remove the {read,write}s{b,w,l} as
     discussed during the arm64 review
   - A patch from James Hogan that helps with the meta architecture
     series."

* tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  xtensa: Use generic asm/mmu.h for nommu
  h8300: Use generic asm/mmu.h
  c6x: Use generic asm/mmu.h
  asm-generic/mmu.h: Add support for FDPIC
  asm-generic/mmu.h: Remove unused vmlist field from mm_context_t
  asm-generic: io: remove {read,write} string functions
  asm-generic/io.h: remove asm/cacheflush.h include

12 years agoARM: dts: fix duplicated build target and alphabetical sort out for exynos
Kukjin Kim [Fri, 21 Dec 2012 18:02:13 +0000 (10:02 -0800)]
ARM: dts: fix duplicated build target and alphabetical sort out for exynos

Commit db5b0ae00712 ("Merge tag 'dt' of git://git.kernel.org/.../arm-soc")
causes a duplicated build target.  This patch fixes it and sorts out the
build target alphabetically so that we can recognize something wrong
easily.

Cc: Olof Johansson <olof@lixom.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agodm stripe: add WRITE SAME support
Mike Snitzer [Fri, 21 Dec 2012 20:23:41 +0000 (20:23 +0000)]
dm stripe: add WRITE SAME support

Rename stripe_map_discard to stripe_map_range and reuse it for WRITE
SAME bio processing.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm: remove map_info
Mikulas Patocka [Fri, 21 Dec 2012 20:23:41 +0000 (20:23 +0000)]
dm: remove map_info

This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm snapshot: do not use map_context
Mikulas Patocka [Fri, 21 Dec 2012 20:23:41 +0000 (20:23 +0000)]
dm snapshot: do not use map_context

Eliminate struct map_info from dm-snap.

map_info->ptr was used in dm-snap to indicate if the bio was tracked.
If map_info->ptr was non-NULL, the bio was linked in tracked_chunk_hash.

This patch removes the use of map_info->ptr. We determine if the bio was
tracked based on hlist_unhashed(&c->node). If hlist_unhashed is true,
the bio is not tracked, if it is false, the bio is tracked.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: dont use map_context
Mikulas Patocka [Fri, 21 Dec 2012 20:23:40 +0000 (20:23 +0000)]
dm thin: dont use map_context

This patch removes endio_hook_pool from dm-thin and uses per-bio data instead.

This patch removes any use of map_info in preparation for the next patch
that removes map_info from bio-based device mapper.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm raid1: dont use map_context
Mikulas Patocka [Fri, 21 Dec 2012 20:23:40 +0000 (20:23 +0000)]
dm raid1: dont use map_context

Don't use map_info any more in dm-raid1.

map_info was used for writes to hold the region number. For this purpose
we add a new field dm_bio_details to dm_raid1_bio_record.

map_info was used for reads to hold a pointer to dm_raid1_bio_record (if
the pointer was non-NULL, bio details were saved; if the pointer was
NULL, bio details were not saved). We use
dm_raid1_bio_record.details->bi_bdev for this purpose. If bi_bdev is
NULL, details were not saved, if bi_bdev is non-NULL, details were
saved.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm flakey: dont use map_context
Mikulas Patocka [Fri, 21 Dec 2012 20:23:39 +0000 (20:23 +0000)]
dm flakey: dont use map_context

Replace map_info with a per-bio structure "struct per_bio_data" in dm-flakey.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm raid1: rename read_record to bio_record
Mikulas Patocka [Fri, 21 Dec 2012 20:23:39 +0000 (20:23 +0000)]
dm raid1: rename read_record to bio_record

Rename struct read_record to bio_record in dm-raid1.

In the following patch, the structure will be used for both read and
write bios, so rename it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm: move target request nr to dm_target_io
Mikulas Patocka [Fri, 21 Dec 2012 20:23:39 +0000 (20:23 +0000)]
dm: move target request nr to dm_target_io

This patch moves target_request_nr from map_info to dm_target_io and
makes it accessible with dm_bio_get_target_request_nr.

This patch is a preparation for the next patch that removes map_info.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm snapshot: use per_bio_data
Mikulas Patocka [Fri, 21 Dec 2012 20:23:38 +0000 (20:23 +0000)]
dm snapshot: use per_bio_data

Replace tracked_chunk_pool with per_bio_data in dm-snap.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm verity: use per_bio_data
Mikulas Patocka [Fri, 21 Dec 2012 20:23:38 +0000 (20:23 +0000)]
dm verity: use per_bio_data

Replace io_mempool with per_bio_data in dm-verity.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm raid1: use per_bio_data
Mikulas Patocka [Fri, 21 Dec 2012 20:23:38 +0000 (20:23 +0000)]
dm raid1: use per_bio_data

Replace read_record_pool with per_bio_data in dm-raid1.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm: introduce per_bio_data
Mikulas Patocka [Fri, 21 Dec 2012 20:23:38 +0000 (20:23 +0000)]
dm: introduce per_bio_data

Introduce a field per_bio_data_size in struct dm_target.

Targets can set this field in the constructor. If a target sets this
field to a non-zero value, "per_bio_data_size" bytes of auxiliary data
are allocated for each bio submitted to the target. These data can be
used for any purpose by the target and help us improve performance by
removing some per-target mempools.

Per-bio data is accessed with dm_per_bio_data. The
argument data_size must be the same as the value per_bio_data_size in
dm_target.

If the target has a pointer to per_bio_data, it can get a pointer to
the bio with dm_bio_from_per_bio_data() function (data_size must be the
same as the value passed to dm_per_bio_data).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm kcopyd: add WRITE SAME support to dm_kcopyd_zero
Mike Snitzer [Fri, 21 Dec 2012 20:23:37 +0000 (20:23 +0000)]
dm kcopyd: add WRITE SAME support to dm_kcopyd_zero

Add WRITE SAME support to dm-io and make it accessible to
dm_kcopyd_zero().  dm_kcopyd_zero() provides an asynchronous interface
whereas the blkdev_issue_write_same() interface is synchronous.

WRITE SAME is a SCSI command that can be leveraged for more efficient
zeroing of a specified logical extent of a device which supports it.
Only a single zeroed logical block is transfered to the target for each
WRITE SAME and the target then writes that same block across the
specified extent.

The dm thin target uses this.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm linear: add WRITE SAME support
Mike Snitzer [Fri, 21 Dec 2012 20:23:37 +0000 (20:23 +0000)]
dm linear: add WRITE SAME support

The linear target can already support WRITE SAME requests so signal
this by setting num_write_same_requests to 1.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm: add WRITE SAME support
Mike Snitzer [Fri, 21 Dec 2012 20:23:37 +0000 (20:23 +0000)]
dm: add WRITE SAME support

WRITE SAME bios have a payload that contain a single page.  When
cloning WRITE SAME bios DM has no need to modify the bi_io_vec
attributes (and doing so would be detrimental).  DM need only alter the
start and end of the WRITE SAME bio accordingly.

Rather than duplicate __clone_and_map_discard, factor out a common
function that is also used by __clone_and_map_write_same.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm: prepare to support WRITE SAME
Mike Snitzer [Fri, 21 Dec 2012 20:23:36 +0000 (20:23 +0000)]
dm: prepare to support WRITE SAME

Allow targets to opt in to WRITE SAME support by setting
'num_write_same_requests' in the dm_target structure.

A dm device will only advertise WRITE SAME support if all its
targets and all its underlying devices support it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm ioctl: use kmalloc if possible
Mikulas Patocka [Fri, 21 Dec 2012 20:23:36 +0000 (20:23 +0000)]
dm ioctl: use kmalloc if possible

If the parameter buffer is small enough, try to allocate it with kmalloc()
rather than vmalloc().

vmalloc is noticeably slower than kmalloc because it has to manipulate
page tables.

In my tests, on PA-RISC this patch speeds up activation 13 times.
On Opteron this patch speeds up activation by 5%.

This patch introduces a new function free_params() to free the
parameters and this uses new flags that record whether or not vmalloc()
was used and whether or not the input buffer must be wiped after use.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm ioctl: remove PF_MEMALLOC
Mikulas Patocka [Fri, 21 Dec 2012 20:23:36 +0000 (20:23 +0000)]
dm ioctl: remove PF_MEMALLOC

When allocating memory for the userspace ioctl data, set some
appropriate GPF flags directly instead of using PF_MEMALLOC.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm persistent data: improve improve space map block alloc failure message
Joe Thornber [Fri, 21 Dec 2012 20:23:36 +0000 (20:23 +0000)]
dm persistent data: improve improve space map block alloc failure message

Improve space map error message when unable to allocate a new
metadata block.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: use DMERR_LIMIT for errors
Mike Snitzer [Fri, 21 Dec 2012 20:23:34 +0000 (20:23 +0000)]
dm thin: use DMERR_LIMIT for errors

Throttle all errors logged from the IO path by dm thin.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm persistent data: use DMERR_LIMIT for errors
Mike Snitzer [Fri, 21 Dec 2012 20:23:34 +0000 (20:23 +0000)]
dm persistent data: use DMERR_LIMIT for errors

Nearly all of persistent-data is in the IO path so throttle error
messages with DMERR_LIMIT to limit the amount logged when
something has gone wrong.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm block manager: reinstate message when validator fails
Mike Snitzer [Fri, 21 Dec 2012 20:23:34 +0000 (20:23 +0000)]
dm block manager: reinstate message when validator fails

Reinstate a useful error message when the block manager buffer validator fails.
This was mistakenly eliminated when the block manager was converted to use
dm-bufio.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm raid: round region_size to power of two
Jonathan Brassow [Fri, 21 Dec 2012 20:23:33 +0000 (20:23 +0000)]
dm raid: round region_size to power of two

If the user does not supply a bitmap region_size to the dm raid target,
a reasonable size is computed automatically.  If this is not a power of 2,
the md code will report an error later.

This patch catches the problem early and rounds the region_size to the
next power of two.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: cleanup dead code
Joe Thornber [Fri, 21 Dec 2012 20:23:33 +0000 (20:23 +0000)]
dm thin: cleanup dead code

Remove unused @data_block parameter from cell_defer.
Change thin_bio_map to use many returns rather than setting a variable.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: rename cell_defer_except to cell_defer_no_holder
Joe Thornber [Fri, 21 Dec 2012 20:23:33 +0000 (20:23 +0000)]
dm thin: rename cell_defer_except to cell_defer_no_holder

Rename cell_defer_except() to cell_defer_no_holder() which describes
its function more clearly.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm snapshot: optimize track_chunk
Mikulas Patocka [Fri, 21 Dec 2012 20:23:33 +0000 (20:23 +0000)]
dm snapshot: optimize track_chunk

track_chunk is always called with interrupts enabled. Consequently, we
do not need to save and restore interrupt state in "flags" variable.
This patch changes spin_lock_irqsave to spin_lock_irq and
spin_unlock_irqrestore to spin_unlock_irq.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm raid: use DM_ENDIO_INCOMPLETE
Mikulas Patocka [Fri, 21 Dec 2012 20:23:32 +0000 (20:23 +0000)]
dm raid: use DM_ENDIO_INCOMPLETE

Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm raid1: remove impossible mempool_alloc error test
Mikulas Patocka [Fri, 21 Dec 2012 20:23:32 +0000 (20:23 +0000)]
dm raid1: remove impossible mempool_alloc error test

mempool_alloc can't fail if __GFP_WAIT is specified, so the condition
that tests if read_record is non-NULL is always true.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: emit ignore_discard in status when discards disabled
Mike Snitzer [Fri, 21 Dec 2012 20:23:32 +0000 (20:23 +0000)]
dm thin: emit ignore_discard in status when discards disabled

If "ignore_discard" is specified when creating the thin pool device then
discard support is disabled for that device.  The pool device's status
should reflect this fact rather than stating "no_discard_passdown"
(which implies discards are enabled but passdown is disabled).

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm persistent data: fix nested btree deletion
Joe Thornber [Fri, 21 Dec 2012 20:23:32 +0000 (20:23 +0000)]
dm persistent data: fix nested btree deletion

When deleting nested btrees, the code forgets to delete the innermost
btree.  The thin-metadata code serendipitously compensates for this by
claiming there is one extra layer in the tree.

This patch corrects both problems.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: wake worker when discard is prepared
Joe Thornber [Fri, 21 Dec 2012 20:23:31 +0000 (20:23 +0000)]
dm thin: wake worker when discard is prepared

When discards are prepared it is best to directly wake the worker that
will process them.  The worker will be woken anyway, via periodic
commit, but there is no reason to not wake_worker here.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: fix race between simultaneous io and discards to same block
Joe Thornber [Fri, 21 Dec 2012 20:23:31 +0000 (20:23 +0000)]
dm thin: fix race between simultaneous io and discards to same block

There is a race when discard bios and non-discard bios are issued
simultaneously to the same block.

Discard support is expensive for all thin devices precisely because you
have to be careful to quiesce the area you're discarding.  DM thin must
handle this conflicting IO pattern (simultaneous non-discard vs discard)
even though a sane application shouldn't be issuing such IO.

The race manifests as follows:

1. A non-discard bio is mapped in thin_bio_map.
   This doesn't lock out parallel activity to the same block.

2. A discard bio is issued to the same block as the non-discard bio.

3. The discard bio is locked in a dm_bio_prison_cell in process_discard
   to lock out parallel activity against the same block.

4. The non-discard bio's mapping continues and its all_io_entry is
   incremented so the bio is accounted for in the thin pool's all_io_ds
   which is a dm_deferred_set used to track time locality of non-discard IO.

5. The non-discard bio is finally locked in a dm_bio_prison_cell in
   process_bio.

The race can result in deadlock, leaving the block layer hanging waiting
for completion of a discard bio that never completes, e.g.:

INFO: task ruby:15354 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ruby            D ffffffff8160f0e0     0 15354  15314 0x00000000
 ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900
 ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900
 ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0
Call Trace:
 [<ffffffff814e5a19>] schedule+0x29/0x70
 [<ffffffff814e3d85>] schedule_timeout+0x195/0x220
 [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod]
 [<ffffffff814e589e>] wait_for_common+0x11e/0x190
 [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0
 [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20
 [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260
 [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0
 [<ffffffff8119a65c>] block_ioctl+0x3c/0x40
 [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340
 [<ffffffff8119a547>] ? block_llseek+0x67/0xb0
 [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0
 [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0
 [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b

The thinp-test-suite's test_discard_random_sectors reliably hits this
deadlock on fast SSD storage.

The fix for this race is that the all_io_entry for a bio must be
incremented whilst the dm_bio_prison_cell is held for the bio's
associated virtual and physical blocks.  That cell locking wasn't
occurring early enough in thin_bio_map.  This patch fixes this.

Care is taken to always call the new function inc_all_io_entry() with
the relevant cells locked, but they are generally unlocked before
calling issue() to try to avoid holding the cells locked across
generic_submit_request.

Also, now that thin_bio_map may lock bios in a cell, process_bio() is no
longer the only thread that will do so.  Because of this we must be sure
to use cell_defer_except() to release all non-holder entries, that
were added by the other thread, because they must be deferred.

This patch depends on "dm thin: replace dm_cell_release_singleton with
cell_defer_except".

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@vger.kernel.org
12 years agodm thin: replace dm_cell_release_singleton with cell_defer_except
Joe Thornber [Fri, 21 Dec 2012 20:23:31 +0000 (20:23 +0000)]
dm thin: replace dm_cell_release_singleton with cell_defer_except

Change existing users of the function dm_cell_release_singleton to share
cell_defer_except instead, and then remove the now-unused function.

Everywhere that calls dm_cell_release_singleton, the bio in question
is the holder of the cell.

If there are no non-holder entries in the cell then cell_defer_except
behaves exactly like dm_cell_release_singleton.  Conversely, if there
*are* non-holder entries then dm_cell_release_singleton must not be used
because those entries would need to be deferred.

Consequently, it is safe to replace use of dm_cell_release_singleton
with cell_defer_except.

This patch is a pre-requisite for "dm thin: fix race between
simultaneous io and discards to same block".

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm: disable WRITE SAME
Mike Snitzer [Fri, 21 Dec 2012 20:23:30 +0000 (20:23 +0000)]
dm: disable WRITE SAME

WRITE SAME bios are not yet handled correctly by device-mapper so
disable their use on device-mapper devices by setting
max_write_same_sectors to zero.

As an example, a ciphertext device is incompatible because the data
gets changed according to the location at which it written and so the
dm crypt target cannot support it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Cc: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm ioctl: prevent unsafe change to dm_ioctl data_size
Alasdair G Kergon [Fri, 21 Dec 2012 20:23:30 +0000 (20:23 +0000)]
dm ioctl: prevent unsafe change to dm_ioctl data_size

Abort dm ioctl processing if userspace changes the data_size parameter
after we validated it but before we finished copying the data buffer
from userspace.

The dm ioctl parameters are processed in the following sequence:
 1. ctl_ioctl() calls copy_params();
 2. copy_params() makes a first copy of the fixed-sized portion of the
    userspace parameters into the local variable "tmp";
 3. copy_params() then validates tmp.data_size and allocates a new
    structure big enough to hold the complete data and copies the whole
    userspace buffer there;
 4. ctl_ioctl() reads userspace data the second time and copies the whole
    buffer into the pointer "param";
 5. ctl_ioctl() reads param->data_size without any validation and stores it
    in the variable "input_param_size";
 6. "input_param_size" is further used as the authoritative size of the
    kernel buffer.

The problem is that userspace code could change the contents of user
memory between steps 2 and 4.  In particular, the data_size parameter
can be changed to an invalid value after the kernel has validated it.
This lets userspace force the kernel to access invalid kernel memory.

The fix is to ensure that the size has not changed at step 4.

This patch shouldn't have a security impact because CAP_SYS_ADMIN is
required to run this code, but it should be fixed anyway.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
12 years agodm persistent data: rename node to btree_node
Mikulas Patocka [Fri, 21 Dec 2012 20:23:30 +0000 (20:23 +0000)]
dm persistent data: rename node to btree_node

This patch fixes a compilation failure on sparc32 by renaming struct node.

struct node is already defined in include/linux/node.h. On sparc32, it
happens to be included through other dependencies and persistent-data
doesn't compile because of conflicting declarations.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agoNFS: Kill fscache warnings when mounting without -ofsc
Trond Myklebust [Fri, 21 Dec 2012 16:02:32 +0000 (11:02 -0500)]
NFS: Kill fscache warnings when mounting without -ofsc

The fscache code will currently bleat a "non-unique superblock keys"
warning even if the user is mounting without the 'fsc' option.

There should be no reason to even initialise the superblock cache cookie
unless we're planning on using fscache for something, so ensure that we
check for the NFS_OPTION_FSCACHE flag before calling into the fscache
code.

Reported-by: Paweł Sikora <pawel.sikora@agmk.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoNFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=n
David Howells [Fri, 21 Dec 2012 12:15:05 +0000 (12:15 +0000)]
NFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=n

Provide a stub nfs_fscache_wait_on_invalidate() function for when
CONFIG_NFS_FSCACHE=n lest the following error appear:

  fs/nfs/inode.c: In function 'nfs_invalidate_mapping':
  fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration]
  cc1: some warnings being treated as errors

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge tag 'vfio-for-v3.8-v2' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Fri, 21 Dec 2012 05:30:12 +0000 (21:30 -0800)]
Merge tag 'vfio-for-v3.8-v2' of git://github.com/awilliam/linux-vfio

Pull vfio update from Alex Williamson.

* tag 'vfio-for-v3.8-v2' of git://github.com/awilliam/linux-vfio:
  vfio-pci: Enable device before attempting reset
  VFIO: fix out of order labels for error recovery in vfio_pci_init()
  VFIO: use ACCESS_ONCE() to guard access to dev->driver
  VFIO: unregister IOMMU notifier on error recovery path
  vfio-pci: Re-order device reset
  vfio: simplify kmalloc+copy_from_user to memdup_user

12 years agoMerge branch 'for-next' of git://git.infradead.org/users/eparis/notify
Linus Torvalds [Fri, 21 Dec 2012 04:11:52 +0000 (20:11 -0800)]
Merge branch 'for-next' of git://git.infradead.org/users/eparis/notify

Pull filesystem notification updates from Eric Paris:
 "This pull mostly is about locking changes in the fsnotify system.  By
  switching the group lock from a spin_lock() to a mutex() we can now
  hold the lock across things like iput().  This fixes a problem
  involving unmounting a fs and having inodes be busy, first pointed out
  by FAT, but reproducible with tmpfs.

  This also restores signal driven I/O for inotify, which has been
  broken since about 2.6.32."

Ugh.  I *hate* the timing of this.  It was rebased after the merge
window opened, and then left to sit with the pull request coming the day
before the merge window closes.  That's just crap.  But apparently the
patches themselves have been around for over a year, just gathering
dust, so now it's suddenly critical.

Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen
Rothwell's fixes from -next.

* 'for-next' of git://git.infradead.org/users/eparis/notify:
  inotify: automatically restart syscalls
  inotify: dont skip removal of watch descriptor if creation of ignored event failed
  fanotify: dont merge permission events
  fsnotify: make fasync generic for both inotify and fanotify
  fsnotify: change locking order
  fsnotify: dont put marks on temporary list when clearing marks by group
  fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
  fsnotify: pass group to fsnotify_destroy_mark()
  fsnotify: use a mutex instead of a spinlock to protect a groups mark list
  fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
  fsnotify: take groups mark_lock before mark lock
  fsnotify: use reference counting for groups
  fsnotify: introduce fsnotify_get_group()
  inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()

12 years agoMerge branch 'akpm' (Andrew's patch-bomb)
Linus Torvalds [Fri, 21 Dec 2012 04:00:43 +0000 (20:00 -0800)]
Merge branch 'akpm' (Andrew's patch-bomb)

Merge the rest of Andrew's patches for -rc1:
 "A bunch of fixes and misc missed-out-on things.

  That'll do for -rc1.  I still have a batch of IPC patches which still
  have a possible bug report which I'm chasing down."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
  keys: use keyring_alloc() to create module signing keyring
  keys: fix unreachable code
  sendfile: allows bypassing of notifier events
  SGI-XP: handle non-fatal traps
  fat: fix incorrect function comment
  Documentation: ABI: remove testing/sysfs-devices-node
  proc: fix inconsistent lock state
  linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
  memcg: don't register hotcpu notifier from ->css_alloc()
  checkpatch: warn on uapi #includes that #include <uapi/...
  revert "rtc: recycle id when unloading a rtc driver"
  mm: clean up transparent hugepage sysfs error messages
  hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method
  hfsplus: rework processing of hfs_btree_write() returned error
  hfsplus: rework processing errors in hfsplus_free_extents()
  hfsplus: avoid crash on failed block map free
  kcmp: include linux/ptrace.h
  drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>
  mm: cma: WARN if freed memory is still in use
  exec: do not leave bprm->interp on stack
  ...

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Fri, 21 Dec 2012 02:14:31 +0000 (18:14 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull VFS update from Al Viro:
 "fscache fixes, ESTALE patchset, vmtruncate removal series, assorted
  misc stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits)
  vfs: make lremovexattr retry once on ESTALE error
  vfs: make removexattr retry once on ESTALE
  vfs: make llistxattr retry once on ESTALE error
  vfs: make listxattr retry once on ESTALE error
  vfs: make lgetxattr retry once on ESTALE
  vfs: make getxattr retry once on an ESTALE error
  vfs: allow lsetxattr() to retry once on ESTALE errors
  vfs: allow setxattr to retry once on ESTALE errors
  vfs: allow utimensat() calls to retry once on an ESTALE error
  vfs: fix user_statfs to retry once on ESTALE errors
  vfs: make fchownat retry once on ESTALE errors
  vfs: make fchmodat retry once on ESTALE errors
  vfs: have chroot retry once on ESTALE error
  vfs: have chdir retry lookup and call once on ESTALE error
  vfs: have faccessat retry once on an ESTALE error
  vfs: have do_sys_truncate retry once on an ESTALE error
  vfs: fix renameat to retry on ESTALE errors
  vfs: make do_unlinkat retry once on ESTALE errors
  vfs: make do_rmdir retry once on ESTALE errors
  vfs: add a flags argument to user_path_parent
  ...

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Linus Torvalds [Fri, 21 Dec 2012 02:05:28 +0000 (18:05 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/signal

Pull signal handling cleanups from Al Viro:
 "sigaltstack infrastructure + conversion for x86, alpha and um,
  COMPAT_SYSCALL_DEFINE infrastructure.

  Note that there are several conflicts between "unify
  SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
  resolution is trivial - just remove definitions of SS_ONSTACK and
  SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
  include/uapi/linux/signal.h contains the unified variant."

Fixed up conflicts as per Al.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  alpha: switch to generic sigaltstack
  new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
  generic compat_sys_sigaltstack()
  introduce generic sys_sigaltstack(), switch x86 and um to it
  new helper: compat_user_stack_pointer()
  new helper: restore_altstack()
  unify SS_ONSTACK/SS_DISABLE definitions
  new helper: current_user_stack_pointer()
  missing user_stack_pointer() instances
  Bury the conditionals from kernel_thread/kernel_execve series
  COMPAT_SYSCALL_DEFINE: infrastructure

12 years agoMerge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Linus Torvalds [Fri, 21 Dec 2012 01:56:23 +0000 (17:56 -0800)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
 "A number of smallish fixes scattered around the ARM code.  Probably
  the most serious one is the one from Al addressing the missing locking
  in the swap emulation code."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7607/1: realview: fix private peripheral memory base for EB rev. B boards
  ARM: 7606/1: cache: flush to LoUU instead of LoUIS on uniprocessor CPUs
  ARM: missing ->mmap_sem around find_vma() in swp_emulate.c
  ARM: 7605/1: vmlinux.lds: Move .notes section next to the rodata
  ARM: 7602/1: Pass real "__machine_arch_type" variable to setup_machine_tags() procedure
  ARM: 7600/1: include CONFIG_DEBUG_LL_INCLUDE rather than mach/debug-macro.S

12 years agoMerge tag 'fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Fri, 21 Dec 2012 01:55:34 +0000 (17:55 -0800)]
Merge tag 'fixes2' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes part 2 from Olof Johansson:
 "Here are a few more fixes for 3.8.  Two branches of fixes for Samsung
  platforms, including fixes for the audio build errors on all non-DT
  platforms.  There's also a fixup to the sunxi device-tree file renames
  due to a bad patch application by me, and a fix for OMAP due to
  function renames merged through the powerpc tree."

* tag 'fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: OMAP2+: Fix compillation error in mach-omap2/timer.c
  ARM: sunxi: rename device tree source files
  ARM: EXYNOS: Avoid passing the clks through platform data
  ARM: S5PV210: Avoid passing the clks through platform data
  ARM: S5P64X0: Add I2S clkdev support
  ARM: S5PC100: Add I2S clkdev support
  ARM: S3C64XX: Add I2S clkdev support
  ARM: EXYNOS: Fix MSHC clocks instance names
  ARM: EXYNOS: Fix NULL pointer dereference bug in SMDKV310
  ARM: EXYNOS: Fix NULL pointer dereference bug in SMDK4X12
  ARM: EXYNOS: Fix NULL pointer dereference bug in Origen
  ARM: SAMSUNG: Add missing include guard to gpio-core.h
  pinctrl: exynos5440/samsung: Staticize pcfgs
  pinctrl: samsung: Fix a typo in pinctrl-samsung.h
  ARM: EXYNOS: fix skip scu_enable() for EXYNOS5440
  ARM: EXYNOS: fix GIC using for EXYNOS5440
  ARM: EXYNOS: fix build error when MFC is not selected

12 years agoMerge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Linus Torvalds [Fri, 21 Dec 2012 01:52:06 +0000 (17:52 -0800)]
Merge branch 'misc' of git://git./linux/kernel/git/mmarek/kbuild

Pull kbuild misc changes from Michal Marek:
 "This is the non-critical part of kbuild

   - scripts/kernel-doc requires a "Return:" section for non-void
     functions
   - ARCH=arm SUBARCH=... support for make tags
   - COMPILED_SOURCE=1 support for make tags (only indexes .c files for
     which a .o exists)
   - New coccinelle check
   - Option parsing fix for scripts/config"

* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  scripts/config: Fix wrong "shift" for --keep-case
  scripts/tags.sh: Support compiled source
  scripts/tags.sh: Support subarch for ARM
  scripts/coccinelle/misc/warn.cocci: use WARN
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values

12 years agokeys: use keyring_alloc() to create module signing keyring
David Howells [Thu, 20 Dec 2012 23:05:56 +0000 (15:05 -0800)]
keys: use keyring_alloc() to create module signing keyring

Use keyring_alloc() to create special keyrings now that it has
a permissions parameter rather than using key_alloc() +
key_instantiate_and_link().

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agokeys: fix unreachable code
Alan Cox [Thu, 20 Dec 2012 23:05:54 +0000 (15:05 -0800)]
keys: fix unreachable code

We set ret to NULL then test it. Remove the bogus test

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agosendfile: allows bypassing of notifier events
Scott Wolchok [Thu, 20 Dec 2012 23:05:52 +0000 (15:05 -0800)]
sendfile: allows bypassing of notifier events

do_sendfile() in fs/read_write.c does not call the fsnotify functions,
unlike its neighbors.  This manifests as a lack of inotify ACCESS events
when a file is sent using sendfile(2).

Addresses
  https://bugzilla.kernel.org/show_bug.cgi?id=12812

[akpm@linux-foundation.org: use fsnotify_modify(out.file), not fsnotify_access(), per Dave]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Scott Wolchok <swolchok@umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoSGI-XP: handle non-fatal traps
Robin Holt [Thu, 20 Dec 2012 23:05:50 +0000 (15:05 -0800)]
SGI-XP: handle non-fatal traps

We found a user code which was raising a divide-by-zero trap.  That trap
would lead to XPC connections between system-partitions being torn down
due to the die_chain notifier callouts it received.

This also revealed a different issue where multiple callers into
xpc_die_deactivate() would all attempt to do the disconnect in parallel
which would sometimes lock up but often overwhelm the console on very
large machines as each would print at least one line of output at the
end of the deactivate.

I reviewed all the users of the die_chain notifier and changed the code
to ignore the notifier callouts for reasons which will not actually lead
to a system to continue on to call die().

[akpm@linux-foundation.org: fix ia64]
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agofat: fix incorrect function comment
Ravishankar N [Thu, 20 Dec 2012 23:05:46 +0000 (15:05 -0800)]
fat: fix incorrect function comment

fat_search_long() returns 0 on success, -ENOENT/ENOMEM on failure.
Change the function comment accordingly.

While at it, fix some trivial typos.

Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoDocumentation: ABI: remove testing/sysfs-devices-node
Davidlohr Bueso [Thu, 20 Dec 2012 23:05:45 +0000 (15:05 -0800)]
Documentation: ABI: remove testing/sysfs-devices-node

This file is already documented in the stable ABI (see commit
5bbe1ec11fcf).

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoproc: fix inconsistent lock state
Xiaotian Feng [Thu, 20 Dec 2012 23:05:44 +0000 (15:05 -0800)]
proc: fix inconsistent lock state

Lockdep found an inconsistent lock state when rcu is processing delayed
work in softirq.  Currently, kernel is using spin_lock/spin_unlock to
protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
context.

Use spin_lock_bh/spin_unlock_bh fix following lockdep warning.

  =================================
  [ INFO: inconsistent lock state ]
  3.7.0 #36 Not tainted
  ---------------------------------
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
  swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
   (proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
  {SOFTIRQ-ON-W} state was registered at:
     __lock_acquire+0x8ae/0xca0
     lock_acquire+0x199/0x200
     _raw_spin_lock+0x41/0x50
     proc_alloc_inum+0x4c/0xd0
     alloc_mnt_ns+0x49/0xc0
     create_mnt_ns+0x25/0x70
     mnt_init+0x161/0x1c7
     vfs_caches_init+0x107/0x11a
     start_kernel+0x348/0x38c
     x86_64_start_reservations+0x131/0x136
     x86_64_start_kernel+0x103/0x112
  irq event stamp: 2993422
  hardirqs last  enabled at (2993422):  _raw_spin_unlock_irqrestore+0x55/0x80
  hardirqs last disabled at (2993421):  _raw_spin_lock_irqsave+0x29/0x70
  softirqs last  enabled at (2993394):  _local_bh_enable+0x13/0x20
  softirqs last disabled at (2993395):  call_softirq+0x1c/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(proc_inum_lock);
    <Interrupt>
      lock(proc_inum_lock);

   *** DEADLOCK ***

  no locks held by swapper/1/0.

  stack backtrace:
  Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
  Call Trace:
   <IRQ>  [<ffffffff810a40f1>] ? vprintk_emit+0x471/0x510
    print_usage_bug+0x2a5/0x2c0
    mark_lock+0x33b/0x5e0
    __lock_acquire+0x813/0xca0
    lock_acquire+0x199/0x200
    _raw_spin_lock+0x41/0x50
    proc_free_inum+0x1c/0x50
    free_pid_ns+0x1c/0x50
    put_pid_ns+0x2e/0x50
    put_pid+0x4a/0x60
    delayed_put_pid+0x12/0x20
    rcu_process_callbacks+0x462/0x790
    __do_softirq+0x1b4/0x3b0
    call_softirq+0x1c/0x30
    do_softirq+0x59/0xd0
    irq_exit+0x54/0xd0
    smp_apic_timer_interrupt+0x95/0xa3
    apic_timer_interrupt+0x72/0x80
    cpuidle_enter_tk+0x10/0x20
    cpuidle_enter_state+0x17/0x50
    cpuidle_idle_call+0x287/0x520
    cpu_idle+0xba/0x130
    start_secondary+0x2b3/0x2bc

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agolinux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
Guenter Roeck [Thu, 20 Dec 2012 23:05:42 +0000 (15:05 -0800)]
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors

Commit 263a523d18bc ("linux/kernel.h: Fix warning seen with W=1 due to
change in DIV_ROUND_CLOSEST") fixes a warning seen with W=1 due to
change in DIV_ROUND_CLOSEST.

Unfortunately, the C compiler converts divide operations with unsigned
divisors to unsigned, even if the dividend is signed and negative (for
example, -10 / 5U = 858993457).  The C standard says "If one operand has
unsigned int type, the other operand is converted to unsigned int", so
the compiler is not to blame.  As a result, DIV_ROUND_CLOSEST(0, 2U) and
similar operations now return bad values, since the automatic conversion
of expressions such as "0 - 2U/2" to unsigned was not taken into
account.

Fix by checking for the divisor variable type when deciding which
operation to perform.  This fixes DIV_ROUND_CLOSEST(0, 2U), but still
returns bad values for negative dividends divided by unsigned divisors.
Mark the latter case as unsupported.

One observed effect of this problem is that the s2c_hwmon driver reports
a value of 4198403 instead of 0 if the ADC reads 0.

Other impact is unpredictable.  Problem is seen if the divisor is an
unsigned variable or constant and the dividend is less than (divisor/2).

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Juergen Beisert <jbe@pengutronix.de>
Tested-by: Juergen Beisert <jbe@pengutronix.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: <stable@vger.kernel.org> [3.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomemcg: don't register hotcpu notifier from ->css_alloc()
Tejun Heo [Thu, 20 Dec 2012 23:05:40 +0000 (15:05 -0800)]
memcg: don't register hotcpu notifier from ->css_alloc()

Commit 648bb56d076b ("cgroup: lock cgroup_mutex in cgroup_init_subsys()")
made cgroup_init_subsys() grab cgroup_mutex before invoking
->css_alloc() for the root css.  Because memcg registers hotcpu notifier
from ->css_alloc() for the root css, this introduced circular locking
dependency between cgroup_mutex and cpu hotplug.

Fix it by moving hotcpu notifier registration to a subsys initcall.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.7.0-rc4-work+ #42 Not tainted
  -------------------------------------------------------
  bash/645 is trying to acquire lock:
   (cgroup_mutex){+.+.+.}, at: [<ffffffff8110c5b7>] cgroup_lock+0x17/0x20

  but task is already holding lock:
   (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
         lock_acquire+0x97/0x1e0
         mutex_lock_nested+0x61/0x3b0
         get_online_cpus+0x3c/0x60
         rebuild_sched_domains_locked+0x1b/0x70
         cpuset_write_resmask+0x298/0x2c0
         cgroup_file_write+0x1ef/0x300
         vfs_write+0xa8/0x160
         sys_write+0x52/0xa0
         system_call_fastpath+0x16/0x1b

 -> #0 (cgroup_mutex){+.+.+.}:
         __lock_acquire+0x14ce/0x1d20
         lock_acquire+0x97/0x1e0
         mutex_lock_nested+0x61/0x3b0
         cgroup_lock+0x17/0x20
         cpuset_handle_hotplug+0x1b/0x560
         cpuset_update_active_cpus+0xe/0x10
         cpuset_cpu_inactive+0x47/0x50
         notifier_call_chain+0x66/0x150
         __raw_notifier_call_chain+0xe/0x10
         __cpu_notify+0x20/0x40
         _cpu_down+0x7e/0x2f0
         cpu_down+0x36/0x50
         store_online+0x5d/0xe0
         dev_attr_store+0x18/0x30
         sysfs_write_file+0xe0/0x150
         vfs_write+0xa8/0x160
         sys_write+0x52/0xa0
         system_call_fastpath+0x16/0x1b
  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(cpu_hotplug.lock);
                                 lock(cgroup_mutex);
                                 lock(cpu_hotplug.lock);
    lock(cgroup_mutex);

   *** DEADLOCK ***

  5 locks held by bash/645:
   #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8123bab8>] sysfs_write_file+0x48/0x150
   #1:  (s_active#42){.+.+.+}, at: [<ffffffff8123bb38>] sysfs_write_file+0xc8/0x150
   #2:  (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81079277>] cpu_hotplug_driver_lock+0x1
+7/0x20
   #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81093157>] cpu_maps_update_begin+0x17/0x20
   #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60

  stack backtrace:
  Pid: 645, comm: bash Not tainted 3.7.0-rc4-work+ #42
  Call Trace:
   print_circular_bug+0x28e/0x29f
   __lock_acquire+0x14ce/0x1d20
   lock_acquire+0x97/0x1e0
   mutex_lock_nested+0x61/0x3b0
   cgroup_lock+0x17/0x20
   cpuset_handle_hotplug+0x1b/0x560
   cpuset_update_active_cpus+0xe/0x10
   cpuset_cpu_inactive+0x47/0x50
   notifier_call_chain+0x66/0x150
   __raw_notifier_call_chain+0xe/0x10
   __cpu_notify+0x20/0x40
   _cpu_down+0x7e/0x2f0
   cpu_down+0x36/0x50
   store_online+0x5d/0xe0
   dev_attr_store+0x18/0x30
   sysfs_write_file+0xe0/0x150
   vfs_write+0xa8/0x160
   sys_write+0x52/0xa0
   system_call_fastpath+0x16/0x1b

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>