GitHub/moto-9609/android_kernel_motorola_exynos9610.git
14 years ago hwmon: (w83795) Pack similar register reads
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Pack similar register reads

Pack similar register reads using for loops.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Make W83795_REG_PWM more efficient
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Make W83795_REG_PWM more efficient

Cascaded conditionals are inefficient. Reorder the fields so that
PWM register addresses can be computed more efficiently.
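
The difference between the two approaches can be sketched as follows. The base address, stride and helper names are hypothetical and chosen only for illustration; they are not taken from the driver or the datasheet.

```c
/* Hypothetical layout: each PWM field occupies a block of 8 registers. */
#define PWM_BASE	0x210
#define PWM_STRIDE	8

/* Cascaded form: one comparison per field before the address is known. */
static unsigned int pwm_reg_cascaded(unsigned int channel, unsigned int field)
{
	return (field == 0 ? 0x210 :
		field == 1 ? 0x218 :
		field == 2 ? 0x220 :
		field == 3 ? 0x228 : 0x230) + channel;
}

/* Computed form: possible once the fields are numbered in address order. */
static unsigned int pwm_reg_computed(unsigned int channel, unsigned int field)
{
	return PWM_BASE + field * PWM_STRIDE + channel;
}
```

Both helpers return the same address for every field; the second one needs a multiply and two additions instead of a chain of comparisons.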

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
14 years ago hwmon: (w83795) Don't pre-read values we'll update later
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Don't pre-read values we'll update later

There is no point in reading registers during initialization if we
will refresh the values in the update function later. This only
slows down driver loading with no benefit, so stop doing it.
This change saves 480 ms on driver load on my test system.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Simplify temperature sensor type handling
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Simplify temperature sensor type handling

All 3 temperature sensor type sysfs functions (show_temp_mode,
store_temp_mode and show_dts_mode) can be simplified. We don't
create these files when the corresponding input isn't in temperature
monitoring mode, so there is no point in handling that case.
Likewise, we don't allow changing inputs from temperature to voltage,
so the code handling this case is dead and can be removed.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Drop _NUM constants
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Drop _NUM constants

Consistently use ARRAY_SIZE() to control for loops.
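
A minimal user-space sketch of the idiom, assuming a made-up register table (`limit_regs` and `sum_limit_regs` are illustrative names, and the macro below is a plain equivalent of the kernel's ARRAY_SIZE()):

```c
#include <stddef.h>

/* User-space equivalent of the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical register table; there is no separate _NUM constant to keep in sync. */
static const unsigned char limit_regs[] = { 0x70, 0x72, 0x74, 0x76 };

/* The loop bound follows the array automatically when entries are added. */
static unsigned int sum_limit_regs(void)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(limit_regs); i++)
		sum += limit_regs[i];
	return sum;
}
```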

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Drop REST_VLT_BEGIN/END
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Drop REST_VLT_BEGIN/END

Get rid of REST_VLT_BEGIN and REST_VLT_END; they don't make the code
more readable.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Fix parity checks
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Fix parity checks

x % 1 is obviously wrong, as it always evaluates to 0. You want
x % 2, or x & 1, for parity checking.
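
To see why the original check could never fire, here is a minimal user-space sketch. The helper names `is_odd_broken` and `is_odd` are made up for illustration and do not come from the driver.

```c
#include <stdbool.h>

/* Buggy form: any integer modulo 1 is 0, so this is always false. */
static bool is_odd_broken(unsigned int x)
{
	return x % 1;
}

/* Correct form: test the least significant bit. */
static bool is_odd(unsigned int x)
{
	return x & 1;
}
```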

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Get rid of VRLSB_SHIFT
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Get rid of VRLSB_SHIFT

VRLSB_SHIFT is nonsense: the actual shift depends on the sensor
type (fans need 4, other sensors need 6). Get rid of it to prevent
any confusion. Also get rid of the useless masking: the meaningful
bits are always the MSb, so there's nothing to mask out after
shifting.
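
A sketch of what the two shifts imply, assuming 8-bit registers whose extra low-order bits sit at the top of a separate LSB register. The helper names and the resulting 12-bit and 10-bit widths are inferred from the shift values above, not copied from the driver.

```c
#include <stdint.h>

/* Fans: 8 bits from the MSB register plus the top 4 bits of the LSB register. */
static unsigned int combine_fan(uint8_t msb, uint8_t lsb)
{
	return ((unsigned int)msb << 4) | (lsb >> 4);
}

/* Other sensors: 8 bits from the MSB register plus the top 2 bits of the LSB register. */
static unsigned int combine_other(uint8_t msb, uint8_t lsb)
{
	return ((unsigned int)msb << 2) | (lsb >> 6);
}
```

Because the right shift of an 8-bit value already discards everything except the top bits, no additional mask is required.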

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Avoid reading the same register twice
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Avoid reading the same register twice

Shorten driver load time by avoiding duplicate register access during
initialization. This saves 112 ms on modprobe on my test system.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Rework beep_enable implementation
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Rework beep_enable implementation

Handle beep_enable just like all other beep bits. It doesn't need
anything special, so let's avoid redundant code. This also saves a
duplicate register read at initialization time.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Report PECI agent Tbase values
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Report PECI agent Tbase values

On systems with PECI, report PECI agent Tbase temperature values.
This is informative only.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Properly handle negative temperatures
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Properly handle negative temperatures

The temperature registers hold regular 2's complement values, so there
is no need for any extra arithmetic.
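
A minimal sketch of the idea, assuming an 8-bit register holding whole degrees (`temp_from_reg` is an illustrative name, and the real register layout is not shown here). It relies on the usual wrapping conversion to a signed type that gcc and clang provide.

```c
#include <stdint.h>

/* Interpret a raw 8-bit register value as a signed temperature in degrees. */
static int temp_from_reg(uint8_t raw)
{
	/* Reinterpreting the byte as signed is all that is needed. */
	return (int8_t)raw;
}
```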

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Rename temperature limit attributes
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Rename temperature limit attributes

Follow the standard for temperature limit attribute naming, so that
libsensors will pick the values.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Fix PWM duty cycle frequency attributes
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Fix PWM duty cycle frequency attributes

The PWM duty cycle frequency attributes are improperly named
(fanN_div instead of pwmN_div) and contain raw values instead of
actual frequencies. Rename them and fix their contents.

Also improve the logic when the user asks for a new frequency, to
always pick the closest supported frequency. The algorithm could
certainly be optimized, but the operation is infrequent enough that
I don't think it's worth the effort.
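
Picking the closest supported frequency with a plain linear scan can be sketched like this. The frequency table and function name are hypothetical; the chip's real frequencies depend on its clock and divisor settings and are not listed here.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical list of supported frequencies in Hz, for illustration only. */
static const unsigned long supported_hz[] = { 100, 250, 500, 1000, 2000 };

/* Return the index of the supported frequency closest to the request. */
static size_t closest_freq_index(unsigned long wanted_hz)
{
	size_t i, best = 0;
	unsigned long best_dist = ULONG_MAX;

	for (i = 0; i < sizeof(supported_hz) / sizeof(supported_hz[0]); i++) {
		unsigned long dist = wanted_hz > supported_hz[i] ?
				     wanted_hz - supported_hz[i] :
				     supported_hz[i] - wanted_hz;

		/* Strict comparison: on a tie the lower frequency wins. */
		if (dist < best_dist) {
			best_dist = dist;
			best = i;
		}
	}
	return best;
}
```

A scan over every candidate is not the fastest possible search, but for an operation that runs only when the user writes a new value it keeps the code short.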

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Add support for dynamic in0-2 limits
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Add support for dynamic in0-2 limits

The W83795G can be configured to set the in0, in1 and/or in2 voltage
limits dynamically based on VID input pins. Switch the respective
sysfs attributes to read-only.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Fix LSB reading of fan speeds
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Fix LSB reading of fan speeds

Misplaced parentheses caused the wrong register value to be read,
resulting in random LSB for fan speed values and limits.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Clean up probe function
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Clean up probe function

* The data structure is zalloc'd, so no need to set individual fields
  to 0 explicitly.
* Refactor the handling of pins that can be used for either
  temperature or voltage monitoring.
* Misc other clean-ups.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Fix in17-in20 gain factor
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Fix in17-in20 gain factor

Gain bit set means 1x gain and cleared means 8x gain, not the other
way around.
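
The corrected polarity, as a small sketch (the helper name and the idea of passing the bit position are assumptions made for illustration):

```c
#include <stdint.h>

/* Gain bit set selects 1x gain; gain bit cleared selects 8x gain. */
static unsigned int gain_factor(uint8_t gain_reg, unsigned int bit)
{
	return (gain_reg & (1u << bit)) ? 1 : 8;
}
```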

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Only start monitoring if needed
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Only start monitoring if needed

This saves an SMBus write if monitoring was already enabled.
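
A user-space sketch of the pattern, with a fake register standing in for the SMBus accessors. The bit value `CONFIG_START` and all function names are hypothetical.

```c
#include <stdint.h>

#define CONFIG_START	0x01	/* hypothetical "monitoring enabled" bit */

/* Stand-ins for the SMBus accessors: a fake register and a write counter. */
static uint8_t fake_config;
static unsigned int write_count;

static uint8_t read_config(void)
{
	return fake_config;
}

static void write_config(uint8_t val)
{
	fake_config = val;
	write_count++;
}

/* Start monitoring, writing the register only if it is not already running. */
static void start_monitoring(void)
{
	uint8_t config = read_config();

	if (!(config & CONFIG_START))
		write_config(config | CONFIG_START);
}
```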

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Add const markers
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Add const markers

Attribute structures can be made const. Same for the I2C address
list.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Only create fan[1-8]_target files when needed
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Only create fan[1-8]_target files when needed

Only create fan[1-8]_target files when the fan in question can be
controlled (PWM output is present). Also name these files according
to the standard.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Move PWM attributes to a dedicated array
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Move PWM attributes to a dedicated array

Use a dedicated 2D array for PWM attributes. This way, PWM attributes
are handled the same way as other attributes, which is more consistent.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Use 2D arrays for many device attributes
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Use 2D arrays for many device attributes

Use 2D arrays for in, fan, temp and dts device attributes. Using
linear arrays is too risky as we have to skip some groups depending
on the device model and configuration. Adding or removing an
attribute would let the driver build silently but then it would crash
at runtime. With 2D arrays, the consistency checking happens at build
time, which is much safer.
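
A sketch of the 2D layout, using plain strings in place of the driver's attribute structures. The channel count, attribute names and function name are hypothetical. With one row per channel and a fixed row width declared in the type, skipping an absent channel is a matter of skipping a row index rather than stepping over a hand-counted number of entries in a flat list.

```c
#include <stddef.h>

#define NUM_FANS	3	/* hypothetical channel count */
#define ATTRS_PER_FAN	2	/* hypothetical number of attributes per fan */

/* Each row holds every attribute of one fan. */
static const char *const fan_attrs[NUM_FANS][ATTRS_PER_FAN] = {
	{ "fan1_input", "fan1_min" },
	{ "fan2_input", "fan2_min" },
	{ "fan3_input", "fan3_min" },
};

/* Count the attributes to create, skipping rows for fans that are absent. */
static size_t count_fan_attrs(unsigned int has_fan)
{
	size_t i, j, n = 0;

	for (i = 0; i < NUM_FANS; i++) {
		if (!(has_fan & (1u << i)))
			continue;
		for (j = 0; j < ATTRS_PER_FAN; j++)
			if (fan_attrs[i][j])
				n++;
	}
	return n;
}
```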

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Merge w83795_create_files and w83795_remove_files
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Merge w83795_create_files and w83795_remove_files

Functions w83795_create_files and w83795_remove_files iterate over
the same set of files, just calling a different function. Merge them
into a single function which takes the action as a parameter. This
saves code, and also ensures that file creation and deletion are in
sync.
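
The merged function can be sketched in user space as a single walker that takes the action as a function pointer. The attribute type, the file table and the two actions are simplified stand-ins, not the driver's actual code.

```c
#include <stddef.h>

/* Hypothetical attribute type and file table, for illustration only. */
struct attr {
	const char *name;
	int present;
};

static struct attr files[] = {
	{ "in0_input", 0 }, { "fan1_input", 0 }, { "temp1_input", 0 },
};

static int create_file(struct attr *a)
{
	a->present = 1;
	return 0;
}

static int remove_file(struct attr *a)
{
	a->present = 0;
	return 0;
}

/* Walk the table once; the caller chooses the action applied to each file. */
static int handle_files(int (*fn)(struct attr *))
{
	size_t i;
	int err;

	for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
		err = fn(&files[i]);
		if (err)
			return err;
	}
	return 0;
}
```

Since creation and removal share the same loop, a file added to the table is automatically covered by both paths.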

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Move file creation to a separate function too
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Move file creation to a separate function too

Function w83795_probe() is way too big, move file creation to a separate
function to make it more readable.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Move files removal to a separate function
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Move files removal to a separate function

Sysfs files must be removed on device removal but also when device
registration fails. Move the code to a separate function to avoid
code redundancy.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Improve detection routine
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Improve detection routine

Check for additional identification registers. Improve debugging
messages on failed detection.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Refactor bank selection
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Refactor bank selection

Move the bank selection code to a separate function, to avoid
duplicating it in read and write functions. Improve error reporting
on register access error.
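
A simulated sketch of a shared bank-selection helper. The `(bank << 8) | offset` register encoding, the four-bank chip model and the caching of the current bank are assumptions made for this example, and error reporting is left out.

```c
#include <stdint.h>

/* Simulated chip: 4 banks of 256 registers, plus a count of bank switches. */
static uint8_t regs[4][256];
static uint8_t cur_bank;
static unsigned int bank_writes;

/* Switch banks only when needed; shared by the read and write paths. */
static void set_bank(uint8_t bank)
{
	if (bank == cur_bank)
		return;
	cur_bank = bank;
	bank_writes++;
}

/* Registers are addressed as (bank << 8) | offset in this sketch. */
static uint8_t chip_read(uint16_t reg)
{
	set_bank((reg >> 8) & 0x03);
	return regs[cur_bank][reg & 0xff];
}

static void chip_write(uint16_t reg, uint8_t val)
{
	set_bank((reg >> 8) & 0x03);
	regs[cur_bank][reg & 0xff] = val;
}
```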

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Drop duplicate enum
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Drop duplicate enum

Enum chips and chip_types are redundant, get rid of the former. Fix
the detection code to properly identify the chip variant and name the
client accordingly.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (w83795) Misc cleanups
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Misc cleanups

* Improve driver description.
* Drop unused macro.
* Drop unreachable code.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: New driver for the W83795G/ADG monitoring chips
Wei Song [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: New driver for the W83795G/ADG monitoring chips

There is still much work needed, but I wanted to give Wei the credit
he deserves. I've merged some of my own fixes already, to make
gcc and checkpatch happy. Individual fixes and improvements from me
will follow.

[JD: Fix build errors]
[JD: Coding style cleanups]
[JD: Get rid of forward declarations]
[JD: Drop VID support]
[JD: Drop fault output control feature]
[JD: Use lowercase for inline function names]
[JD: Use strict variants of the strtol/ul functions]
[JD: Shorten the read and write function names]

Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (s3c-hwmon) Depend on S3C_ADC
Maurus Cuelenaere [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (s3c-hwmon) Depend on S3C_ADC

This way we don't need to modify Kconfig every time a new SoC comes along to
make this driver support it. Also fix some typos while I'm at it.

Signed-off-by: Maurus Cuelenaere <mcuelenaere@gmail.com>
Reviewed-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (s3c-hwmon) Use a real mutex
Thomas Gleixner [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (s3c-hwmon) Use a real mutex

The semaphore which protects the ADC is semantically a mutex. Use a
real mutex.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm75) Trivial changes to pacify the checkpatch
Shubhrajyoti D [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm75) Trivial changes to pacify the checkpatch

Some trivial changes to pacify the checkpatch.

Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm75) Make the writing to sysfs more robust
Shubhrajyoti D [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm75) Make the writing to sysfs more robust

Currently we get the checkpatch warning
"consider using strict_strtol in preference to simple_strtol".
Also, we should not allow any partially numeric values.
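
The difference can be illustrated in user space with the standard strtol(). This is an analogue, not the kernel's strict_strtol() itself, and `parse_long_strict` is a made-up name. The key step is checking where parsing stopped.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal long, rejecting empty input and trailing garbage. */
static int parse_long_strict(const char *s, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(s, &end, 10);
	if (errno || end == s)
		return -1;
	if (*end == '\n')	/* tolerate one trailing newline, common for sysfs writes */
		end++;
	if (*end != '\0')
		return -1;	/* partially numeric input such as "42abc" */
	*out = val;
	return 0;
}
```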

Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Add support for the W83L771W/G
Jean Delvare [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for the W83L771W/G

I was wondering if that chip ever existed publicly... Apparently yes,
so add support for it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Alexander Stein <alexander.stein@informatik.tu-chemnitz.de>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
14 years ago hwmon: (lm90) Add support for update_interval sysfs attribute
Guenter Roeck [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for update_interval sysfs attribute

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Introduce capability flag to indicate broken ALERT functionality
Guenter Roeck [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce capability flag to indicate broken ALERT functionality

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Introduce chip parameter structure
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce chip parameter structure

Instead of using switch/case and if statements in probe, define chip specific
functionality in a parameter structure array.
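
A sketch of the table-driven approach. The chip names, flag names, field names and mask values are all hypothetical and only illustrate the shape of such a table.

```c
enum chip { chip_a, chip_b, chip_c };	/* hypothetical chip types */

#define FLAG_EXTENDED_TEMP	(1u << 0)	/* hypothetical feature bits */
#define FLAG_BROKEN_ALERT	(1u << 1)

struct chip_params {
	unsigned int flags;		/* feature bits supported by the chip */
	unsigned int alert_mask;	/* status bits that can raise an alert */
};

/* One table entry per chip replaces switch/case blocks in probe. */
static const struct chip_params params[] = {
	[chip_a] = { .flags = 0, .alert_mask = 0x7c },
	[chip_b] = { .flags = FLAG_EXTENDED_TEMP, .alert_mask = 0x7f },
	[chip_c] = { .flags = FLAG_BROKEN_ALERT, .alert_mask = 0x7b },
};

/* Feature checks become a table lookup indexed by chip type. */
static int chip_has(enum chip kind, unsigned int flag)
{
	return (params[kind].flags & flag) != 0;
}
```

Supporting a new chip then means adding one table entry instead of touching every conditional in probe.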

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Rearrange code to no longer require forward declarations
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Rearrange code to no longer require forward declarations

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Add support for max6695 and max6696
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for max6695 and max6696

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Add support for extra features of max6659
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for extra features of max6659

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Add explicit support for max6659
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Add explicit support for max6659

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Simplify set_temp11 register calculations
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Simplify set_temp11 register calculations

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Introduce function to delete sysfs files
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce function to delete sysfs files

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Introduce device feature bits
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce device feature bits

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (lm90) Fix checkpatch errors
Guenter Roeck [Thu, 28 Oct 2010 18:31:42 +0000 (20:31 +0200)]
hwmon: (lm90) Fix checkpatch errors

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: Add tempX_emergency attribute to sysfs ABI
Guenter Roeck [Thu, 28 Oct 2010 18:31:42 +0000 (20:31 +0200)]
hwmon: Add tempX_emergency attribute to sysfs ABI

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago hwmon: (k8temp) Remove superfluous CPU family check
Andreas Herrmann [Thu, 28 Oct 2010 18:31:42 +0000 (20:31 +0200)]
hwmon: (k8temp) Remove superfluous CPU family check

The family check in k8temp is not required because the driver is
already bound to a northbridge device only used with K8 CPUs.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
14 years ago Merge branch 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Thu, 28 Oct 2010 04:54:31 +0000 (21:54 -0700)]
Merge branch 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
  ext4,jbd2: convert tracepoints to use major/minor numbers
  ext4: optimize orphan_list handling for ext4_setattr
  ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new
  ext4: fix compile error in ext4_fallocate()
  ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static
  ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()
  ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static
  ext4: rename {ext,idx}_pblock and inline small extent functions
  ext4: make various ext4 functions be static
  ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()
  ext4: fix kernel oops if the journal superblock has a non-zero j_errno
  ext4: update writeback_index based on last page scanned
  ext4: implement writeback livelock avoidance using page tagging
  ext4: tidy up a void argument in inode.c
  ext4: add batched_discard into ext4 feature list
  ext4: Add batched discard support for ext4
  fs: Add FITRIM ioctl
  ext4: Use return value from sb_issue_discard()
  ext4: Check return value of sb_getblk() and friends
  ext4: use bio layer instead of buffer layer in mpage_da_submit_io
  ...

14 years ago Merge branch 'next' into upstream-merge
Theodore Ts'o [Thu, 28 Oct 2010 03:44:47 +0000 (23:44 -0400)]
Merge branch 'next' into upstream-merge

Conflicts:
fs/ext4/inode.c
fs/ext4/mballoc.c
include/trace/events/ext4.h

14 years ago Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Thu, 28 Oct 2010 03:37:06 +0000 (20:37 -0700)]
Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: enable unmappable vram for evergreen
  drm/radeon/kms: fix tiled db height calculation on 6xx/7xx
  drm/radeon/kms: fix handling of tex lookup disable in cs checker on r2xx

14 years ago Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux...
Linus Torvalds [Thu, 28 Oct 2010 03:13:18 +0000 (20:13 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits)
  quota: Fix possible oops in __dquot_initialize()
  ext3: Update kernel-doc comments
  jbd/2: fixed typos
  ext2: fixed typo.
  ext3: Fix debug messages in ext3_group_extend()
  jbd: Convert atomic_inc() to get_bh()
  ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate()
  jbd: Fix debug message in do_get_write_access()
  jbd: Check return value of __getblk()
  ext3: Use DIV_ROUND_UP() on group desc block counting
  ext3: Return proper error code on ext3_fill_super()
  ext3: Remove unnecessary casts on bh->b_data
  ext3: Cleanup ext3_setup_super()
  quota: Fix issuing of warnings from dquot_transfer
  quota: fix dquot_disable vs dquot_transfer race v2
  jbd: Convert bitops to buffer fns
  ext3/jbd: Avoid WARN() messages when failing to write the superblock
  jbd: Use offset_in_page() instead of manual calculation
  jbd: Remove unnecessary goto statement
  jbd: Use printk_ratelimited() in journal_alloc_journal_head()
  ...

14 years ago ext4,jbd2: convert tracepoints to use major/minor numbers
Theodore Ts'o [Thu, 28 Oct 2010 02:08:50 +0000 (22:08 -0400)]
ext4,jbd2: convert tracepoints to use major/minor numbers

Unfortunately perf can't deal with anything other than direct structure
accesses in the TP_printk() section.  It will drop dead when it sees
jbd2_dev_to_name() in the "print fmt" section of the tracepoint.

Addresses-Google-Bug: 3138508

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years ago ext4: optimize orphan_list handling for ext4_setattr
Dmitry Monakhov [Thu, 28 Oct 2010 02:08:46 +0000 (22:08 -0400)]
ext4: optimize orphan_list handling for ext4_setattr

Surprisingly, chown() on ext4 is not an SMP-scalable operation.
The unconditional orphan_del(NULL, inode) in ext4_setattr()
results in significant performance overhead because of the global orphan
mutex, especially in no-journal mode (where orphan_add() is a no-op).
It is possible to skip the explicit orphan_del in some cases.
Results of an fchown() micro-benchmark in no-journal mode:
while (1) {
   iteration++;
   fchown(fd, uid, gid);
   fchown(fd, uid + 1, gid + 1);
}
measured: iterations per millisecond
| nr_tasks | w/o patch | with patch |
|        1 |       142 |        185 |
|        4 |       109 |        642 |

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years ago ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new
Nicolas Kaiser [Thu, 28 Oct 2010 02:08:42 +0000 (22:08 -0400)]
ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years ago Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
Linus Torvalds [Thu, 28 Oct 2010 02:04:36 +0000 (19:04 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (48 commits)
  DMAENGINE: move COH901318 to arch_initcall
  dma: imx-dma: fix signedness bug
  dma/timberdale: simplify conditional
  ste_dma40: remove channel_type
  ste_dma40: remove enum for endianness
  ste_dma40: remove TIM_FOR_LINK option
  ste_dma40: move mode_opt to separate config
  ste_dma40: move channel mode to a separate field
  ste_dma40: move priority to separate field
  ste_dma40: add variable to indicate valid dma_cfg
  async_tx: make async_tx channel switching opt-in
  move async raid6 test to lib/Kconfig.debug
  dmaengine: Add Freescale i.MX1/21/27 DMA driver
  intel_mid_dma: change the slave interface
  intel_mid_dma: fix the WARN_ONs
  intel_mid_dma: Add sg list support to DMA driver
  intel_mid_dma: Allow DMAC2 to share interrupt
  intel_mid_dma: Allow IRQ sharing
  intel_mid_dma: Add runtime PM support
  DMAENGINE: define a dummy filter function for ste_dma40
  ...

14 years ago Merge branch 'viafb-next' of git://github.com/schandinat/linux-2.6
Linus Torvalds [Thu, 28 Oct 2010 02:02:41 +0000 (19:02 -0700)]
Merge branch 'viafb-next' of git://github.com/schandinat/linux-2.6

* 'viafb-next' of git://github.com/schandinat/linux-2.6: (29 commits)
  viafb: add initial VX900 support
  viafb: fix hardware acceleration for suspend & resume
  viafb: make suspend and resume work (on all machines?)
  viafb: restore display on resume
  Minimal support for viafb suspend/resume
  viafb: use proper register for colour when doing fill ops
  viafb: add documentation for proc interface
  viafb: rename output devices
  viafb: add a mapping of supported output devices
  viafb: set sync polarity for all output devices
  viafb: add function to change sync polarity per device
  viafb: reduce I2C timeout and delay
  viafb: enable I2C for CRT
  viafb: fix i2c_transfer error handling
  viafb: vt1636 cleanup
  viafb: introduce per output device power management
  viafb: limit LCD code impact
  viafb: add interface for output device configuration
  viafb: merge the remaining output path with enable functions
  viafb: use new device routing
  ...

14 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300
Linus Torvalds [Thu, 28 Oct 2010 01:53:26 +0000 (18:53 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300

* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: (44 commits)
  MN10300: Save frame pointer in thread_info struct rather than global var
  MN10300: Change "Matsushita" to "Panasonic".
  MN10300: Create a defconfig for the ASB2364 board
  MN10300: Update the ASB2303 defconfig
  MN10300: ASB2364: Add support for SMSC911X and SMC911X
  MN10300: ASB2364: Handle the IRQ multiplexer in the FPGA
  MN10300: Generic time support
  MN10300: Specify an ELF HWCAP flag for MN10300 Atomic Operations Unit support
  MN10300: Map userspace atomic op regs as a vmalloc page
  MN10300: Add Panasonic AM34 subarch and implement SMP
  MN10300: Delete idle_timestamp from irq_cpustat_t
  MN10300: Make various interrupt priority settings configurable
  MN10300: Optimise do_csum()
  MN10300: Implement atomic ops using atomic ops unit
  MN10300: Make the FPU operate in non-lazy mode under SMP
  MN10300: SMP TLB flushing
  MN10300: Use the [ID]PTEL2 registers rather than [ID]PTEL for TLB control
  MN10300: Make the use of PIDR to mark TLB entries controllable
  MN10300: Rename __flush_tlb*() to local_flush_tlb*()
  MN10300: AM34 erratum requires MMUCTR read and write on exception entry
  ...

14 years ago Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Thu, 28 Oct 2010 01:52:49 +0000 (18:52 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: usb-audio: automatically detect feedback format
  ASoC: sound/wm9090: add missing __devexit marker
  ASoC: sound/max98088: add missing __devexit marker
  ASoC: sound/ad73311: add missing __devexit marker
  ASoC: fsl - fix build error in pcm030-audio-fabric.c
  sound/oss/sb_ess.c: delete double assignment
  ALSA: hda - Change BTL amp level on some HP notebooks

14 years ago Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 28 Oct 2010 01:48:00 +0000 (18:48 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
  perf python scripting: Add futex-contention script
  perf python scripting: Fixup cut'n'paste error in sctop script
  perf scripting: Shut up 'perf record' final status
  perf record: Remove newline character from perror() argument
  perf python scripting: Support fedora 11 (audit 1.7.17)
  perf python scripting: Improve the syscalls-by-pid script
  perf python scripting: print the syscall name on sctop
  perf python scripting: Improve the syscalls-counts script
  perf python scripting: Improve the failed-syscalls-by-pid script
  kprobes: Remove redundant text_mutex lock in optimize
  x86/oprofile: Fix uninitialized variable use in debug printk
  tracing: Fix 'faild' -> 'failed' typo
  perf probe: Fix format specified for Dwarf_Off parameter
  perf trace: Fix detection of script extension
  perf trace: Use $PERF_EXEC_PATH in canned report scripts
  perf tools: Document event modifiers
  perf tools: Remove direct slang.h include
  perf_events: Fix for transaction recovery in group_sched_in()
  perf_events: Revert: Fix transaction recovery in group_sched_in()
  perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
  ...

14 years ago Merge branch 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux...
Linus Torvalds [Thu, 28 Oct 2010 01:47:39 +0000 (18:47 -0700)]
Merge branch 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

* 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  NULL-terminate all pci_device_id tables
  (trivial) Fix compiler warning in kernel/modules.c

14 years ago Merge branch 'akpm-incoming-2'
Linus Torvalds [Thu, 28 Oct 2010 01:42:52 +0000 (18:42 -0700)]
Merge branch 'akpm-incoming-2'

* akpm-incoming-2: (139 commits)
  epoll: make epoll_wait() use the hrtimer range feature
  select: rename estimate_accuracy() to select_estimate_accuracy()
  Remove duplicate includes from many files
  ramoops: use the platform data structure instead of module params
  kernel/resource.c: handle reinsertion of an already-inserted resource
  kfifo: fix kfifo_alloc() to return a signed int value
  w1: don't allow arbitrary users to remove w1 devices
  alpha: remove dma64_addr_t usage
  mips: remove dma64_addr_t usage
  sparc: remove dma64_addr_t usage
  fuse: use release_pages()
  taskstats: use real microsecond granularity for CPU times
  taskstats: split fill_pid function
  taskstats: separate taskstats commands
  delayacct: align to 8 byte boundary on 64-bit systems
  delay-accounting: reimplement -c for getdelays.c to report information on a target command
  namespaces Kconfig: move namespace menu location after the cgroup
  namespaces Kconfig: remove the cgroup device whitelist experimental tag
  namespaces Kconfig: remove pointless cgroup dependency
  namespaces Kconfig: make namespace a submenu
  ...

14 years ago Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 28 Oct 2010 01:38:55 +0000 (18:38 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  percpu: Remove the multi-page alignment facility
  x86-32: Allocate irq stacks separate from percpu area
  x86-32, mm: Remove duplicated #include
  x86, printk: Get rid of <0> from stack output
  x86, kexec: Make sure to stop all CPUs before exiting the kernel
  x86/vsmp: Eliminate kconfig dependency warning

14 years ago proc_bus_pci_ioctl: remove pointless BKL usage
Linus Torvalds [Thu, 28 Oct 2010 01:34:59 +0000 (18:34 -0700)]
proc_bus_pci_ioctl: remove pointless BKL usage

The BKL was pushed into this function when it was converted to use the
unlocked_ioctl interface, but nothing that the function touches is
actually protected by the BKL.  So just remove the BKL entirely, so that
we finally can get a realistic system build without the BKL being
enabled at all.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoext4: fix compile error in ext4_fallocate()
Kazuya Mio [Thu, 28 Oct 2010 01:30:15 +0000 (21:30 -0400)]
ext4: fix compile error in ext4_fallocate()

When I compiled 2.6.36-rc3 kernel with EXT4FS_DEBUG definition, I got
the following compile error.

  CC [M]  fs/ext4/extents.o
fs/ext4/extents.c: In function 'ext4_fallocate':
fs/ext4/extents.c:3772: error: 'block' undeclared (first use in this function)
fs/ext4/extents.c:3772: error: (Each undeclared identifier is reported only once
fs/ext4/extents.c:3772: error: for each function it appears in.)
make[2]: *** [fs/ext4/extents.o] Error 1

The patch fixes this problem.

Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static
Eric Sandeen [Thu, 28 Oct 2010 01:30:15 +0000 (21:30 -0400)]
ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static

These functions are only used within fs/ext4/mballoc.c, so move them
so they are used after they are defined, and then make them be static.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:15 +0000 (21:30 -0400)]
ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()

Fix a namespace leak from fs/ext4

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: move flush_completed_IO to fs/ext4/fsync.c and make it static
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static

Fix a namespace leak by moving the function to the file where it is
used and making it static.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: rename {ext,idx}_pblock and inline small extent functions
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: rename {ext,idx}_pblock and inline small extent functions

Clean up namespace leaks from fs/ext4 and inline the trivial functions
ext4_{ext,idx}_pblock() and ext4_{ext,idx}_store_pblock(), since the
code size actually shrinks when we make these functions inline;
they're that trivial.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: make various ext4 functions be static
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: make various ext4 functions be static

These functions have no need to be exported beyond file context.

No functions needed to be moved for this commit; just some function
declarations changed to be static and removed from header files.

(A similar patch was submitted by Eric Sandeen, but I wanted to handle
code movement in separate patches to make sure code changes didn't
accidentally get dropped.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()

This is a cleanup to avoid namespace leaks out of fs/ext4

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: fix kernel oops if the journal superblock has a non-zero j_errno
Theodore Ts'o [Thu, 28 Oct 2010 01:30:13 +0000 (21:30 -0400)]
ext4: fix kernel oops if the journal superblock has a non-zero j_errno

Commit 84061e0 fixed an accounting bug only to introduce the
possibility of a kernel OOPS if the journal has a non-zero j_errno
field indicating that the file system had detected a fs inconsistency.
After the journal replay, if the journal superblock indicates that the
file system has an error, this indication is transferred to the file
system and then ext4_commit_super() is called to write this to the
disk.

But since the percpu counters are now initialized after the journal
replay, the call to ext4_commit_super() will cause a kernel oops since
it needs to use the percpu counters in the ext4 superblock structure.

The fix is to skip setting the ext4 free block and free inode fields
if the percpu counter has not been set.
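
The guard described above can be sketched in plain C.  This is only an
illustration under stated assumptions: struct counter, struct sb_fields,
counter_initialized() and commit_super() are hypothetical stand-ins, not
the kernel's percpu counter API or the real ext4_commit_super().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a percpu counter: `counters` stays NULL
 * until the counter has been set up. */
struct counter {
	int64_t *counters;
	int64_t count;
};

/* Hypothetical stand-in for the two on-disk superblock fields. */
struct sb_fields {
	uint64_t free_blocks;
	uint32_t free_inodes;
};

static int counter_initialized(const struct counter *c)
{
	return c->counters != NULL;
}

/* Only copy a counter into the superblock once it has been set up;
 * otherwise leave the field at whatever value it already has. */
static void commit_super(struct sb_fields *es,
			 const struct counter *blocks,
			 const struct counter *inodes)
{
	if (counter_initialized(blocks))
		es->free_blocks = (uint64_t)blocks->count;
	if (counter_initialized(inodes))
		es->free_inodes = (uint32_t)inodes->count;
}
```

With this shape, a commit_super() call made right after journal replay
leaves both fields untouched instead of dereferencing counters that do
not exist yet.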

Thanks to Ken Sumrall for reporting and analyzing the root causes of
this bug.

Addresses-Google-Bug: #3054080

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: update writeback_index based on last page scanned
Eric Sandeen [Thu, 28 Oct 2010 01:30:13 +0000 (21:30 -0400)]
ext4: update writeback_index based on last page scanned

As pointed out in a prior patch, updating the mapping's
writeback_index based on pages written isn't quite right;
what the writeback index is really supposed to reflect is
the next page which should be scanned for writeback during
periodic flush.

As in write_cache_pages(), write_cache_pages_da() does
this scanning for us as we assemble the mpd for later
writeout.  If we keep track of the next page after the
current scan, we can easily update writeback_index without
worrying about pages written vs. pages skipped, etc.

Without this, an fsync will reset writeback_index to
0 (its starting index) + however many pages it wrote, which
can mess up the progress of periodic flush.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: implement writeback livelock avoidance using page tagging
Eric Sandeen [Thu, 28 Oct 2010 01:30:13 +0000 (21:30 -0400)]
ext4: implement writeback livelock avoidance using page tagging

This is analogous to Jan Kara's commit,
f446daaea9d4a420d16c606f755f3689dcb2d0ce
mm: implement writeback livelock avoidance using page tagging

but since we forked write_cache_pages, we need to reimplement
it there (and in ext4_da_writepages, since range_cyclic handling
was moved to there)

If you start a large buffered IO to a file, and then call
fsync after it, you'll find that fsync does not complete
until the other IO stops.

If you continue re-dirtying the file (say, putting dd
with conv=notrunc in a loop), when fsync finally completes
(after all IO is done), it reports via tracing that
it has written many more pages than the file contains;
in other words it has synced and re-synced pages in
the file multiple times.

This then leads to problems with our writeback_index
update, since it advances it by pages written, and
essentially sets writeback_index off the end of the
file...

With the following patch, we only sync as much as was
dirty at the time of the sync.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: tidy up a void argument in inode.c
Eric Sandeen [Thu, 28 Oct 2010 01:30:12 +0000 (21:30 -0400)]
ext4: tidy up a void argument in inode.c

This doesn't fix anything at all, it just removes a vestige
of prior use from __mpage_da_writepage()

__mpage_da_writepage() had a void * argument left over from
its previous life as a callback; make it reflect the actual type.

Fixing this up makes it slightly more obvious to read, and
enables proper typechecking.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: add batched_discard into ext4 feature list
Lukas Czerner [Thu, 28 Oct 2010 01:30:12 +0000 (21:30 -0400)]
ext4: add batched_discard into ext4 feature list

Should be applied on the top of "lazy inode table initialization"
and "batched discard support" patch-sets.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Add batched discard support for ext4
Lukas Czerner [Thu, 28 Oct 2010 01:30:12 +0000 (21:30 -0400)]
ext4: Add batched discard support for ext4

Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSDs
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).

It searches for free extents in the allocation groups covered by the byte
range start -> start+len. When a free extent is within this range, its
blocks are marked as used and then trimmed. Afterwards these blocks are
marked as free in the per-group bitmap.

Since fstrim is a long operation, it is good to be able to interrupt it
with a signal. This was added by Dmitry Monakhov.
Thanks, Dmitry.
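
The scan for free extents described above can be sketched in userspace C.
This is a simplified illustration, not the ext4 code: the flat bitmap
(1 = in use, 0 = free), the trim_fn callback and trim_free_extents() are
all hypothetical, and the real implementation also marks blocks as used
while the discard is in flight.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: invoked once per free extent to "trim" it. */
typedef void (*trim_fn)(size_t first, size_t count, void *ctx);

static int block_in_use(const uint8_t *bitmap, size_t i)
{
	return (bitmap[i / 8] >> (i % 8)) & 1;
}

/* Scan bits [start, end) of a block bitmap and report every run of
 * free blocks that is at least minlen blocks long.
 * Returns the total number of blocks reported. */
static size_t trim_free_extents(const uint8_t *bitmap, size_t start,
				size_t end, size_t minlen,
				trim_fn trim, void *ctx)
{
	size_t trimmed = 0;
	size_t i = start;

	while (i < end) {
		size_t first;

		/* Skip blocks that are in use. */
		while (i < end && block_in_use(bitmap, i))
			i++;
		first = i;
		/* Extend over the free run. */
		while (i < end && !block_in_use(bitmap, i))
			i++;
		/* Ignore empty runs and runs shorter than minlen. */
		if (i > first && i - first >= minlen) {
			if (trim)
				trim(first, i - first, ctx);
			trimmed += i - first;
		}
	}
	return trimmed;
}
```

Raising minlen makes the scan skip short free runs, which is the trade-off
the minlen parameter exposes: fewer, larger discards.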

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agofs: Add FITRIM ioctl
Lukas Czerner [Thu, 28 Oct 2010 01:30:11 +0000 (21:30 -0400)]
fs: Add FITRIM ioctl

Adds a filesystem-independent ioctl to allow implementation of file
system batched discard support. It takes an fstrim_range structure as an
argument. fstrim_range is defined in include/linux/fs.h and its
definition is as follows.

struct fstrim_range {
	__u64 start;
	__u64 len;
	__u64 minlen;
};

start - first Byte to trim
len - number of Bytes to trim from start
minlen - minimum extent length to trim, free extents shorter than this
  number of Bytes will be ignored. This will be rounded up to fs
  block size.

It is also possible to specify NULL as the argument. In this case the
fields default to the following values:

start = 0;
len = ULLONG_MAX;
minlen = 0;

So it will trim the whole file system at one run.

After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.
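
From user space, a call might look like the sketch below.  FITRIM and
struct fstrim_range come from <linux/fs.h>; the helper names
fill_whole_fs_range() and trim_whole_fs() are hypothetical, and fd is
assumed to be an open descriptor on the mounted file system.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/* Describe "the whole file system": the same values the text above
 * lists as the defaults for a NULL argument. */
static void fill_whole_fs_range(struct fstrim_range *r)
{
	memset(r, 0, sizeof(*r));
	r->start = 0;
	r->len = UINT64_MAX;	/* ULLONG_MAX in the text above */
	r->minlen = 0;
}

/* Hypothetical helper: returns 0 and stores the number of bytes
 * actually discarded in *trimmed, or returns -1 with errno set. */
static int trim_whole_fs(int fd, uint64_t *trimmed)
{
	struct fstrim_range r;

	fill_whole_fs_range(&r);
	if (ioctl(fd, FITRIM, &r) < 0)
		return -1;
	*trimmed = r.len;	/* the kernel writes the result back into len */
	return 0;
}
```

The caller should expect failure on file systems or devices without
discard support and treat it as non-fatal.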

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Use return value from sb_issue_discard()
Lukas Czerner [Thu, 28 Oct 2010 01:30:11 +0000 (21:30 -0400)]
ext4: Use return value from sb_issue_discard()

Use the return value from sb_issue_discard() as the return value of
ext4_issue_discard(). Since sb_issue_discard() may fail with more
serious errors than just -EOPNOTSUPP, it is worth passing them on to
callers of this function so they can handle error cases properly.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Check return value of sb_getblk() and friends
Namhyung Kim [Thu, 28 Oct 2010 01:30:11 +0000 (21:30 -0400)]
ext4: Check return value of sb_getblk() and friends

Fail block allocation if sb_getblk() returns NULL. In that case,
sb_find_get_block() is also likely to fail, so skip calling
ext4_forget().

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: use bio layer instead of buffer layer in mpage_da_submit_io
Theodore Ts'o [Thu, 28 Oct 2010 01:30:10 +0000 (21:30 -0400)]
ext4: use bio layer instead of buffer layer in mpage_da_submit_io

Call the block I/O layer directly instead of going through the buffer
layer.  This should give us much better performance and scalability,
as well as lowering our CPU utilization when doing buffered writeback.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: move mpage_put_bnr_to_bhs()'s functionality to mpage_da_submit_io()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:10 +0000 (21:30 -0400)]
ext4: move mpage_put_bnr_to_bhs()'s functionality to mpage_da_submit_io()

This massively simplifies the ext4_da_writepages() code path by
completely removing mpage_put_bnr_to_bhs(), which is almost 100 lines of
code iterating over a set of pages using pagevec_lookup(), and folds
that functionality into mpage_da_submit_io()'s existing
pagevec_lookup() loop.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: inline walk_page_buffers() into mpage_da_submit_io
Theodore Ts'o [Thu, 28 Oct 2010 01:30:10 +0000 (21:30 -0400)]
ext4: inline walk_page_buffers() into mpage_da_submit_io

Expand the call:

  if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
                        ext4_bh_delay_or_unwritten))
          goto redirty_page;

into mpage_da_submit_io().

This will allow us to merge in mpage_put_bnr_to_bhs() in the next
patch.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: inline ext4_writepage() into mpage_da_submit_io()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: inline ext4_writepage() into mpage_da_submit_io()

As a preparatory step to switching to bio_submit, inline
ext4_writepage() into mpage_da_submit_io() and then simplify things a
bit.  This makes it clearer what mpage_da_submit_io() needs to do.

Also, move the ClearPageChecked(page) call into
__ext4_journalled_writepage(), as a minor bit of cleanup refactoring.

This also allows us to pull i_size_read() and
ext4_should_journal_data() out of the loop, which should be a very
minor CPU savings.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: simplify ext4_writepage()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: simplify ext4_writepage()

The actual code in ext4_writepage() is unnecessarily convoluted.
Simplify it so it is easier to understand, but otherwise logically
equivalent.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: call mpage_da_submit_io() from mpage_da_map_blocks()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: call mpage_da_submit_io() from mpage_da_map_blocks()

Eventually we need to completely reorganize the ext4 writepage
callpath, but for now, we simplify things a little by calling
mpage_da_submit_io() from mpage_da_map_blocks(), since all of the
places where we call mpage_da_map_blocks() it is followed up by a call
to mpage_da_submit_io().

We're also a wee bit better with respect to error handling, but there
are still a number of issues where it's not clear what the right thing
to do is when ext4 functions deep in the writeback codepath fail.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: use KMEM_CACHE instead of kmem_cache_create
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: use KMEM_CACHE instead of kmem_cache_create

Also remove the SLAB_RECLAIM_ACCOUNT flag from the system zone kmem
cache.  This slab tends to be fairly static, so it shouldn't be marked
as likely to have free pages that can be reclaimed.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: use search_dirblock() in ext4_dx_find_entry()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:08 +0000 (21:30 -0400)]
ext4: use search_dirblock() in ext4_dx_find_entry()

Use the search_dirblock() in ext4_dx_find_entry().  It makes the code
easier to read, and it takes advantage of common code.  It also saves
100 bytes or so of text space.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Brad Spengler <spender@grsecurity.net>
14 years agoext4: avoid uninitialized memory references in ext4_htree_next_block()
Theodore Ts'o [Thu, 28 Oct 2010 01:30:08 +0000 (21:30 -0400)]
ext4: avoid uninitialized memory references in ext4_htree_next_block()

If the first block of htree directory is missing '.' or '..' but is
otherwise a valid directory, and we do a lookup for '.' or '..', it's
possible to dereference an uninitialized memory pointer in
ext4_htree_next_block().

We avoid this by moving the special case from ext4_dx_find_entry() to
ext4_find_entry(); this also means we can optimize ext4_find_entry()
slightly when NFS looks up "..".

Thanks to Brad Spengler for pointing out a Clang warning that led me to
look more closely at this code.  The warning was harmless, but it was
useful in pointing out code that was too ugly to live.  This warning was
also reported by Roman Borisov.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Brad Spengler <spender@grsecurity.net>
14 years agoext4: remove unused ext4_sb_info members
Eric Sandeen [Thu, 28 Oct 2010 01:30:08 +0000 (21:30 -0400)]
ext4: remove unused ext4_sb_info members

Not that these take up a lot of room, but the structure is long enough
as it is, and there's no need to confuse people with these various
undocumented & unused structure members...

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: queue conversion after adding to inode's completed IO list
Eric Sandeen [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: queue conversion after adding to inode's completed IO list

By queuing the io end on the unwritten workqueue before adding it
to our inode's list of completed IOs, I think we run the risk
of the work getting completed, and the IO freed, before we try
to add it to the inode's i_completed_io_list.

It should be safe to add it to the inode's list of completed
IOs, and -then- queue it for completion, I think.

Thanks to Dave Chinner for pointing out the race.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: don't use ext4_allocation_contexts for tracing
Eric Sandeen [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: don't use ext4_allocation_contexts for tracing

Many tracepoints were populating an ext4_allocation_context
to pass in, but this requires a slab allocation even when
tracepoints are off.  In fact, 4 of 5 of these allocations
were only for tracing.  In addition, we were only using a
small fraction of the 144 bytes of this structure for this
purpose.

We can do away with all these alloc/frees of the ac and
simply pass in the bits we care about, instead.

I tested this by turning on tracing and running through
xfstests on x86_64.  I did not actually do anything with
the trace output, however.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: fix oops in trace_ext4_mb_release_group_pa
Eric Sandeen [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: fix oops in trace_ext4_mb_release_group_pa

Our QA reported an oops in the ext4_mb_release_group_pa tracing,
and Josef Bacik pointed out that it was because we may have a
non-null but uninitialized ac_inode in the allocation context.

I can reproduce it when running xfstests with ext4 tracepoints on,
on a CONFIG_SLAB_DEBUG kernel.

We call trace_ext4_mb_release_group_pa from 2 places,
ext4_mb_discard_group_preallocations and
ext4_mb_discard_lg_preallocations

In both cases we allocate an ac as a container just for tracing (!)
and never fill in the ac_inode.  There's no reason to be assigning,
testing, or printing it as far as I can see, so just remove it from
the tracepoint.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: fix potential infinite loop in ext4_da_writepages()
Toshiyuki Okajima [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: fix potential infinite loop in ext4_da_writepages()

On linux-2.6.36-rc2, if we execute the following script, we can hang
the system when the /bin/sync command is executed:

========================================================================
#!/bin/sh

echo -n "HANG UP TEST: "
/bin/dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1M 2> /dev/null
/sbin/mkfs.ext4 -Fq /tmp/img
/bin/mount -o loop -t ext4 /tmp/img /mnt
/bin/dd if=/dev/zero of=/mnt/file bs=1 count=1 \
seek=$((16*1024*1024*1024*1024-4096)) 2> /dev/null
/bin/sync
/bin/umount /mnt
echo "DONE"
exit 0
========================================================================

We can see the following backtrace if we get the kdump when this
hangup occurs:

======================================================================
kthread()
=> bdi_writeback_thread()
   => wb_do_writeback()
      => wb_writeback()
         => writeback_inodes_wb()
            => writeback_sb_inodes()
               => writeback_single_inode()
                  => ext4_da_writepages()  ---+
                                ^ infinite    |
                                |   loop      |
                                +-------------+
======================================================================

The reason why this hangup happens is as follows:
1) We write the last extent block of a file whose size is the filesystem
   maximum size.
2) The "BH_Delay" flag is set on the buffer_head of that block.
3) - the member "m_lblk" of struct mpage_da_data is 4294967295 (UINT_MAX)
   - the member "m_len" of struct mpage_da_data is 1
   mpage_put_bnr_to_bhs(), which is called via ext4_da_writepages(),
   cannot clear the "BH_Delay" flag of the buffer_head because the type
   of m_lblk is ext4_lblk_t, so m_lblk + m_len overflows.

   Therefore an infinite loop occurs: ext4_da_writepages() cannot write
   the page (which corresponds to the block) because the "BH_Delay"
   flag is never cleared.
----------------------------------------------------------------------
static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd,
				 struct ext4_map_blocks *map)
{
	...
	int blocks = map->m_len;
	...
	do {
		// cur_logical = 4294967295
		// map->m_lblk = 4294967295
		// blocks = 1
		// *** map->m_lblk + blocks == 0 (OVERFLOW!) ***
		// (cur_logical >= map->m_lblk + blocks) => true
		if (cur_logical >= map->m_lblk + blocks)
			break;
----------------------------------------------------------------------

NOTE: Mounting with the nodelalloc option will avoid this codepath,
and thus, avoid this hang
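
The wraparound can be reproduced in isolation with the small C sketch
below.  It is an illustration only: lblk_t is a stand-in for the 32-bit
ext4_lblk_t, and past_end_wide() shows one possible way to avoid the
wrap (widening to 64 bits), which is an assumption and not necessarily
what the applied patch does.

```c
#include <stdint.h>

typedef uint32_t lblk_t;	/* stand-in for ext4_lblk_t (32 bits) */

/* The check as quoted above: the 32-bit sum start + len can wrap
 * around to 0, so the test is true on the very first block. */
static int past_end_wrapping(lblk_t cur, lblk_t start, lblk_t len)
{
	return cur >= (lblk_t)(start + len);
}

/* Assumed repair for illustration: do the addition in 64 bits so the
 * end of the range cannot wrap. */
static int past_end_wide(lblk_t cur, lblk_t start, lblk_t len)
{
	return (uint64_t)cur >= (uint64_t)start + len;
}
```

With cur = start = 4294967295 and len = 1, the first check reports
"past the end" before any buffer is processed, while the widened check
does not.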

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: improve llseek error handling for overly large seek offsets
Toshiyuki Okajima [Thu, 28 Oct 2010 01:30:06 +0000 (21:30 -0400)]
ext4: improve llseek error handling for overly large seek offsets

The llseek system call should return EINVAL if passed a seek offset
which results in a write error.  What this maximum offset should be
depends on whether or not the huge_file file system feature is set,
and whether the file is extent-based.

If the file has no "EXT4_EXTENTS_FL" flag, the maximum size which can be
written (write system call) is different from the maximum size which can be
sought (lseek system call).

For example, the following 2 cases demonstrate the differences
between the maximum size which can be written and the seek offset
allowed by the llseek system call:

#1: mkfs.ext3 <dev>; mount -t ext4 <dev>
#2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>

Table. the max file size which we can write or seek
       at each filesystem feature tuning and file flag setting
+============+===============================+===============================+
| \ File flag|                               |                               |
|      \     |     !EXT4_EXTENTS_FL          |        EXT4_EXTENTS_FL        |
|case       \|                               |                               |
+------------+-------------------------------+-------------------------------+
| #1         |   write:      2194719883264   | write:       --------------   |
|            |   seek:       2199023251456   | seek:        --------------   |
+------------+-------------------------------+-------------------------------+
| #2         |   write:      4402345721856   | write:       17592186044415   |
|            |   seek:      17592186044415   | seek:        17592186044415   |
+------------+-------------------------------+-------------------------------+

The differences exist because ext4 has 2 maxbytes values: sb->s_maxbytes
(= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped
maxbytes).  However, generic_file_llseek() uses only the extent-mapped
maxbytes (llseek in ext4_file_operations is generic_file_llseek(), which
uses sb->s_maxbytes).

Therefore we create an ext4 llseek function which uses both maxbytes values.

The new function is derived from generic_file_llseek().
If the file flag "EXT4_EXTENTS_FL" is not set, the function uses
EXT4_SB(inode->i_sb)->s_bitmap_maxbytes in place of
inode->i_sb->s_maxbytes.
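
The limit selection described above amounts to the following sketch.
It is a userspace illustration under assumptions: struct limits,
seek_limit() and check_seek() are hypothetical, and -1 stands in for
-EINVAL.  Only the EXT4_EXTENTS_FL flag value mirrors the kernel
definition.

```c
#include <stdint.h>

#define EXT4_EXTENTS_FL 0x00080000u	/* inode uses extents */

/* Hypothetical, pared-down stand-in for the two superblock limits. */
struct limits {
	int64_t s_maxbytes;		/* extent-mapped maximum */
	int64_t s_bitmap_maxbytes;	/* block-mapped maximum */
};

/* Pick the limit that matches how the file maps its blocks. */
static int64_t seek_limit(uint32_t i_flags, const struct limits *l)
{
	return (i_flags & EXT4_EXTENTS_FL) ? l->s_maxbytes
					   : l->s_bitmap_maxbytes;
}

/* Returns 0 if the offset is acceptable, -1 (standing in for -EINVAL)
 * if it is negative or beyond the limit for this file. */
static int check_seek(uint32_t i_flags, const struct limits *l,
		      int64_t offset)
{
	if (offset < 0 || offset > seek_limit(i_flags, l))
		return -1;
	return 0;
}
```

Using the case #2 values from the table, a block-mapped file is limited
to 4402345721856 while an extent-mapped file may seek up to
17592186044415.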

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
14 years agoext4: don't update sb journal_devnum when RO dev
Maciej Żenczykowski [Thu, 28 Oct 2010 01:30:06 +0000 (21:30 -0400)]
ext4: don't update sb journal_devnum when RO dev

An ext4 filesystem on a read-only device, with an external journal
which is at a different device number than recorded in the superblock
will fail to honor the read-only setting of the device and trigger
a superblock update (write).

For example:
  - ext4 on a software raid which is in read-only mode
  - external journal on a read-write device which has changed device num
  - attempt to mount with -o journal_dev=<new_number>
  - hits BUG_ON(mddev->ro == 1) in md.c

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: use sb_issue_zeroout in ext4_ext_zeroout
Lukas Czerner [Thu, 28 Oct 2010 01:30:06 +0000 (21:30 -0400)]
ext4: use sb_issue_zeroout in ext4_ext_zeroout

Change ext4_ext_zeroout to use sb_issue_zeroout instead of its
own approach to zero out extents.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: use sb_issue_zeroout in setup_new_group_blocks
Lukas Czerner [Thu, 28 Oct 2010 01:30:05 +0000 (21:30 -0400)]
ext4: use sb_issue_zeroout in setup_new_group_blocks

Use sb_issue_zeroout to zero out inode table and descriptor table
blocks instead of the old approach, which involves journaling.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: add interface to advertise ext4 features in sysfs
Lukas Czerner [Thu, 28 Oct 2010 01:30:05 +0000 (21:30 -0400)]
ext4: add interface to advertise ext4 features in sysfs

User space should have the opportunity to check which features ext4
supports in each particular copy. This adds an easy interface by creating a
new "features" directory in /sys/fs/ext4/. In that directory, files
advertising feature names can be created.

Add lazy_itable_init to the feature list.
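
A user-space check against this interface could be as simple as the
sketch below.  feature_advertised() is a hypothetical helper, and the
only assumption about the interface is the one stated above: a feature
is advertised by the existence of a file with its name.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if a file called `name` exists under `dir`, 0 otherwise.
 * For ext4, dir would be "/sys/fs/ext4/features". */
static int feature_advertised(const char *dir, const char *name)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path did not fit in the buffer */
	return access(path, F_OK) == 0;
}
```

For example, feature_advertised("/sys/fs/ext4/features",
"lazy_itable_init") tells mkfs-like tools whether the running kernel
can initialize inode tables lazily.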

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>