Jean Delvare [Thu, 28 Oct 2010 18:31:49 +0000 (20:31 +0200)]
hwmon: (w83795) Use dev_get_drvdata() where possible
When we don't need the client pointer, calling dev_get_drvdata() is
more efficient that calling to_i2c_client() and then
i2c_get_clientdata().
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Delay reading pwm config registers
Wait until we need the pwm config register values, instead of
pre-reading them. This saves over 1 second on modprobe on my test
system.
Obviously this time is added when first accessing pwm config
attributes, however not everybody will use them, so it seems unfair
to slow down driver loading (and thus boot) for an optional feature.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Delay reading limit registers
Wait until we need the limit register values, instead of pre-reading
them. This saves 544 ms on modprobe on my test system. Obviously this
time is added when first running "sensors" or any other monitoring
application, but I think it is better than slowing down the boot.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Move register reads to dedicated functions
Move initial register reads out of probe, to dedicated functions.
This makes the code clearer, and will be needed if we want to delay
calling these functions until they are needed, or want to call them
periodically.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Pack similar register reads
Pack similar register reads using for loops.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Make W83795_REG_PWM more efficient
Cascaded conditionals are inefficient. Reorder the fields so that
PWM register addresses can be computed more efficiently.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Don't pre-read values we'll update later
There is no point in reading registers during initialization if we
will refresh the values in the update function later. This is only
slowing down the driver loading with no benefit, stop doing it.
This change saves 480 ms on driver load on my test system.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Simplify temperature sensor type handling
All 3 temperature sensor type sysfs functions (show_temp_mode,
store_temp_mode and show_dts_mode) can be simplified. We don't
create these files when the correponding input isn't in temperature
monitoring mode, so there is no point in handling that case.
Likewise, we don't allow changing inputs from temperature to voltage,
so the code handling this case is dead and can be removed.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Drop _NUM constants
Consistently use ARRAY_SIZE() to control for loops.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:48 +0000 (20:31 +0200)]
hwmon: (w83795) Drop REST_VLT_BEGIN/END
Get rid of REST_VLT_BEGIN and REST_VLT_END, they don't make the code
more readable.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Fix parity checks
x % 1 is obviously wrong, as it always evaluates to 0. You want
x % 2, or x & 1, for parity checking.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Get rid of VRLSB_SHIFT
VRLSB_SHIFT is a non-sense, the actual shift depends on the sensor
type (fans need 4, other sensors need 6). Get rid of it to prevent
any confusion. Also get rid of the useless masking, the meaningful
bits are always the MSb so there's nothing to mask out after
shifting.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Avoid reading the same register twice
Shorten driver load time by avoiding duplicate register access during
initialization. This saves 112 ms on modprobe on my test system.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Rework beep_enable implementation
Handle beep_enable just like all other beep bits. It doesn't need
anything special, so let's avoid redundant code. This also saves a
duplicate register read at initialization time.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Report PECI agent Tbase values
On systems with PECI, report PECI agent Tbase temperature values.
This is informative only.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Properly handle negative temperatures
The temperature registers hold regular 2's complement values, no need
to add any arithmetics.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Rename temperature limit attributes
Follow the standard for temperature limit attribute naming, so that
libsensors will pick the values.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:47 +0000 (20:31 +0200)]
hwmon: (w83795) Fix PWM duty cycle frequency attributes
The PWM duty cycle frequenty attributes are improperly named
(fanN_div instead of pwmN_div) and contain raw values instead of
actual frequencies. Rename them and fix their contents.
Also improve the logic when the user asks for a new frequency, to
always pick the closest supported frequency. The algorithm could
certainly be optimized, but the operation is infrequent enough that
I don't think it's worth the effort.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Add support for dynamic in0-2 limits
The W83795G can be configured to set the in0, in1 and/or in2 voltage
limits dynamically based on VID input pins. Switch the respective
sysfs attributes to read-only.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Fix LSB reading of fan speeds
Misplaced parentheses caused the wrong register value to be read,
resulting in random LSB for fan speed values and limits.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Clean up probe function
* The data structure is zalloc'd, so no need to set individual fields
to 0 explicitly.
* Refactor the handling of pins that can be used for either
temperature or voltage monitoring.
* Misc other clean-ups.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Fix in17-in20 gain factor
Gain bit set means 1x gain and cleared means 8x gain, not the other
way around.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Only start monitoring if needed
This saves an SMBus write if monitoring was already enabled.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Add const markers
Attribute structures can be made const. Same for the I2C address
list.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Only create fan[1-8]_target files when needed
Only create fan[1-8]_target files when the fan in question can be
controlled (PWM output is present.) Also name these files according
to the standard.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:46 +0000 (20:31 +0200)]
hwmon: (w83795) Move PWM attributes to a dedidated array
Use a dedicated 2D array for PWM attributes. This way, PWM attributes
are handled the same way as other attributes, this is more consistent.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Use 2D arrays for many device attributes
Use 2D arrays for in, fan, temp and dts device attributes. Using
linear arrays is too risky as we have to skip some groups depending
on the device model and configuration. Adding or removing an
attribute would let the driver build silently but then it would crash
at runtime. With 2D arrays, the consistency checking happens at build
time, which is much safer.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Merge w83795_create_files and w83795_remove_files
Functions w83795_create_files and w83795_remove_files iterate over
the same set of files, just calling a different function. Merge them
into a single function which takes the action as a parameter. This
saves code, and also ensure that file creation and deletion are in
sync.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Move file creation to a separate function too
Function w83795_probe() is way too big, move file creation to a separate
function to make it more readable.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Move files removal to a separate function
Sysfs files must be removed on device removal but also when device
registration fails. Move the code to a separate function to avoid
code redundancy.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Improve detection routine
Check for additional identification registers. Improve debugging
messages on failed detection.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Refactor bank selection
Move the bank selection code to a separate function, to avoid
duplicating it in read and write functions. Improve error reporting
on register access error.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Drop duplicate enum
Enum chips and chip_types are redundant, get rid of the former. Fix
the detection code to properly identify the chip variant and name the
client accordingly.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:45 +0000 (20:31 +0200)]
hwmon: (w83795) Misc cleanups
* Improve driver description.
* Drop unused macro.
* Drop unreachable code.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Wei Song [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: New driver for the W83795G/ADG monitoring chips
There is still much work needed, but I wanted to give Wei the credit
he deserves. I've merged some of my own fixes already, to make
gcc and checkpatch happy. Individual fixes and improvements from me
will follow.
[JD: Fix build errors]
[JD: Coding style cleanups]
[JD: Get rid of forward declarations]
[JD: Drop VID support]
[JD: Drop fault output control feature]
[JD: Use lowercase for inline function names]
[JD: Use strict variants of the strtol/ul functions]
[JD: Shorten the read and write function names]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Maurus Cuelenaere [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (s3c-hwmon) Depend on S3C_ADC
This way we don't need to modify Kconfig every time a new SoC comes along to
make this driver support it. Also fix some typos while I'm at it.
Signed-off-by: Maurus Cuelenaere <mcuelenaere@gmail.com>
Reviewed-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Thomas Gleixner [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (s3c-hwmon) Use a real mutex
The semaphore which protects the ADC is semantically a mutex. Use a
real mutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Shubhrajyoti D [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm75) Trivial changes to pacify the checkpatch
Some trivial changes to pacify the checkpatch.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Shubhrajyoti D [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm75) Make the writing to sysfs more robust
Currently we get the checkpatch warning
consider using strict_strtol in preference to simple_strtol.
Also we should not allow any partially numeric values.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for the W83L771W/G
I was wondering if that chip ever existed publicly... Apparently yes,
so add support for it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Alexander Stein <alexander.stein@informatik.tu-chemnitz.de>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Guenter Roeck [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for update_interval sysfs attribute
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:44 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce capability flag to indicate broken ALERT functionality
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce chip parameter structure
Instead of using switch/case and if statements in probe, define chip specific
functionality in a parameter structure array.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Rearrange code to no longer require forward declarations
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for max6695 and max6696
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Add support for extra features of max6659
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Add explicit support for max6659
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Simplify set_temp11 register calculations
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce function to delete sysfs files
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:43 +0000 (20:31 +0200)]
hwmon: (lm90) Introduce device feature bits
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:42 +0000 (20:31 +0200)]
hwmon: (lm90) Fix checkpatch errors
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Guenter Roeck [Thu, 28 Oct 2010 18:31:42 +0000 (20:31 +0200)]
hwmon: Add tempX_emergency attribute to sysfs ABI
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Andreas Herrmann [Thu, 28 Oct 2010 18:31:42 +0000 (20:31 +0200)]
hwmon: (k8temp) Remove superfluous CPU family check
The family check in k8temp is not required because the driver is
already bound to a northbridge device only used with K8 CPUs.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Linus Torvalds [Thu, 28 Oct 2010 04:54:31 +0000 (21:54 -0700)]
Merge branch 'upstream-merge' of git://git./linux/kernel/git/tytso/ext4
* 'upstream-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
ext4,jbd2: convert tracepoints to use major/minor numbers
ext4: optimize orphan_list handling for ext4_setattr
ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new
ext4: fix compile error in ext4_fallocate()
ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static
ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()
ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static
ext4: rename {ext,idx}_pblock and inline small extent functions
ext4: make various ext4 functions be static
ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()
ext4: fix kernel oops if the journal superblock has a non-zero j_errno
ext4: update writeback_index based on last page scanned
ext4: implement writeback livelock avoidance using page tagging
ext4: tidy up a void argument in inode.c
ext4: add batched_discard into ext4 feature list
ext4: Add batched discard support for ext4
fs: Add FITRIM ioctl
ext4: Use return value from sb_issue_discard()
ext4: Check return value of sb_getblk() and friends
ext4: use bio layer instead of buffer layer in mpage_da_submit_io
...
Theodore Ts'o [Thu, 28 Oct 2010 03:44:47 +0000 (23:44 -0400)]
Merge branch 'next' into upstream-merge
Conflicts:
fs/ext4/inode.c
fs/ext4/mballoc.c
include/trace/events/ext4.h
Linus Torvalds [Thu, 28 Oct 2010 03:37:06 +0000 (20:37 -0700)]
Merge branch 'drm-core-next' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: enable unmappable vram for evergreen
drm/radeon/kms: fix tiled db height calculation on 6xx/7xx
drm/radeon/kms: fix handling of tex lookup disable in cs checker on r2xx
Linus Torvalds [Thu, 28 Oct 2010 03:13:18 +0000 (20:13 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits)
quota: Fix possible oops in __dquot_initialize()
ext3: Update kernel-doc comments
jbd/2: fixed typos
ext2: fixed typo.
ext3: Fix debug messages in ext3_group_extend()
jbd: Convert atomic_inc() to get_bh()
ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate()
jbd: Fix debug message in do_get_write_access()
jbd: Check return value of __getblk()
ext3: Use DIV_ROUND_UP() on group desc block counting
ext3: Return proper error code on ext3_fill_super()
ext3: Remove unnecessary casts on bh->b_data
ext3: Cleanup ext3_setup_super()
quota: Fix issuing of warnings from dquot_transfer
quota: fix dquot_disable vs dquot_transfer race v2
jbd: Convert bitops to buffer fns
ext3/jbd: Avoid WARN() messages when failing to write the superblock
jbd: Use offset_in_page() instead of manual calculation
jbd: Remove unnecessary goto statement
jbd: Use printk_ratelimited() in journal_alloc_journal_head()
...
Theodore Ts'o [Thu, 28 Oct 2010 02:08:50 +0000 (22:08 -0400)]
ext4,jbd2: convert tracepoints to use major/minor numbers
Unfortunately perf can't deal with anything other than direct structure
accesses in the TP_printk() section. It will drop dead when it sees
jbd2_dev_to_name() in the "print fmt" section of the tracepoint.
Addresses-Google-Bug:
3138508
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Dmitry Monakhov [Thu, 28 Oct 2010 02:08:46 +0000 (22:08 -0400)]
ext4: optimize orphan_list handling for ext4_setattr
Surprisingly chown() on ext4 is not SMP scalable operation.
Due to unconditional orphan_del(NULL, inode) in ext4_setattr()
result in significant performance overhead because of global orphan
mutex, especially in no-journal mode (where orphan_add() is noop).
It is possible to skip explicit orphan_del if possible.
Results of fchown() micro-benchmark in no-journal mode
while (1) {
iteration++;
fchown(fd, uid, gid);
fchown(fd, uid + 1, gid + 1)
}
measured: iterations per millisecond
| nr_tasks | w/o patch | with patch |
| 1 | 142 | 185 |
| 4 | 109 | 642 |
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Nicolas Kaiser [Thu, 28 Oct 2010 02:08:42 +0000 (22:08 -0400)]
ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Thu, 28 Oct 2010 02:04:36 +0000 (19:04 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (48 commits)
DMAENGINE: move COH901318 to arch_initcall
dma: imx-dma: fix signedness bug
dma/timberdale: simplify conditional
ste_dma40: remove channel_type
ste_dma40: remove enum for endianess
ste_dma40: remove TIM_FOR_LINK option
ste_dma40: move mode_opt to separate config
ste_dma40: move channel mode to a separate field
ste_dma40: move priority to separate field
ste_dma40: add variable to indicate valid dma_cfg
async_tx: make async_tx channel switching opt-in
move async raid6 test to lib/Kconfig.debug
dmaengine: Add Freescale i.MX1/21/27 DMA driver
intel_mid_dma: change the slave interface
intel_mid_dma: fix the WARN_ONs
intel_mid_dma: Add sg list support to DMA driver
intel_mid_dma: Allow DMAC2 to share interrupt
intel_mid_dma: Allow IRQ sharing
intel_mid_dma: Add runtime PM support
DMAENGINE: define a dummy filter function for ste_dma40
...
Linus Torvalds [Thu, 28 Oct 2010 02:02:41 +0000 (19:02 -0700)]
Merge branch 'viafb-next' of git://github.com/schandinat/linux-2.6
* 'viafb-next' of git://github.com/schandinat/linux-2.6: (29 commits)
viafb: add initial VX900 support
viafb: fix hardware acceleration for suspend & resume
viafb: make suspend and resume work (on all machines?)
viafb: restore display on resume
Minimal support for viafb suspend/resume
viafb: use proper register for colour when doing fill ops
viafb: add documentation for proc interface
viafb: rename output devices
viafb: add a mapping of supported output devices
viafb: set sync polarity for all output devices
viafb: add function to change sync polarity per device
viafb: reduce I2C timeout and delay
viafb: enable I2C for CRT
viafb: fix i2c_transfer error handling
viafb: vt1636 cleanup
viafb: introduce per output device power management
viafb: limit LCD code impact
viafb: add interface for output device configuration
viafb: merge the remaining output path with enable functions
viafb: use new device routing
...
Linus Torvalds [Thu, 28 Oct 2010 01:53:26 +0000 (18:53 -0700)]
Merge git://git./linux/kernel/git/dhowells/linux-2.6-mn10300
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: (44 commits)
MN10300: Save frame pointer in thread_info struct rather than global var
MN10300: Change "Matsushita" to "Panasonic".
MN10300: Create a defconfig for the ASB2364 board
MN10300: Update the ASB2303 defconfig
MN10300: ASB2364: Add support for SMSC911X and SMC911X
MN10300: ASB2364: Handle the IRQ multiplexer in the FPGA
MN10300: Generic time support
MN10300: Specify an ELF HWCAP flag for MN10300 Atomic Operations Unit support
MN10300: Map userspace atomic op regs as a vmalloc page
MN10300: And Panasonic AM34 subarch and implement SMP
MN10300: Delete idle_timestamp from irq_cpustat_t
MN10300: Make various interrupt priority settings configurable
MN10300: Optimise do_csum()
MN10300: Implement atomic ops using atomic ops unit
MN10300: Make the FPU operate in non-lazy mode under SMP
MN10300: SMP TLB flushing
MN10300: Use the [ID]PTEL2 registers rather than [ID]PTEL for TLB control
MN10300: Make the use of PIDR to mark TLB entries controllable
MN10300: Rename __flush_tlb*() to local_flush_tlb*()
MN10300: AM34 erratum requires MMUCTR read and write on exception entry
...
Linus Torvalds [Thu, 28 Oct 2010 01:52:49 +0000 (18:52 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: usb-audio: automatically detect feedback format
ASoC: sound/wm9090: add missing __devexit marker
ASoC: sound/max98088: add missing __devexit marker
ASoC: sound/
ad73311: add missing __devexit marker
ASoC: fsl - fix build error in pcm030-audio-fabric.c
sound/oss/sb_ess.c: delete double assignment
ALSA: hda - Change BTL amp level on some HP notebooks
Linus Torvalds [Thu, 28 Oct 2010 01:48:00 +0000 (18:48 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
perf python scripting: Add futex-contention script
perf python scripting: Fixup cut'n'paste error in sctop script
perf scripting: Shut up 'perf record' final status
perf record: Remove newline character from perror() argument
perf python scripting: Support fedora 11 (audit 1.7.17)
perf python scripting: Improve the syscalls-by-pid script
perf python scripting: print the syscall name on sctop
perf python scripting: Improve the syscalls-counts script
perf python scripting: Improve the failed-syscalls-by-pid script
kprobes: Remove redundant text_mutex lock in optimize
x86/oprofile: Fix uninitialized variable use in debug printk
tracing: Fix 'faild' -> 'failed' typo
perf probe: Fix format specified for Dwarf_Off parameter
perf trace: Fix detection of script extension
perf trace: Use $PERF_EXEC_PATH in canned report scripts
perf tools: Document event modifiers
perf tools: Remove direct slang.h include
perf_events: Fix for transaction recovery in group_sched_in()
perf_events: Revert: Fix transaction recovery in group_sched_in()
perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
...
Linus Torvalds [Thu, 28 Oct 2010 01:47:39 +0000 (18:47 -0700)]
Merge branch 'module' of git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
NULL-terminate all pci_device_id tables
(trivial) Fix compiler warning in kernel/modules.c
Linus Torvalds [Thu, 28 Oct 2010 01:42:52 +0000 (18:42 -0700)]
Merge branch 'akpm-incoming-2'
* akpm-incoming-2: (139 commits)
epoll: make epoll_wait() use the hrtimer range feature
select: rename estimate_accuracy() to select_estimate_accuracy()
Remove duplicate includes from many files
ramoops: use the platform data structure instead of module params
kernel/resource.c: handle reinsertion of an already-inserted resource
kfifo: fix kfifo_alloc() to return a signed int value
w1: don't allow arbitrary users to remove w1 devices
alpha: remove dma64_addr_t usage
mips: remove dma64_addr_t usage
sparc: remove dma64_addr_t usage
fuse: use release_pages()
taskstats: use real microsecond granularity for CPU times
taskstats: split fill_pid function
taskstats: separate taskstats commands
delayacct: align to 8 byte boundary on 64-bit systems
delay-accounting: reimplement -c for getdelays.c to report information on a target command
namespaces Kconfig: move namespace menu location after the cgroup
namespaces Kconfig: remove the cgroup device whitelist experimental tag
namespaces Kconfig: remove pointless cgroup dependency
namespaces Kconfig: make namespace a submenu
...
Linus Torvalds [Thu, 28 Oct 2010 01:38:55 +0000 (18:38 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
percpu: Remove the multi-page alignment facility
x86-32: Allocate irq stacks seperate from percpu area
x86-32, mm: Remove duplicated #include
x86, printk: Get rid of <0> from stack output
x86, kexec: Make sure to stop all CPUs before exiting the kernel
x86/vsmp: Eliminate kconfig dependency warning
Linus Torvalds [Thu, 28 Oct 2010 01:34:59 +0000 (18:34 -0700)]
proc_bus_pci_ioctl: remove pointless BKL usage
The BKL was pushed into this function when it was converted to use the
unlocked_ioctl interface, but nothing that the function touches is
actually protected by the BKL. So just remove the BKL entirely, so that
we finally can get a realistic system build without the BKL being
enabled at all.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kazuya Mio [Thu, 28 Oct 2010 01:30:15 +0000 (21:30 -0400)]
ext4: fix compile error in ext4_fallocate()
When I compiled 2.6.36-rc3 kernel with EXT4FS_DEBUG definition, I got
the following compile error.
CC [M] fs/ext4/extents.o
fs/ext4/extents.c: In function 'ext4_fallocate':
fs/ext4/extents.c:3772: error: 'block' undeclared (first use in this function)
fs/ext4/extents.c:3772: error: (Each undeclared identifier is reported only once
fs/ext4/extents.c:3772: error: for each function it appears in.)
make[2]: *** [fs/ext4/extents.o] Error 1
The patch fixes this problem.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 28 Oct 2010 01:30:15 +0000 (21:30 -0400)]
ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static
These functions are only used within fs/ext4/mballoc.c, so move them
so they are used after they are defined, and then make them be static.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:15 +0000 (21:30 -0400)]
ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end()
Fix a namespace leak from fs/ext4
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static
Fix a namespace leak by moving the function to the file where it is
used and making it static.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: rename {ext,idx}_pblock and inline small extent functions
Cleanup namespace leaks from fs/ext4 and the inline trivial functions
ext4_{ext,idx}_pblock() and ext4_{ext,idx}_store_pblock() since the
code size actually shrinks when we make these functions inline,
they're so trivial.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: make various ext4 functions be static
These functions have no need to be exported beyond file context.
No functions needed to be moved for this commit; just some function
declarations changed to be static and removed from header files.
(A similar patch was submitted by Eric Sandeen, but I wanted to handle
code movement in separate patches to make sure code changes didn't
accidentally get dropped.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:14 +0000 (21:30 -0400)]
ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()
This is a cleanup to avoid namespace leaks out of fs/ext4
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:13 +0000 (21:30 -0400)]
ext4: fix kernel oops if the journal superblock has a non-zero j_errno
Commit
84061e0 fixed an accounting bug only to introduce the
possibility of a kernel OOPS if the journal has a non-zero j_errno
field indicating that the file system had detected a fs inconsistency.
After the journal replay, if the journal superblock indicates that the
file system has an error, this indication is transfered to the file
system and then ext4_commit_super() is called to write this to the
disk.
But since the percpu counters are now initialized after the journal
replay, the call to ext4_commit_super() will cause a kernel oops since
it needs to use the percpu counters the ext4 superblock structure.
The fix is to skip setting the ext4 free block and free inode fields
if the percpu counter has not been set.
Thanks to Ken Sumrall for reporting and analyzing the root causes of
this bug.
Addresses-Google-Bug: #
3054080
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 28 Oct 2010 01:30:13 +0000 (21:30 -0400)]
ext4: update writeback_index based on last page scanned
As pointed out in a prior patch, updating the mapping's
writeback_index based on pages written isn't quite right;
what the writeback index is really supposed to reflect is
the next page which should be scanned for writeback during
periodic flush.
As in write_cache_pages(), write_cache_pages_da() does
this scanning for us as we assemble the mpd for later
writeout. If we keep track of the next page after the
current scan, we can easily update writeback_index without
worrying about pages written vs. pages skipped, etc.
Without this, an fsync will reset writeback_index to
0 (its starting index) + however many pages it wrote, which
can mess up the progress of periodic flush.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 28 Oct 2010 01:30:13 +0000 (21:30 -0400)]
ext4: implement writeback livelock avoidance using page tagging
This is analogous to Jan Kara's commit,
f446daaea9d4a420d16c606f755f3689dcb2d0ce
mm: implement writeback livelock avoidance using page tagging
but since we forked write_cache_pages, we need to reimplement
it there (and in ext4_da_writepages, since range_cyclic handling
was moved to there)
If you start a large buffered IO to a file, and then set
fsync after it, you'll find that fsync does not complete
until the other IO stops.
If you continue re-dirtying the file (say, putting dd
with conv=notrunc in a loop), when fsync finally completes
(after all IO is done), it reports via tracing that
it has written many more pages than the file contains;
in other words it has synced and re-synced pages in
the file multiple times.
This then leads to problems with our writeback_index
update, since it advances it by pages written, and
essentially sets writeback_index off the end of the
file...
With the following patch, we only sync as much as was
dirty at the time of the sync.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 28 Oct 2010 01:30:12 +0000 (21:30 -0400)]
ext4: tidy up a void argument in inode.c
This doesn't fix anything at all, it just removes a vestige
of prior use from __mpage_da_writepage()
__mpage_da_writepage() had a *void argument leftover from
its previous life as a callback; make it reflect the actual type.
Fixing this up makes it slightly more obvious to read, and
enables proper typechecking.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 28 Oct 2010 01:30:12 +0000 (21:30 -0400)]
ext4: add batched_discard into ext4 feature list
Should be applied on the top of "lazy inode table initialization"
and "batched discard support" patch-sets.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 28 Oct 2010 01:30:12 +0000 (21:30 -0400)]
ext4: Add batched discard support for ext4
Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).
It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks
are marked as used and then trimmed. Afterwards these blocks are marked
as free in per-group bitmap.
Since fstrim is a long operation it is good to have an ability to
interrupt it by a signal. This was added by Dmitry Monakhov.
Thanks Dimitry.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 28 Oct 2010 01:30:11 +0000 (21:30 -0400)]
fs: Add FITRIM ioctl
Adds an filesystem independent ioctl to allow implementation of file
system batched discard support. I takes fstrim_range structure as an
argument. fstrim_range is definec in the include/fs.h and its
definition is as follows.
struct fstrim_range {
start;
len;
minlen;
}
start - first Byte to trim
len - number of Bytes to trim from start
minlen - minimum extent length to trim, free extents shorter than this
number of Bytes will be ignored. This will be rounded up to fs
block size.
It is also possible to specify NULL as an argument. In this case the
arguments will set itself as follows:
start = 0;
len = ULLONG_MAX;
minlen = 0;
So it will trim the whole file system at one run.
After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 28 Oct 2010 01:30:11 +0000 (21:30 -0400)]
ext4: Use return value from sb_issue_discard()
Use return value from sb_issue_discard() as return value in
ext4_issue_discard(). Since sb_issue_discard() may result in more
serious errors than just -EOPNOTSUPP it is worth to inform user of this
function about them to handle error cases properly.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Namhyung Kim [Thu, 28 Oct 2010 01:30:11 +0000 (21:30 -0400)]
ext4: Check return value of sb_getblk() and friends
Fail block allocation if sb_getblk() returns NULL. In that case,
sb_find_get_block() also likely to fail so that it should skip
calling ext4_forget().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:10 +0000 (21:30 -0400)]
ext4: use bio layer instead of buffer layer in mpage_da_submit_io
Call the block I/O layer directly instad of going through the buffer
layer. This should give us much better performance and scalability,
as well as lowering our CPU utilization when doing buffered writeback.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:10 +0000 (21:30 -0400)]
ext4: move mpage_put_bnr_to_bhs()'s functionality to mpage_da_submit_io()
This massively simplifies the ext4_da_writepages() code path by
completely removing mpage_put_bnr_bhs(), which is almost 100 lines of
code iterating over a set of pages using pagevec_lookup(), and folds
that functionality into mpage_da_submit_io()'s existing
pagevec_lookup() loop.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:10 +0000 (21:30 -0400)]
ext4: inline walk_page_buffers() into mpage_da_submit_io
Expand the call:
if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
ext4_bh_delay_or_unwritten))
goto redirty_page
into mpage_da_submit_io().
This will allow us to merge in mpage_put_bnr_to_bhs() in the next
patch.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: inline ext4_writepage() into mpage_da_submit_io()
As a prepratory step to switching to bio_submit, inline
ext4_writepage() into mpage_da_submit() and then simplify things a
bit. This makes it clearer what mpage_da_submit needs to do.
Also, move the ClearPageChecked(page) call into
__ext4_journalled_writepage(), as a minor bit of cleanup refactoring.
This also allows us to pull i_size_read() and
ext4_should_journal_data() out of the loop, which should be a very
minor CPU savings.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: simplify ext4_writepage()
The actual code in ext4_writepage() is unnecessarily convoluted.
Simplify it so it is easier to understand, but otherwise logically
equivalent.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: call mpage_da_submit_io() from mpage_da_map_blocks()
Eventually we need to completely reorganize the ext4 writepage
callpath, but for now, we simplify things a little by calling
mpage_da_submit_io() from mpage_da_map_blocks(), since all of the
places where we call mpage_da_map_blocks() it is followed up by a call
to mpage_da_submit_io().
We're also a wee bit better with respect to error handling, but there
are still a number of issues where it's not clear what the right thing
is to do with ext4 functions deep in the writeback codepath fails.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:09 +0000 (21:30 -0400)]
ext4: use KMEM_CACHE instead of kmem_cache_create
Also remove the SLAB_RECLAIM_ACCOUNT flag from the system zone kmem
cache. This slab tends to be fairly static, so it shouldn't be marked
as likely to have free pages that can be reclaimed.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:08 +0000 (21:30 -0400)]
ext4: use search_dirblock() in ext4_dx_find_entry()
Use the search_dirblock() in ext4_dx_find_entry(). It makes the code
easier to read, and it takes advantage of common code. It also saves
100 bytes or so of text space.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Brad Spengler <spender@grsecurity.net>
Theodore Ts'o [Thu, 28 Oct 2010 01:30:08 +0000 (21:30 -0400)]
ext4: avoid uninitialized memory references in ext3_htree_next_block()
If the first block of htree directory is missing '.' or '..' but is
otherwise a valid directory, and we do a lookup for '.' or '..', it's
possible to dereference an uninitialized memory pointer in
ext4_htree_next_block().
We avoid this by moving the special case from ext4_dx_find_entry() to
ext4_find_entry(); this also means we can optimize ext4_find_entry()
slightly when NFS looks up "..".
Thanks to Brad Spengler for pointing a Clang warning that led me to
look more closely at this code. The warning was harmless, but it was
useful in pointing out code that was too ugly to live. This warning was
also reported by Roman Borisov.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Brad Spengler <spender@grsecurity.net>
Eric Sandeen [Thu, 28 Oct 2010 01:30:08 +0000 (21:30 -0400)]
ext4: remove unused ext4_sb_info members
Not that these take up a lot of room, but the structure is long enough
as it is, and there's no need to confuse people with these various
undocumented & unused structure members...
Signed-off-by: Eric Sandeen <sandeen@redaht.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: queue conversion after adding to inode's completed IO list
By queuing the io end on the unwritten workqueue before adding it
to our inode's list of completed IOs, I think we run the risk
of the work getting completed, and the IO freed, before we try
to add it to the inode's i_completed_io_list.
It should be safe to add it to the inode's list of completed
IOs, and -then- queue it for completion, I think.
Thanks to Dave Chinner for pointing out the race.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: don't use ext4_allocation_contexts for tracing
Many tracepoints were populating an ext4_allocation_context
to pass in, but this requires a slab allocation even when
tracepoints are off. In fact, 4 of 5 of these allocations
were only for tracing. In addition, we were only using a
small fraction of the 144 bytes of this structure for this
purpose.
We can do away with all these alloc/frees of the ac and
simply pass in the bits we care about, instead.
I tested this by turning on tracing and running through
xfstests on x86_64. I did not actually do anything with
the trace output, however.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: fix oops in trace_ext4_mb_release_group_pa
Our QA reported an oops in the ext4_mb_release_group_pa tracing,
and Josef Bacik pointed out that it was because we may have a
non-null but uninitialized ac_inode in the allocation context.
I can reproduce it when running xfstests with ext4 tracepoints on,
on a CONFIG_SLAB_DEBUG kernel.
We call trace_ext4_mb_release_group_pa from 2 places,
ext4_mb_discard_group_preallocations and
ext4_mb_discard_lg_preallocations
In both cases we allocate an ac as a container just for tracing (!)
and never fill in the ac_inode. There's no reason to be assigning,
testing, or printing it as far as I can see, so just remove it from
the tracepoint.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Toshiyuki Okajima [Thu, 28 Oct 2010 01:30:07 +0000 (21:30 -0400)]
ext4: fix potential infinite loop in ext4_da_writepages()
On linux-2.6.36-rc2, if we execute the following script, we can hang
the system when the /bin/sync command is executed:
========================================================================
#!/bin/sh
echo -n "HANG UP TEST: "
/bin/dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1M 2> /dev/null
/sbin/mkfs.ext4 -Fq /tmp/img
/bin/mount -o loop -t ext4 /tmp/img /mnt
/bin/dd if=/dev/zero of=/mnt/file bs=1 count=1 \
seek=$((16*1024*1024*1024*1024-4096)) 2> /dev/null
/bin/sync
/bin/umount /mnt
echo "DONE"
exit 0
========================================================================
We can see the following backtrace if we get the kdump when this
hangup occurs:
======================================================================
kthread()
=> bdi_writeback_thread()
=> wb_do_writeback()
=> wb_writeback()
=> writeback_inodes_wb()
=> writeback_sb_inodes()
=> writeback_single_inode()
=> ext4_da_writepages() ---+
^ infinite |
| loop |
+-------------+
======================================================================
The reason why this hangup happens is described as follows:
1) We write the last extent block of the file whose size is the filesystem
maximum size.
2) "BH_Delay" flag is set on the buffer_head of its block.
3) - the member, "m_lblk" of struct mpage_da_data is
4294967295 (UINT_MAX)
- the member, "m_len" of struct mpage_da_data is 1
mpage_put_bnr_to_bhs() which is called via ext4_da_writepages()
cannot clear "BH_Delay" flag of the buffer_head because the type of
m_lblk is ext4_lblk_t and then m_lblk + m_len is overflow.
Therefore an infinite loop occurs because ext4_da_writepages()
cannot write the page (which corresponds to the block) since
"BH_Delay" flag isn't cleared.
----------------------------------------------------------------------
static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd,
struct ext4_map_blocks *map)
{
...
int blocks = map->m_len;
...
do {
// cur_logical =
4294967295
// map->m_lblk =
4294967295
// blocks = 1
// *** map->m_lblk + blocks == 0 (OVERFLOW!) ***
// (cur_logical >= map->m_lblk + blocks) => true
if (cur_logical >= map->m_lblk + blocks)
break;
----------------------------------------------------------------------
NOTE: Mounting with the nodelalloc option will avoid this codepath,
and thus, avoid this hang
Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Toshiyuki Okajima [Thu, 28 Oct 2010 01:30:06 +0000 (21:30 -0400)]
ext4: improve llseek error handling for overly large seek offsets
The llseek system call should return EINVAL if passed a seek offset
which results in a write error. What this maximum offset should be
depends on whether or not the huge_file file system feature is set,
and whether or not the file is extent based or not.
If the file has no "EXT4_EXTENTS_FL" flag, the maximum size which can be
written (write systemcall) is different from the maximum size which can be
sought (lseek systemcall).
For example, the following 2 cases demonstrates the differences
between the maximum size which can be written, versus the seek offset
allowed by the llseek system call:
#1: mkfs.ext3 <dev>; mount -t ext4 <dev>
#2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>
Table. the max file size which we can write or seek
at each filesystem feature tuning and file flag setting
+============+===============================+===============================+
| \ File flag| | |
| \ | !EXT4_EXTENTS_FL | EXT4_EXTETNS_FL |
|case \| | |
+------------+-------------------------------+-------------------------------+
| #1 | write:
2194719883264 | write: -------------- |
| | seek:
2199023251456 | seek: -------------- |
+------------+-------------------------------+-------------------------------+
| #2 | write:
4402345721856 | write:
17592186044415 |
| | seek:
17592186044415 | seek:
17592186044415 |
+------------+-------------------------------+-------------------------------+
The differences exist because ext4 has 2 maxbytes which are sb->s_maxbytes
(= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped
maxbytes). Although generic_file_llseek uses only extent-mapped maxbytes.
(llseek of ext4_file_operations is generic_file_llseek which uses
sb->s_maxbytes.)
Therefore we create ext4 llseek function which uses 2 maxbytes.
The new own function originates from generic_file_llseek().
If the file flag, "EXT4_EXTENTS_FL" is not set, the function alters
inode->i_sb->s_maxbytes into EXT4_SB(inode->i_sb)->s_bitmap_maxbytes.
Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>