Artem Bityutskiy [Thu, 22 Mar 2012 02:34:55 +0000 (22:34 -0400)]
vfs: remove unused superblock helpers
Remove the 'sb_mark_dirty()', 'sb_mark_clean()' and 'sb_is_dirty()'
helpers which are not used. I introduced them 2 years and the
intention was to make all file-systems use them in order to be able to
optimize 'sync_supers()'. However, Al Viro vetoed my patches at the
end and asked me to push superblock management down to file-systems
and get rid of the 's_dirt' flag completely, as well as kill
'sync_supers()' altogether. Thus, remove the helpers.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Artem Bityutskiy [Thu, 22 Mar 2012 02:33:00 +0000 (22:33 -0400)]
mm: export dirty_writeback_interval
Export 'dirty_writeback_interval' to make it visible to
file-systems. We are going to push superblock management down to
file-systems and get rid of the 'sync_supers' kernel thread completly.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Artem Bityutskiy [Thu, 22 Mar 2012 02:30:06 +0000 (22:30 -0400)]
ext4: remove useless s_dirt assignment
Clean-up ext4 a tiny bit by removing useless s_dirt assignment in
'ext4_fill_super()' because a bit later we anyway call
'ext4_setup_super()' which writes the superblock to the media
unconditionally.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Artem Bityutskiy [Thu, 22 Mar 2012 02:29:15 +0000 (22:29 -0400)]
ext4: write superblock only once on unmount
In some rather rare cases it is possible that ext4 may the superblock
to the media twice. This patch makes sure this does not happen. This
should speed up unmounting in those cases.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Artem Bityutskiy [Thu, 22 Mar 2012 02:28:29 +0000 (22:28 -0400)]
ext4: do not mark superblock as dirty unnecessarily
Commit
a0375156ca1041574b5d47cc7e32f10b891151b0 cleaned up superblock
dirtying handling, but missed one place. This patch does what was
intended: if we have the journal, then we update the superblock
through the journal rather than doing this directly.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Allison Henderson [Thu, 22 Mar 2012 02:23:31 +0000 (22:23 -0400)]
ext4: correct ext4_punch_hole return codes
ext4_punch_hole returns -ENOTSUPP but it should be using -EOPNOTSUPP
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 22 Mar 2012 01:47:55 +0000 (21:47 -0400)]
ext4: remove restrictive checks for EOFBLOCKS_FL
We are going to remove the EOFBLOCKS_FL flag in the future, so this is
the first part of the removal. We can not remove it entirely just now,
since the e2fsck is still checking for it and it might cause headache to
some people. Instead, remove the restrictive checks now and the rest
later, when the new e2fsck code is out and common enough.
This is also needed because punch hole already breaks the EOFBLOCKS_FL
semantics, so it might cause the some troubles. So simply remove it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 22 Mar 2012 01:26:22 +0000 (21:26 -0400)]
ext4: always set then trimmed blocks count into len
Currently if the range to trim is too small, for example on 1K fs
the request to trim the first block, then the 'range->len' is not set
reporting wrong number of discarded block to the caller.
Fix this by always setting the 'range->len' before we return. Note that
when there is a failure (-EINVAL) caller can not depend on 'range->len'
being set properly.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 22 Mar 2012 01:24:22 +0000 (21:24 -0400)]
ext4: fix trimmed block count accunting
Currently when there is not enough free blocks in the block group to
discard (grp->bb_free < minlen) the 'trimmed' is bumped up anyway with
the number of discarded blocks from the previous iteration. Fix this
by bumping up 'trimmed' only if the ext4_trim_all_free() was actually
run.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Thu, 22 Mar 2012 01:22:22 +0000 (21:22 -0400)]
ext4: fix start and len arguments handling in ext4_trim_fs()
The overflow can happen when we are calling get_group_no_and_offset()
which stores the group number in the ext4_grpblk_t type which is
actually int. However when the blocknr is big enough the group number
might be bigger than ext4_grpblk_t resulting in overflow. This will
most likely happen with FITRIM default argument len = ULLONG_MAX.
Fix this by using "end" variable instead of "start+len" as it is easier
to get right and specifically check that the end is not beyond the end
of the file system, so we are sure that the result of
get_group_no_and_offset() will not overflow. Otherwise truncate it to
the size of the file system.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Tue, 20 Mar 2012 19:46:11 +0000 (15:46 -0400)]
ext4: update s_free_{inodes,blocks}_count during online resize
When we're doing an online resize of an ext4 filesystem, we need to
update the free inode and block counts in the superblock so that fsck
doesn't complain.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 20 Mar 2012 03:41:49 +0000 (23:41 -0400)]
ext4: change some printk() calls to use ext4_msg() instead
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Joe Perches [Tue, 20 Mar 2012 03:15:43 +0000 (23:15 -0400)]
ext4: avoid output message interleaving in ext4_error_<foo>()
Using KERN_CONT means that messages from multiple threads may be
interleaved. Avoid this by using a single printk call in
ext4_error_inode and ext4_error_file.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 20 Mar 2012 03:13:43 +0000 (23:13 -0400)]
ext4: remove trailing newlines from ext4_msg() and ext4_error() messages
The functions ext4_msg() and ext4_error() already tack on a trailing
newline, so remove the unnecessary extra newline.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Joe Perches [Tue, 20 Mar 2012 03:11:43 +0000 (23:11 -0400)]
ext4: add no_printk argument validation, fix fallout
Add argument validation to debug functions.
Use ##__VA_ARGS__.
Fix format and argument mismatches.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Joe Perches [Tue, 20 Mar 2012 03:09:43 +0000 (23:09 -0400)]
ext4: remove redundant "EXT4-fs: " from uses of ext4_msg
ext4_msg adds "EXT4-fs: " to the messsage output.
Remove the redundant bits from uses.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Tue, 20 Mar 2012 03:07:43 +0000 (23:07 -0400)]
ext4: give more helpful error message in ext4_ext_rm_leaf()
The error message produced by the ext4_ext_rm_leaf() when we are
removing blocks which accidentally ends up inside the existing extent,
is not very helpful, because we would like to also know which extent did
we collide with.
This commit changes the error message to get us also the information
about the extent we are colliding with.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Tue, 20 Mar 2012 03:05:43 +0000 (23:05 -0400)]
ext4: remove unused code from ext4_ext_map_blocks()
Since the commit 'Rewrite punch hole to use ext4_ext_remove_space()'
reworked the punch hole implementation to use ext4_ext_remove_space()
instead of ext4_ext_map_blocks(), we can remove the code which is no
longer needed from the ext4_ext_map_blocks().
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Tue, 20 Mar 2012 03:03:19 +0000 (23:03 -0400)]
ext4: rewrite punch hole to use ext4_ext_remove_space()
This commit rewrites ext4 punch hole implementation to use
ext4_ext_remove_space() instead of its home gown way of doing this via
ext4_ext_map_blocks(). There are several reasons for changing this.
Firstly it is quite non obvious that punching hole needs to
ext4_ext_map_blocks() to punch a hole, especially given that this
function should map blocks, not unmap it. It also required a lot of new
code in ext4_ext_map_blocks().
Secondly the design of it is not very effective. The reason is that we
are trying to punch out blocks in ext4_ext_punch_hole() in opposite
direction than in ext4_ext_rm_leaf() which causes the ext4_ext_rm_leaf()
to iterate through the whole tree from the end to the start to find the
requested extent for every extent we are going to punch out.
And finally the current implementation does not use the existing code,
but bring a lot of new code, which is IMO unnecessary since there
already is some infrastructure we can use. Specifically
ext4_ext_remove_space().
This commit changes ext4_ext_remove_space() to accept 'end' parameter so
we can not only truncate to the end of file, but also remove the space
in the middle of the file (punch a hole). Moreover, because the last
block to punch out, might be in the middle of the extent, we have to
split the extent at 'end + 1' so ext4_ext_rm_leaf() can easily either
remove the whole fist part of split extent, or change its size.
ext4_ext_remove_space() is then used to actually remove the space
(extents) from within the hole, instead of ext4_ext_map_blocks().
Note that this also fix the issue with punch hole, where we would forget
to remove empty index blocks from the extent tree, resulting in double
free block error and file system corruption. This is simply because we
now use different code path, where this problem does not exist.
This has been tested with fsx running for several days and xfstests,
plus xfstest #251 with '-o discard' run on the loop image (which
converts discard requestes into punch hole to the backing file). All of
it on 1K and 4K file system block size.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 14 Mar 2012 02:45:38 +0000 (22:45 -0400)]
jbd2: cleanup journal tail after transaction commit
Normally, we have to issue a cache flush before we can update journal tail in
journal superblock, effectively wiping out old transactions from the journal.
So use the fact that during transaction commit we issue cache flush anyway and
opportunistically push journal tail as far as we can. Since update of journal
superblock is still costly (we have to use WRITE_FUA), we update log tail only
if we can free significant amount of space.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 14 Mar 2012 02:45:25 +0000 (22:45 -0400)]
jbd2: remove bh_state lock from checkpointing code
All accesses to checkpointing entries in journal_head are protected
by j_list_lock. Thus __jbd2_journal_remove_checkpoint() doesn't really
need bh_state lock.
Also the only part of journal head that the rest of checkpointing code
needs to check is jh->b_transaction which is safe to read under
j_list_lock.
So we can safely remove bh_state lock from all of checkpointing code which
makes it considerably prettier.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 14 Mar 2012 02:27:44 +0000 (22:27 -0400)]
jbd2: remove always true condition in __journal_try_to_free_buffer()
The check b_jlist == BJ_None in __journal_try_to_free_buffer() is
always true (__jbd2_journal_temp_unlink_buffer() also checks this in
an assertion) so just remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 14 Mar 2012 02:25:06 +0000 (22:25 -0400)]
jbd2: declare __jbd2_journal_temp_unlink_buffer() static
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 14 Mar 2012 02:24:54 +0000 (22:24 -0400)]
jbd2: fix BH_JWrite setting in checkpointing code
BH_JWrite bit should be set when buffer is written to the journal. So
checkpointing shouldn't set this bit when writing out buffer. This didn't
cause any observable bug since BH_JWrite bit is used only for debugging
purposes but it's good to have this consistent.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 14 Mar 2012 02:22:54 +0000 (22:22 -0400)]
jbd2: issue cache flush after checkpointing even with internal journal
When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
checkpointed buffers are on a stable storage - especially if buffers were
written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
caches. Thus when we update journal superblock effectively removing old
transaction from journal, this write of superblock can get to stable storage
before those checkpointed buffers which can result in filesystem corruption
after a crash. Thus we must unconditionally issue a cache flush before we
update journal superblock in these cases.
A similar problem can also occur if journal superblock is written only in
disk's caches, other transaction starts reusing space of the transaction
cleaned from the log and power failure happens. Subsequent journal replay would
still try to replay the old transaction but some of it's blocks may be already
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating log tail and we must first write new log tail to disk and update
in-memory information only after that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Tue, 13 Mar 2012 19:43:04 +0000 (15:43 -0400)]
jbd2: protect all log tail updates with j_checkpoint_mutex
There are some log tail updates that are not protected by j_checkpoint_mutex.
Some of these are harmless because they happen during startup or shutdown but
updates in jbd2_journal_commit_transaction() and jbd2_journal_flush() can
really race with other log tail updates (e.g. someone doing
jbd2_journal_flush() with someone running jbd2_cleanup_journal_tail()). So
protect all log tail updates with j_checkpoint_mutex.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Tue, 13 Mar 2012 19:41:04 +0000 (15:41 -0400)]
jbd2: split updating of journal superblock and marking journal empty
There are three case of updating journal superblock. In the first case, we want
to mark journal as empty (setting s_sequence to 0), in the second case we want
to update log tail, in the third case we want to update s_errno. Split these
cases into separate functions. It makes the code slightly more straightforward
and later patches will make the distinction even more important.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 12 Mar 2012 03:30:16 +0000 (23:30 -0400)]
ext4: check for zero length extent
Explicitly test for an extent whose length is zero, and flag that as a
corrupted extent.
This avoids a kernel BUG_ON assertion failure.
Tested: Without this patch, the file system image found in
tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a
kernel panic. With this patch, an ext4 file system error is noted
instead, and the file system is marked as being corrupted.
https://bugzilla.kernel.org/show_bug.cgi?id=42859
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Curt Wohlgemuth [Mon, 5 Mar 2012 15:40:22 +0000 (10:40 -0500)]
ext4: add comments to definition of ext4_io_end_t
This should make it more clear what this structure is used
for, and how some of the (mutually exclusive) fields are
used to keep page cache references.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Curt Wohlgemuth [Mon, 5 Mar 2012 15:40:15 +0000 (10:40 -0500)]
ext4: don't release page refs in ext4_end_bio()
We can clear PageWriteback on each page when the IO
completes, but we can't release the references on the page
until we convert any uninitialized extents.
Without this patch, the use of the dioread_nolock mount
option can break buffered writes, because extents may
not be converted by the time a subsequent buffered read
comes in; if the page is not in the page cache, a read
will return zeros if the extent is still uninitialized.
I tested this with a (temporary) patch that adds a call
to msleep(1000) at the start of ext4_end_io_work(), to delay
processing of each DIO-unwritten work queue item. With this
msleep(), a simple workload of
fallocate
write
fadvise
read
will fail without this patch, succeeds with it.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jeff Moyer [Mon, 5 Mar 2012 15:29:52 +0000 (10:29 -0500)]
ext4: fix race between sync and completed io work
The following command line will leave the aio-stress process unkillable
on an ext4 file system (in my case, mounted on /mnt/test):
aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2
This is using the aio-stress program from the xfstests test suite.
That particular command line tells aio-stress to do random writes to
20 files from 20 threads (one thread per file). The files are NOT
preallocated, so you will get writes to random offsets within the
file, thus creating holes and extending i_size. It also opens the
file with O_DIRECT and O_SYNC.
On to the problem. When an I/O requires unwritten extent conversion,
it is queued onto the completed_io_list for the ext4 inode. Two code
paths will pull work items from this list. The first is the
ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
which is called via the fsync path (and O_SYNC handling, as well).
There are two issues I've found in these code paths. First, if the
fsync path beats the work routine to a particular I/O, the work
routine will free the io_end structure! It does not take into account
the fact that the io_end may still be in use by the fsync path. I've
fixed this issue by adding yet another IO_END flag, indicating that
the io_end is being processed by the fsync path.
The second problem is that the work routine will make an assignment to
io->flag outside of the lock. I have witnessed this result in a hang
at umount. Moving the flag setting inside the lock resolved that
problem.
The problem was introduced by commit
b82e384c7b ("ext4: optimize
locking for end_io extent conversion"), which first appeared in 3.2.
As such, the fix should be backported to that release (probably along
with the unwritten extent conversion race fix).
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
CC: stable@kernel.org
Jeff Moyer [Mon, 5 Mar 2012 15:19:52 +0000 (10:19 -0500)]
ext4: clean up the flags passed to __blockdev_direct_IO
For extent-based files, you can perform DIO to holes, as mentioned in
the comments in ext4_ext_direct_IO. However, that function passes
DIO_SKIP_HOLES to __blockdev_direct_IO, which is *really* confusing to
the uninitiated reader. The key, here, is that the get_block function
passed in, ext4_get_block_write, completely ignores the create flag
that is passed to it (the create flag is passed in from the direct I/O
code, which uses the DIO_SKIP_HOLES flag to determine whether or not
it should be cleared).
This is a long-winded way of saying that the DIO_SKIP_HOLES flag is
ultimately ignored. So let's remove it.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 5 Mar 2012 03:06:20 +0000 (22:06 -0500)]
ext4: try to deprecate noacl and noxattr_user mount options
No other file system allows ACL's and extended attributes to be
enabled or disabled via a mount option. So let's try to deprecate
these options from ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 5 Mar 2012 03:00:53 +0000 (22:00 -0500)]
ext4: ignore mount options supported by ext2/3 (but have since been removed)
Users who tried to use the ext4 file system driver is being used for
the ext2 or ext3 file systems (via the CONFIG_EXT4_USE_FOR_EXT23
option) could have failed mounts if their /etc/fstab contains options
recognized by ext2 or ext3 but which have since been removed in ext4.
So teach ext4 to recognize them and give a warning that the mount
option was removed.
Report: https://bbs.archlinux.org/profile.php?id=33804
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Baechler <thomas@archlinux.org>
Cc: Tobias Powalowski <tobias.powalowski@googlemail.com>
Cc: Dave Reisner <d@falconindy.com>
Theodore Ts'o [Mon, 5 Mar 2012 01:21:38 +0000 (20:21 -0500)]
ext4: add debugging /proc file showing file system options
Now that /proc/mounts is consistently showing only those mount options
which need to be specified in /etc/fstab or on the mount command line,
it is useful to have file which shows exactly which file system
options are enabled. This can be useful when debugging a user
problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 5 Mar 2012 00:27:31 +0000 (19:27 -0500)]
ext4: make ext4_show_options() be table-driven
Consistently show mount options which are the non-default, so that
/proc/mounts accurately shows the mount options that would be
necessary to mount the file system in its current mode of operation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sun, 4 Mar 2012 04:20:50 +0000 (23:20 -0500)]
ext4: move ext4_show_options() after parse_options()
This commit is strictly a code movement so in preparation of changing
ext4_show_options to be table driven.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sun, 4 Mar 2012 04:20:47 +0000 (23:20 -0500)]
ext4: use a table-driven handler for mount options
By using a table-drive approach, we shave about 100 lines of code from
ext4, and make the code a bit more regular and factored out. This
will also make it possible in a future patch to use this table for
displaying the mount options that were specified in /proc/mounts.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 3 Mar 2012 23:04:40 +0000 (18:04 -0500)]
ext4: unify handling of mount options which have been removed
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 3 Mar 2012 22:56:23 +0000 (17:56 -0500)]
ext4: simplify handling of the errors=* mount options
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 2 Mar 2012 17:23:11 +0000 (12:23 -0500)]
ext4: remove the I_VERSION mount flag and use the super_block flag instead
There's no point to have two bits that are set in parallel; so use the
MS_I_VERSION flag that is needed by the VFS anyway, and that way we
free up a bit in sbi->s_mount_opts.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 2 Mar 2012 17:14:24 +0000 (12:14 -0500)]
ext4: remove Opt_ignore
This is completely unused so let's just get rid of it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 2 Mar 2012 05:03:21 +0000 (00:03 -0500)]
ext4: remove deprecation warnings for minix_df and grpid
People complained about removing both of these features, so per
Linus's dictate, we won't be able to remove them. Sigh...
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Santosh Nayak [Mon, 27 Feb 2012 06:09:03 +0000 (01:09 -0500)]
ext4: Fix endianness bug when reading the MMP block
Sparse complained about this endian bug in fs/ext4/mmp.c.
Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Zheng Liu [Tue, 21 Feb 2012 04:09:36 +0000 (23:09 -0500)]
ext4: format flag in dx_probe()
Fix ext4_warning format flag in dx_probe().
CC: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Tue, 21 Feb 2012 04:06:18 +0000 (23:06 -0500)]
ext4: avoid deadlock on sync-mounted FS w/o journal
Processes hang forever on a sync-mounted ext2 file system that
is mounted with the ext4 module (default in Fedora 16).
I can reproduce this reliably by mounting an ext2 partition with
"-o sync" and opening a new file an that partition with vim. vim
will hang in "D" state forever. The same happens on ext4 without
a journal.
I am attaching a small patch here that solves this issue for me.
In the sync mounted case without a journal,
ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which
can't be called with buffer lock held.
Also move mb_cache_entry_release inside lock to avoid race
fixed previously by
8a2bfdcb ext[34]: EA block reference count racing fix
Note too that ext2 fixed this same problem in 2006 with
b2f49033 [PATCH] fix deadlock in ext2
Signed-off-by: Martin.Wilck@ts.fujitsu.com
[sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Tue, 21 Feb 2012 04:02:06 +0000 (23:02 -0500)]
ext4: fix resize when resizing within single group
When resizing file system in the way that the new size of the file
system is still in the same group (no new groups are added), then we can
hit a BUG_ON in ext4_alloc_group_tables()
BUG_ON(flex_gd->count == 0 || group_data == NULL);
because flex_gd->count is zero. The reason is the missing check for such
case, so the code always extend the last group fully and then attempt to
add more groups, but at that time n_blocks_count is actually smaller
than o_blocks_count.
It can be easily reproduced like this:
mkfs.ext4 -b 4096 /dev/sda 30M
mount /dev/sda /mnt/test
resize2fs /dev/sda 50M
Fix this by checking whether the resize happens within the singe group
and only add that many blocks into the last group to satisfy user
request. Then o_blocks_count == n_blocks_count and the resize will exit
successfully without and attempt to add more groups into the fs.
Also fix mixing together block number and blocks count which might be
confusing and can easily lead to off-by-one errors (but it is actually
not the case here since the two occurrence of this mix-up will cancel
each other).
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jeff Moyer [Mon, 20 Feb 2012 22:59:24 +0000 (17:59 -0500)]
ext4: fix race between unwritten extent conversion and truncate
The following comment in ext4_end_io_dio caught my attention:
/* XXX: probably should move into the real I/O completion handler */
inode_dio_done(inode);
The truncate code takes i_mutex, then calls inode_dio_wait. Because the
ext4 code path above will end up dropping the mutex before it is
reacquired by the worker thread that does the extent conversion, it
seems to me that the truncate can happen out of order. Jan Kara
mentioned that this might result in error messages in the system logs,
but that should be the extent of the "damage."
The fix is pretty straight-forward: don't call inode_dio_done until the
extent conversion is complete.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Heiko Carstens [Mon, 20 Feb 2012 22:57:24 +0000 (17:57 -0500)]
ext4: fix balloc.c printk-format-warning
Get rid of this one:
fs/ext4/balloc.c: In function 'ext4_wait_block_bitmap':
fs/ext4/balloc.c:405:3: warning: format '%llu' expects argument of
type 'long long unsigned int', but argument 6 has type 'sector_t' [-Wformat]
Happens because sector_t is u64 (unsigned long long) or unsigned long
dependent on CONFIG_64BIT.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 20 Feb 2012 22:54:06 +0000 (17:54 -0500)]
ext4: remove EXT4_MB_{BITMAP,BUDDY} macros
The EXT4_MB_BITMAP and EXT4_MB_BUDDY macros obfuscate more than they
provide any abstraction. So remove them.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Dan Carpenter [Mon, 20 Feb 2012 22:53:06 +0000 (17:53 -0500)]
ext4: using PTR_ERR() on the wrong variable in ext4_ext_migrate()
"inode" is a valid pointer here. "tmp_inode" was intended.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Dan Carpenter [Mon, 20 Feb 2012 22:53:05 +0000 (17:53 -0500)]
ext4: remove an unneeded NULL check in __ext4_check_dir_entry()
We dereference "bh" unconditionally a couple lines down to find
"by->b_size". This function is never called with a NULL "bh" so I have
removed the check.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Zheng Liu [Mon, 20 Feb 2012 22:53:05 +0000 (17:53 -0500)]
ext4: remove unneeded variable in ext4_xattr_check_block()
We could return directly from ext4_xattr_check_block(). Thus, we
shouldn't need to define a 'error' variable.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Mon, 20 Feb 2012 22:53:04 +0000 (17:53 -0500)]
ext4: remove the resize mount option
The resize mount option seems to be of limited value,
especially in the age of online resize2fs. Nuke it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Mon, 20 Feb 2012 22:53:04 +0000 (17:53 -0500)]
ext4: remove the journal=update mount option
The V2 journal format was introduced around ten years ago,
for ext3. It seems highly unlikely that anyone will need this
migration option for ext4.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Curt Wohlgemuth [Mon, 20 Feb 2012 22:53:03 +0000 (17:53 -0500)]
ext4: mark possibly unused variable in ext4_mb_normalize_request()
The 'orig_size' local variable is only used in a call to
mb_debug(). Mark it with '__maybe_unused'.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Yongqiang Yang [Mon, 20 Feb 2012 22:53:03 +0000 (17:53 -0500)]
jbd2: use KMEM_CACHE instead of kmem_cache_create()
Use the KMEM_CACHE helper macro instead of kmem_cache_create().
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Yongqiang Yang [Mon, 20 Feb 2012 22:53:03 +0000 (17:53 -0500)]
jbd2: rename functions which initialize slab caches
This patch renames functions initializing the slab caches for the
journal head and handle structures to so they are consistent with the
names of the corresponding functions which destroys those slab caches.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Yongqiang Yang [Mon, 20 Feb 2012 22:53:02 +0000 (17:53 -0500)]
jbd2: allocate transaction from separate slab cache
There is normally only a handful of these active at any one time, but
putting them in a separate slab cache makes debugging memory
corruption problems easier. Manish Katiyar also wanted this make it
easier to test memory failure scenarios in the jbd2 layer.
Cc: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Bobi Jam [Mon, 20 Feb 2012 22:53:02 +0000 (17:53 -0500)]
ext4: expand commit callback and
The per-commit callback was used by mballoc code to manage free space
bitmaps after deleted blocks have been released. This patch expands
it to support multiple different callbacks, to allow other things to
be done after the commit has been completed.
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Mon, 20 Feb 2012 22:53:01 +0000 (17:53 -0500)]
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.
This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one. Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.
Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point
4e96b2dbbf1d7e81f22047a50f862555a6cb87cb
and
189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go
away, because buried within that large change is some more flag
clearing. I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 20 Feb 2012 22:53:01 +0000 (17:53 -0500)]
ext4: fix INCOMPAT feature codepoint reservation for INLINEDATA
In commit
9b90e5e028 I incorrectly reserved the wrong bit for
EXT4_FEATURE_INCOMPAT_INLINEDATA per the discussion on the linux-ext4
list on December 7, 2011. The codepoint 0x2000 should be used for
EXT4_FEATURE_INCOMPAT_USE_META_CSUM, so INLINEDATA will be assigned
the value 0x8000.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Seiji Aguchi [Mon, 20 Feb 2012 22:53:01 +0000 (17:53 -0500)]
jbd2: add drop_transaction/update_superblock_end tracepoints
This patch adds trace_jbd2_drop_transaction and
trace_jbd2_update_superblock_end because there are similar tracepoints
in jbd and they are needed in jbd2 as well.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lukas Czerner [Mon, 20 Feb 2012 22:53:00 +0000 (17:53 -0500)]
ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delalloc
Ext4 does not support data journalling with delayed allocation enabled.
We even do not allow to mount the file system with delayed allocation
and data journalling enabled, however it can be set via FS_IOC_SETFLAGS
so we can hit the inode with EXT4_INODE_JOURNAL_DATA set even on file
system mounted with delayed allocation (default) and that's where
problem arises. The easies way to reproduce this problem is with the
following set of commands:
mkfs.ext4 /dev/sdd
mount /dev/sdd /mnt/test1
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
chattr +j /mnt/test1/file
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
chattr -j /mnt/test1/file
Additionally it can be reproduced quite reliably with xfstests 272 and
269. In fact the above reproducer is a part of test 272.
To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
the file system is mounted with delayed allocation. This can be easily
done by fixing ext4_should_*_data() functions do ignore data journal
flag when delalloc is set (suggested by Ted). We also have to set the
appropriate address space operations for the inode (again, ignoring data
journal flag if delalloc enabled).
Additionally this commit introduces ext4_inode_journal_mode() function
because ext4_should_*_data() has already had a lot of common code and
this change is putting it all into one function so it is easier to
read.
Successfully tested with xfstests in following configurations:
delalloc + data=ordered
delalloc + data=writeback
data=journal
nodelalloc + data=ordered
nodelalloc + data=writeback
nodelalloc + data=journal
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 20 Feb 2012 22:52:46 +0000 (17:52 -0500)]
ext4: fix race when setting bitmap_uptodate flag
In ext4_read_{inode,block}_bitmap() we were setting bitmap_uptodate()
before submitting the buffer for read. The is bad, since we check
bitmap_uptodate() without locking the buffer, and so if another
process is racing with us, it's possible that they will think the
bitmap is uptodate even though the read has not completed yet,
resulting in inodes and blocks potentially getting allocated more than
once if we get really unlucky.
Addresses-Google-Bug:
2828254
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 7 Feb 2012 01:12:03 +0000 (20:12 -0500)]
ext4: fold ext4_claim_inode into ext4_new_inode
The function ext4_claim_inode() is only called by one function,
ext4_new_inode(), and by folding the functionality into
ext4_new_inode(), we can remove almost 50 lines of code, and put all
of the logic of allocating a new inode into a single place.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Tue, 31 Jan 2012 21:31:54 +0000 (13:31 -0800)]
Linux 3.3-rc2
Linus Torvalds [Tue, 31 Jan 2012 17:23:59 +0000 (09:23 -0800)]
Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream
There are few important bug fixes for LogFS
* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
Logfs: Allow NULL block_isbad() methods
logfs: Grow inode in delete path
logfs: Free areas before calling generic_shutdown_super()
logfs: remove useless BUG_ON
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
logfs: Propagate page parameter to __logfs_write_inode
logfs: set superblock shutdown flag after generic sb shutdown
logfs: take write mutex lock during fsync and sync
logfs: Prevent memory corruption
logfs: update page reference count for pined pages
Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what
"mtd->block_isbad" means in commit
f2933e86ad93: "Logfs: Allow NULL
block_isbad() methods" clashing with the abstraction changes in the
commits
7086c19d0742: "mtd: introduce mtd_block_isbad interface" and
d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly".
This resolution takes the semantics from commit
f2933e86ad93, and just
makes mtd_block_isbad() return zero (false) if the 'block_isbad'
function is NULL. But that also means that now "mtd_can_have_bb()"
always returns 0.
Now, "mtd_block_markbad()" will obviously return an error if the
low-level driver doesn't support bad blocks, so this is somewhat
non-symmetric, but it actually makes sense if a NULL "block_isbad"
function is considered to mean "I assume that all my blocks are always
good".
Linus Torvalds [Tue, 31 Jan 2012 01:08:40 +0000 (17:08 -0800)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (w83627ehf) Disable setting DC mode for pwm2, pwm3 on NCT6776F
hwmon: (sht15) fix bad error code
MAINTAINERS: Drop maintainer for MAX1668 hwmon driver
MAINTAINERS: Add hwmon entries for Wolfson
hwmon: (
f71805f) Fix clamping of temperature limits
Linus Torvalds [Tue, 31 Jan 2012 01:06:26 +0000 (17:06 -0800)]
Merge branch 'for-torvalds' of git://git./linux/kernel/git/linusw/linux-pinctrl
Here are some fixes to the pin control system that has accumulated since
-rc1. Mainly Tony Lindgren fixed the module load/unload logic and the
rest are minor fixes and documentation.
* 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: add checks for empty function names
pinctrl: fix pinmux_hog_maps when ctrl_dev_name is not set
pinctrl: fix some pinmux typos
pinctrl: free debugfs entries when unloading a pinmux driver
pinctrl: unbreak error messages
Documentation/pinctrl: fix a few syntax errors in code examples
pinctrl: fix pinconf_pins_show iteration
Linus Torvalds [Mon, 30 Jan 2012 23:17:21 +0000 (15:17 -0800)]
Merge tag 'tty-3.3-rc1' of git://git./linux/kernel/git/gregkh/tty
Here are some tty/serial patches for 3.3-rc1
Big thing here is the movement of the 8250 serial drivers to their own
directory, now that the patch churn has calmed down.
Other than that, only minor stuff (omap patches were reverted as they
were found to be wrong), and another broken driver removed from the
system.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* tag 'tty-3.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: Kill off Moorestown code
Revert "tty: serial: OMAP: ensure FIFO levels are set correctly in non-DMA mode"
Revert "tty: serial: OMAP: transmit FIFO threshold interrupts don't wake the chip"
serial: Fix wakeup init logic to speed up startup
docbook: don't use serial_core.h in device-drivers book
serial: amba-pl011: lock console writes against interrupts
amba-pl011: do not disable RTS during shutdown
tty: serial: OMAP: transmit FIFO threshold interrupts don't wake the chip
tty: serial: OMAP: ensure FIFO levels are set correctly in non-DMA mode
omap-serial: make serial_omap_restore_context depend on CONFIG_PM_RUNTIME
omap-serial :Make the suspend/resume functions depend on CONFIG_PM_SLEEP.
TTY: fix UV serial console regression
jsm: Fixed EEH recovery error
Updated TTY MAINTAINERS info
serial: group all the 8250 related code together
Linus Torvalds [Mon, 30 Jan 2012 19:38:28 +0000 (11:38 -0800)]
Merge tag 'usb-3.3-rc1' of git://git./linux/kernel/git/gregkh/usb
Here are a bunch of USB patches for 3.3-rc1.
Nothing major, largest thing here is the removal of some drivers that
did not work at all. Other than that, the normal collection of bugfixes
and new device ids.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* tag 'usb-3.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (52 commits)
uwb & wusb: fix kconfig error
USB: Realtek cr: fix autopm scheduling while atomic
USB: ftdi_sio: Add more identifiers
xHCI: Cleanup isoc transfer ring when TD length mismatch found
usb: musb: omap2430: minor cleanups.
qcaux: add more Pantech UML190 and UML290 ports
Revert "drivers: usb: Fix dependency for USB_HWA_HCD"
usb: mv-otg - Fix build if CONFIG_USB is not set
USB: cdc-wdm: Avoid hanging on interface with no USB_CDC_DMM_TYPE
usb: add support for STA2X11 host driver
drivers: usb: Fix dependency for USB_HWA_HCD
kernel-doc: fix new warning in usb.h
USB: OHCI: fix new compiler warnings
usb: serial: kobil_sct: fix compile warning:
drivers/usb/host/ehci-fsl.c: add missing iounmap
USB: cdc-wdm: better allocate a buffer that is at least as big as we tell the USB core
USB: cdc-wdm: call wake_up_all to allow driver to shutdown on device removal
USB: cdc-wdm: use two mutexes to allow simultaneous read and write
USB: cdc-wdm: updating desc->length must be protected by spin_lock
USB: usbsevseg: fix max length
...
Linus Torvalds [Mon, 30 Jan 2012 18:53:20 +0000 (10:53 -0800)]
Merge git://git./linux/kernel/git/davem/net
1) Setting link attributes can modify the size of the attributes that
would be reported on a subsequent getlink netlink operation,
therefore min_ifinfo_dump_size needs to be adjusted. From Stefan
Gula.
2) Resegmentation of TSO frames while trimming can violate invariants
expected by callers, namely that the number of segments can only stay
the same or decrease, never increase. If MSS changes, however, we
can trim data but then end up with more segments. Fix this by only
segmenting to the MSS already recorded in the SKB. That's the
simplest fix for now and if we want to get more fancy in the future
that's a more involved change.
This probably explains some retransmit counter inaccuracies.
From Neal Cardwell.
3) Fix too-many-wakeups in POLL with AF_UNIX sockets, from Eric Dumazet.
4) Fix CAIF crashes wrt. namespace handling. From Eric Dumazet and
Eric W. Biederman.
5) TCP port selection fixes from Flavio Leitner.
6) More socket memory cgroup build fixes in certain randonfig
situations. From Glauber Costa.
7) Fix TCP memory sysctl regression reported by Ingo Molnar, also from
Glauber Costa.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
af_unix: fix EPOLLET regression for stream sockets
tcp: fix tcp_trim_head() to adjust segment count with skb MSS
net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL
net caif: Register properly as a pernet subsystem.
netns: Fail conspicously if someone uses net_generic at an inappropriate time.
net: explicitly add jump_label.h header to sock.h
net: RTNETLINK adjusting values of min_ifinfo_dump_size
ipv6: Fix ip_gre lockless xmits.
xen-netfront: correct MAX_TX_TARGET calculation.
netns: fix net_alloc_generic()
tcp: bind() optimize port allocation
tcp: bind() fix autoselection to share ports
l2tp: l2tp_ip - fix possible oops on packet receive
iwlwifi: fix PCI-E transport "inta" race
mac80211: set bss_conf.idle when vif is connected
mac80211: update oper_channel on ibss join
Linus Torvalds [Mon, 30 Jan 2012 18:16:25 +0000 (10:16 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/broonie/regulator
This fixes an integration issue with the regulator device tree bindings
which shook out in -rc. The bindings were overly enthusiatic when
deciding to set a voltage on a regulator and would try to set zero volts
on an unconfigured regulator which isn't supported.
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Set apply_uV only when min and max voltages are defined
Eric Dumazet [Sat, 28 Jan 2012 16:11:03 +0000 (16:11 +0000)]
af_unix: fix EPOLLET regression for stream sockets
Commit
0884d7aa24 (AF_UNIX: Fix poll blocking problem when reading from
a stream socket) added a regression for epoll() in Edge Triggered mode
(EPOLLET)
Appropriate fix is to use skb_peek()/skb_unlink() instead of
skb_dequeue(), and only call skb_unlink() when skb is fully consumed.
This remove the need to requeue a partial skb into sk_receive_queue head
and the extra sk->sk_data_ready() calls that added the regression.
This is safe because once skb is given to sk_receive_queue, it is not
modified by a writer, and readers are serialized by u->readlock mutex.
This also reduce number of spinlock acquisition for small reads or
MSG_PEEK users so should improve overall performance.
Reported-by: Nick Mathewson <nickm@freehaven.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexey Moiseytsev <himeraster@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Sat, 28 Jan 2012 17:29:46 +0000 (17:29 +0000)]
tcp: fix tcp_trim_head() to adjust segment count with skb MSS
This commit fixes tcp_trim_head() to recalculate the number of
segments in the skb with the skb's existing MSS, so trimming the head
causes the skb segment count to be monotonically non-increasing - it
should stay the same or go down, but not increase.
Previously tcp_trim_head() used the current MSS of the connection. But
if there was a decrease in MSS between original transmission and ACK
(e.g. due to PMTUD), this could cause tcp_trim_head() to
counter-intuitively increase the segment count when trimming bytes off
the head of an skb. This violated assumptions in tcp_tso_acked() that
tcp_trim_head() only decreases the packet count, so that packets_acked
in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
pass u32 pkts_acked values as large as 0xffffffff to
ca_ops->pkts_acked().
As an aside, if tcp_trim_head() had really wanted the skb to reflect
the current MSS, it should have called tcp_set_skb_tso_segs()
unconditionally, since a decrease in MSS would mean that a
single-packet skb should now be sliced into multiple segments.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Glauber Costa [Mon, 30 Jan 2012 01:20:17 +0000 (01:20 +0000)]
net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL
sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
in commit
3dc43e3e4d0b52197d3205214fe8f162f9e0c334, since it
became a per-ns value.
That code, however, will never run when CONFIG_SYSCTL is
disabled, leading to bogus values on those fields - causing hung
TCP sockets.
This patch fixes it by keeping an initialization code in
tcp_init(). It will be overwritten by the first net namespace
init if CONFIG_SYSCTL is compiled in, and do the right thing if
it is compiled out.
It is also named properly as tcp_init_mem(), to properly signal
its non-sysctl side effect on TCP limits.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
[ renamed the function, tidied up the changelog a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 30 Jan 2012 17:02:10 +0000 (09:02 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
[S390] dasd: revalidate server for new pathgroup
[S390] dasd: revert LCU optimization
[S390] cleanup entry point definition
Linus Torvalds [Mon, 30 Jan 2012 16:59:46 +0000 (08:59 -0800)]
Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze
* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: generic atomic64 support
Linus Torvalds [Mon, 30 Jan 2012 16:56:41 +0000 (08:56 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
vmwgfx: Fix assignment in vmw_framebuffer_create_handle
drm/radeon/kms: Fix device tree linkage of i2c buses
drm: Pass the real error code back during GEM bo initialisation
Revert "drm/i810: cleanup reclaim_buffers"
Linus Torvalds [Mon, 30 Jan 2012 16:47:49 +0000 (08:47 -0800)]
Merge tag 'nfs-for-3.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
NFS client bugfixes for Linux 3.3 (pull 3)
* tag 'nfs-for-3.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix machine creds in generic_create_cred and generic_match
Linus Torvalds [Mon, 30 Jan 2012 16:33:40 +0000 (08:33 -0800)]
Merge tag 'pm-fix-for-3.3-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Power management fix for 3.3-rc2
Fix for a hibernate (s2disk) regression introduced during the 3.2
merge window that causes s2disk to trigger BUG_ON() in
freeze_workqueues_begin() if there is not enough swap space to save
the image.
* tag 'pm-fix-for-3.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Hibernate: Fix s2disk regression related to freezing workqueues
Ryan Mallon [Fri, 27 Jan 2012 21:51:40 +0000 (08:51 +1100)]
vmwgfx: Fix assignment in vmw_framebuffer_create_handle
The assignment of handle in vmw_framebuffer_create_handle doesn't actually do anything useful and is incorrectly assigning an integer value to a pointer argument. It appears that this is a typo and should be dereferencing handle rather than assigning to it directly. This fixes a bug where an undefined handle value is potentially returned to user-space.
Signed-off-by: Ryan Mallon <rmallon@gmail.com>
Reviewed-by: Jakob Bornecrantz<jakob@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jean Delvare [Sat, 28 Jan 2012 10:10:38 +0000 (11:10 +0100)]
drm/radeon/kms: Fix device tree linkage of i2c buses
Properly set the parent device of i2c buses before registering them so
that they will show at the right place in the device tree (rather than
in /sys/devices directly.)
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Dave Airlie <airlied@gmail.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Chris Wilson [Sun, 29 Jan 2012 16:45:32 +0000 (16:45 +0000)]
drm: Pass the real error code back during GEM bo initialisation
In particular, I found I was hitting the max-file limit in the VFS,
and the EFILE was being magically transformed into ENOMEM. Confusion
reigns.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Daniel Vetter [Sun, 29 Jan 2012 16:05:52 +0000 (17:05 +0100)]
Revert "drm/i810: cleanup reclaim_buffers"
This reverts commit
87499ffdcb1c70f66988cd8febc4ead0ba2f9118.
Where is that paper bag ... ah here.
I've failed to take an odd interaction between my other cleanups and
this reclaim_buffers patch into account and also failed to properly
test it. Looks like there are more dragons and hidden trapdoors in the
drm release path than actual lines of code.
Until I get a clue, let's just revert this.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Guenter Roeck [Sat, 28 Jan 2012 01:56:06 +0000 (17:56 -0800)]
hwmon: (w83627ehf) Disable setting DC mode for pwm2, pwm3 on NCT6776F
NCT6776F only supports pwm mode for pwm2 and pwm3. Return error if an attempt
is made to set those pwm channels to DC mode.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Rafael J. Wysocki [Sun, 29 Jan 2012 19:35:52 +0000 (20:35 +0100)]
PM / Hibernate: Fix s2disk regression related to freezing workqueues
Commit
2aede851ddf08666f68ffc17be446420e9d2a056
PM / Hibernate: Freeze kernel threads after preallocating memory
introduced a mechanism by which kernel threads were frozen after
the preallocation of hibernate image memory to avoid problems with
frozen kernel threads not responding to memory freeing requests.
However, it overlooked the s2disk code path in which the
SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
which caused freeze_workqueues_begin() to BUG(), because it saw
that worqueues had been already frozen.
Although in principle this issue might be addressed by removing
the relevant BUG_ON() from freeze_workqueues_begin(), that would
reintroduce the very problem that commit
2aede851ddf08666f68ffc17be4
attempted to avoid into that particular code path. For this reason,
to fix the issue at hand, introduce thaw_kernel_threads() and make
the SNAPSHOT_FREE ioctl execute it.
Special thanks to Srivatsa S. Bhat for detailed analysis of the
problem.
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: stable@kernel.org
Vivien Didelot [Thu, 26 Jan 2012 20:59:00 +0000 (15:59 -0500)]
hwmon: (sht15) fix bad error code
When no platform data was supplied, returned error code was 0.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: stable@vger.kernel.org # 2.6.32+
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Linus Torvalds [Sun, 29 Jan 2012 02:20:48 +0000 (18:20 -0800)]
Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git./linux/kernel/git/gregkh/driver-core
Here are some patches for the 3.3-rc1 tree.
It contains the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates here as well.
* tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: Complain bitterly about attempts to remove files from nonexistent directories.
stable: update documentation to ask for kernel version
base/core.c:fix typo in comment in function device_add
Documentation: devres: add allocation functions to list of supported calls
Documentation update for the driver model core
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warnings in device.h
driver core: remove drivers/base/sys.c and include/linux/sysdev.h
Linus Torvalds [Sun, 29 Jan 2012 02:16:09 +0000 (18:16 -0800)]
Merge tag 'for-linus' of git://github.com/rustyrussell/linux
* tag 'for-linus' of git://github.com/rustyrussell/linux:
lguest: remove reference from Documentation/virtual/00-INDEX
virtio: correct the memory barrier in virtqueue_kick_prepare()
virtio: fix typos of memory barriers
Linus Torvalds [Sun, 29 Jan 2012 02:15:33 +0000 (18:15 -0800)]
Merge branch 'stable/for-linus-fixes-3.3' of git://git./linux/kernel/git/konrad/xen
* 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/granttable: Disable grant v2 for HVM domains.
x86: xen: size struct xen_spinlock to always fit in arch_spinlock_t
Linus Torvalds [Sun, 29 Jan 2012 01:00:19 +0000 (17:00 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix reservations in btrfs_page_mkwrite
Btrfs: advance window_start if we're using a bitmap
btrfs: mask out gfp flags in releasepage
Btrfs: fix enospc error caused by wrong checks of the chunk
Btrfs: do not defrag a file partially
Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
Btrfs: use cluster->window_start when allocating from a cluster bitmap
Btrfs: Check for NULL page in extent_range_uptodate
btrfs: Fix busyloops in transaction waiting code
Btrfs: make sure a bitmap has enough bytes
Btrfs: fix uninit warning in backref.c
Linus Torvalds [Sun, 29 Jan 2012 00:57:15 +0000 (16:57 -0800)]
Merge git://www.linux-watchdog.org/linux-watchdog
* git://www.linux-watchdog.org/linux-watchdog:
watchdog: iTCO_wdt: add Intel Lynx Point DeviceIDs
watchdog: via_wdt: Set min_timeout and max_timeout for wdt_dev
watchdog: Fix typo "unexpectdly"
watchdog: wafer5823wdt: Fix handling WDIOS_DISABLECARD/WDIOS_ENABLECARD options
watchdog: wm8350_wdt: Fix handling WDIOS_DISABLECARD/WDIOS_ENABLECARD options
watchdog: Return proper error in nuc900wdt_probe if misc_register fails
watchdog: Staticise nuc900_wdt
watchdog: via_wdt: Staticise wdt_pci_table
watchdog: omap_wdt.c: Fix the mismatch of pm_runtime enable and disable
watchdog: dw_wdt.c: use devm_request_and_ioremap
watchdog: imx2_wdt.c: use devm_request_and_ioremap
Linus Torvalds [Sat, 28 Jan 2012 21:27:10 +0000 (13:27 -0800)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm: (31 commits)
ARM: 7304/1: ioremap: fix boundary check when reusing static mapping
ARM: 7301/1: Rename the T() macro to TUSER() to avoid namespace conflicts
ARM: 7299/1: ftrace: clear zero bit in reported IPs for Thumb-2
ARM: 7298/1: realview: fix mapping of MPCore private memory region
PCMCIA: fix sa1111 oops on remove
ARM: 7288/1: mach-sa1100: add missing module_init() call
ARM: 7297/1: smp_twd: make sure timer is stopped before registering it
ARM: 7296/1: proc-v7.S: remove HARVARD_CACHE preprocessor guards
ARM: 7295/1: cortex-a7: move proc_info out of !CONFIG_ARM_LPAE block
ARM: 7293/1: logical_cpu_map: decouple CPU mapping from SMP
ARM: 7291/1: cache: assume 64-byte L1 cachelines for ARMv7 CPUs
ARM: 7290/1: vmlinux.lds.S: align the exception fixup table to a 4-byte boundary
ARM: 7289/1: vmlinux.lds.S: do not hardcode cacheline size as 32 bytes
MFD: ucb1x00-ts: fix resume failure
MFD: ucb1x00-core: fix gpiolib direction_output handling
MFD: ucb1x00-core: fix missing restore of io output data on resume
MFD: mcp-core: fix mcp_priv() to be more type safe
MFD: mcp-core: fix complaints from the genirq layer
Revert "ARM: sa11x0: Implement autoloading of codec and codec pdata for mcp bus."
Revert "ARM: sa1100: Refactor mcp-sa11x0 to use platform resources."
...
Fix up conflict due to arch/arm/mach-mx5/Kconfig having been merged into
mach-imx5 (commit
784a90c0a7d8: "ARM i.MX: Merge i.MX5 support into
mach-imx"), but the ARM_L1_CACHE_SHIFT_6 entry was moved to be driven by
the CPU_V7 logic from it in the old location in rmk's branch (commit
a092f2b15399: "ARM: 7291/1: cache: assume 64-byte L1 cachelines for
ARMv7 CPUs").
Linus Torvalds [Sat, 28 Jan 2012 21:21:54 +0000 (13:21 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
arm-soc fixes for 3.3-rc:
AT91 needed reset fixes which resulted in some minor code refactoring,
it also adds a feature-removal for one of their platforms for 3.4.
The USB patches have been acked by Greg K-H.
i.MX and ux500 both have some minor fixes, nothing controversial.
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arch/arm/mach-imx/mach-mx53_ard.c: add missing iounmap
ARM: imx: iomux-v1.h: Fix build error due to __init annotation
ARM: at91: Fix at91sam9g45 and at91cap9 reset
ARM: at91: make rstc soc independent
ARM: at91: introduce AT91_SAM9_ALT_RESET to select the at91sam9 alternative reset
ARM: at91: merge at91cap9_ddrsdr.h in at91sam9_ddrsdr.h
ARM: at91: fix cap9 ddrsdr register
ARM/USB: at91/ohci-at91: rename vbus_pin_inverted to vbus_pin_active_low
USB: at91: fix clk_get error handling
ARM: at91: removal of CAP9 SoC family
ARM: at91: fix at91rm9200 soc subtype handling
mach-ux500: no MMC_CAP_SD_HIGHSPEED on Snowball
mach-ux500: enable ARM errata 764369
mach-ux500: do not override outer.inv_all
mach-ux500: musb: now musb is always in OTG mode
ARM: imx6: add missing twd_clk for imx6q clock
Joern Engel [Fri, 5 Aug 2011 09:09:55 +0000 (11:09 +0200)]
Logfs: Allow NULL block_isbad() methods
Not all mtd drivers define block_isbad(). Let's assume no bad blocks
instead of refusing to mount.
Signed-off-by: Joern Engel <joern@logfs.org>
Joern Engel [Fri, 5 Aug 2011 09:13:30 +0000 (11:13 +0200)]
logfs: Grow inode in delete path
Can be necessary if an inode gets deleted (through -ENOSPC) before being
written. Might be better to move this into logfs_write_rec(), but for
now go with the stupid&safe patch.
Signed-off-by: Joern Engel <joern@logfs.org>
Joern Engel [Fri, 5 Aug 2011 09:18:19 +0000 (11:18 +0200)]
logfs: Free areas before calling generic_shutdown_super()
Or hit an assertion in map_invalidatepage() instead.
Signed-off-by: Joern Engel <joern@logfs.org>
Joern Engel [Mon, 12 Sep 2011 15:39:16 +0000 (21:09 +0530)]
logfs: remove useless BUG_ON
It prevents write sizes >4k.
Signed-off-by: Joern Engel <joern@logfs.org>