H Hartley Sweeten [Fri, 14 May 2010 20:35:21 +0000 (15:35 -0500)]
inotify_user.c: make local symbol static
The symbol inotify_max_user_watches is not used outside this
file and should be static.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Wed, 12 May 2010 15:42:29 +0000 (11:42 -0400)]
fsnotify: initialize mask in fsnotify_perm
akpm got a warning the fsnotify_mask could be used uninitialized in
fsnotify_perm(). It's not actually possible but his compiler complained
about it. This patch just initializes it to 0 to shut up the compiler.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Wed, 21 Apr 2010 20:49:38 +0000 (16:49 -0400)]
fsnotify: call iput on inodes when no longer marked
fsnotify takes an igrab on an inode when it adds a mark. The code was
supposed to drop the reference when the mark was removed but didn't.
This caused problems when an fs was unmounted because those inodes would
clearly not be gone. Thus resulting in the most devistating of messages:
VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
>>> Have a nice day...
Jiri Slaby bisected the problem to a patch in the fsnotify tree. The
code snippets below show my stupidity quite clearly.
void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
{
...
mark->inode = NULL;
...
}
void fsnotify_destroy_mark(struct fsnotify_mark *mark)
{
struct inode *inode = NULL;
...
if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
fsnotify_destroy_inode_mark(mark);
inode = mark->i.inode;
}
...
if (inode)
iput(inode);
...
}
Obviously the intent was to capture the inode before it was set to NULL in
fsnotify_destory_inode_mark() so we wouldn't be leaking inodes forever.
Instead we leaked them (and exploded on umount)
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Jean-Christophe Dubois [Tue, 23 Mar 2010 07:08:09 +0000 (08:08 +0100)]
fanotify: do not always return 0 in fsnotify
It seems to me you are always returning 0 in fsnotify, when you should return
the error (EPERM) returned by fanotify.
Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Wed, 23 Dec 2009 05:10:25 +0000 (00:10 -0500)]
fanotify: do not return 0 in a void function
remove_access_response() is supposed to have a void return, but was
returning 0;
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: userspace interface for permission responses
fanotify groups need to respond to events which include permissions types.
To do so groups will send a response using write() on the fanotify_fd they
have open.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: permissions and blocking
This is the backend work needed for fanotify to support the new
FS_OPEN_PERM and FS_ACCESS_PERM fsnotify events. This is done using the
new fsnotify secondary queue. No userspace interface is provided actually
respond to or request these events.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fsnotify: new fsnotify hooks and events types for access decisions
introduce a new fsnotify hook, fsnotify_perm(), which is called from the
security code. This hook is used to allow fsnotify groups to make access
control decisions about events on the system. We also must change the
generic fsnotify function to return an error code if we intend these hooks
to be in any way useful.
Signed-off-by: Eric Paris <eparis@redhat.com>
Dave Young [Fri, 26 Feb 2010 01:28:57 +0000 (20:28 -0500)]
sysctl extern cleanup: inotify
Extern declarations in sysctl.c should be move to their own head file, and
then include them in relavant .c files.
Move inotify_table extern declaration to linux/inotify.h
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Alexey Dobriyan [Wed, 20 Jan 2010 20:27:56 +0000 (22:27 +0200)]
dnotify: move dir_notify_enable declaration
Move dir_notify_enable declaration to where it belongs -- dnotify.h .
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Mon, 8 Feb 2010 17:53:52 +0000 (12:53 -0500)]
fsnotify: use unsigned char * for dentry->d_name.name
fsnotify was using char * when it passed around the d_name.name string
internally but it is actually an unsigned char *. This patch switches
fsnotify to use unsigned and should silence some pointer signess warnings
which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify
code as there are still issues with kstrdup and strlen which would pop
out needless warnings.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: use merge argument to determine actual event added to queue
fanotify needs to know the actual event added to queues so it can be
correctly checked for return values from userspace. To do this we need to
pass that information from the merger code back to the main even handling
routine. Currently that information is unused, but it will be.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fsnotify: intoduce a notification merge argument
Each group can define their own notification (and secondary_q) merge
function. Inotify does tail drop, fanotify does matching and drop which
can actually allocate a completely new event. But for fanotify to properly
deal with permissions events it needs to know the new event which was
ultimately added to the notification queue. This patch just implements a
void ** argument which is passed to the merge function. fanotify can use
this field to pass the new event back to higher layers.
Signed-off-by: Eric Paris <eparis@redhat.com>
for fanotify to properly deal with permissions events
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fsnotify: add group priorities
This introduces an ordering to fsnotify groups. With purely asynchronous
notification based "things" implementing fsnotify (inotify, dnotify) ordering
isn't particularly important. But if people want to use fsnotify for the
basis of sycronous notification or blocking notification ordering becomes
important.
eg. A Hierarchical Storage Management listener would need to get its event
before an AV scanner could get its event (since the HSM would need to
bring the data in for the AV scanner to scan.) Typically asynchronous notification
would want to run after the AV scanner made any relevant access decisions
so as to not send notification about an event that was denied.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: clear all fanotify marks
fanotify listeners may want to clear all marks. They may want to do this
to destroy all of their inode marks which have nothing but ignores.
Realistically this is useful for av vendors who update policy and want to
clear all of their cached allows.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fanotify: allow ignored_masks to survive modify
Some users may want to truely ignore an inode even if it has been modified.
Say you are wanting a mount which contains a log file and you really don't
want any notification about that file. This patch allows the listener to
do that.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: allow ignored_mask to survive modification
Some inodes a group may want to never hear about a set of events even if
the inode is modified. We add a new mark flag which indicates that these
marks should not have their ignored_mask cleared on modification.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: clear ignored mask on modify
On inode modification we clear the ignored mask for all of the marks on the
inode. This allows userspace to ignore accesses to inodes until there is
something different.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fanotify: allow users to set an ignored_mask
Change the sys_fanotify_mark() system call so users can set ignored_masks
on inodes. Remember, if a user new sets a real mask, and only sets ignored
masks, the ignore will never be pinned in memory. Thus ignored_masks can
be lost under memory pressure and the user may again get events they
previously thought were ignored.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fanotify: ignored_mask to ignore events
When fanotify receives an event it will check event->mask & ~ignored_mask.
If no bits are left the event will not be sent.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: ignored_mask - excluding notification
The ignored_mask is a new mask which is part of fsnotify marks. A group's
should_send_event() function can use the ignored mask to determine that
certain events are not of interest. In particular if a group registers a
mask including FS_OPEN on a vfsmount they could add FS_OPEN to the
ignored_mask for individual inodes and not send open events for those
inodes.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: allow marks to not pin inodes in core
inotify marks must pin inodes in core. dnotify doesn't technically need to
since they are closed when the directory is closed. fanotify also need to
pin inodes in core as it works today. But the next step is to introduce
the concept of 'ignored masks' which is actually a mask of events for an
inode of no interest. I claim that these should be liberally sent to the
kernel and should not pin the inode in core. If the inode is brought back
in the listener will get an event it may have thought excluded, but this is
not a serious situation and one any listener should deal with.
This patch lays the ground work for non-pinning inode marks by using lazy
inode pinning. We do not pin a mark until it has a non-zero mask entry. If a
listener new sets a mask we never pin the inode.
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:29 +0000 (21:24 -0500)]
fanotify: remove outgoing function checks in fanotify.h
A number of validity checks on outgoing data are done in static inlines but
are only used in one place. Instead just do them where they are used for
readability.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:29 +0000 (21:24 -0500)]
fanotify: remove fanotify.h declarations
fanotify_mark_validate functions are all needlessly declared in headers as
static inlines. Instead just do the checks where they are needed for code
readability.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:29 +0000 (21:24 -0500)]
fanotify: split fanotify_remove_mark
split fanotify_remove_mark into fanotify_remove_inode_mark and
fanotify_remove_vfsmount_mark.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:29 +0000 (21:24 -0500)]
fanotify: rename FAN_MARK_ON_VFSMOUNT to FAN_MARK_MOUNT
the term 'vfsmount' isn't sensicle to userspace. instead call is 'mount.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:29 +0000 (21:24 -0500)]
fanotify: hooks the fanotify_mark syscall to the vfsmount code
Create a new fanotify_mark flag which indicates we should attach the mark
to the vfsmount holding the object referenced by dfd and pathname rather
than the inode itself.
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: remove fanotify_add_mark
fanotify_add_mark now does nothing useful anymore, drop it.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: do not return pointer from fanotify_add_*_mark
No need to return the mark from fanotify_add_*_mark to fanotify_add_mark
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: do not call fanotify_update_object_mask in fanotify_add_mark
Recalculate masks in fanotify_add_mark, don't use
fanotify_update_object_mask. This gets us one step closers to readable
code.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: do not call fanotify_update_object_mask in fanotify_remove_mark
Recalculate masks in fanotify_remove_mark, don't use
fanotify_update_object_mask. This gets us one step closers to readable
code.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: remove fanotify_update_mark
fanotify_update_mark() doesn't do much useful; remove it.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: infrastructure to add an remove marks on vfsmounts
infrastructure work to add and remove marks on vfsmounts. This should get
every set up except wiring the functions to the syscalls.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: should_send_event needs to handle vfsmounts
currently should_send_event in fanotify only cares about marks on inodes.
This patch extends that interface to indicate that it cares about events
that happened on vfsmounts.
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify: Infrastructure for per-mount watches
Per-mount watches allow groups to listen to fsnotify events on an entire
mount. This patch simply adds and initializes the fields needed in the
vfsmount struct to make this happen.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify: vfsmount marks generic functions
Much like inode-mark.c has all of the code dealing with marks on inodes
this patch adds a vfsmount-mark.c which has similar code but is intended
for marks on vfsmounts.
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify/vfsmount: add fsnotify fields to struct vfsmount
This patch adds the list and mask fields needed to support vfsmount marks.
These are the same fields fsnotify needs on an inode. They are not used,
just declared and we note where the cleanup hook should be (the function is
not yet defined)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify: clear marks to 0 in fsnotify_init_mark
Currently fsnotify_init_mark sets some fields to 0/NULL. Some users
already used some sorts of zalloc, some didn't. This patch uses memset to
explicitly zero everything in the fsnotify_mark when it is initialized so we
don't have to be careful if fields are later added to marks.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify: split generic and inode specific mark code
currently all marking is done by functions in inode-mark.c. Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fanotify: Add pids to events
Pass the process identifiers of the triggering processes to fanotify
listeners: this information is useful for event filtering and logging.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: create_fd cleanup
Code cleanup which does the fd creation work seperately from the userspace
metadata creation. It fits better with the other code.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Heiko Carstens [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: CONFIG_HAVE_SYSCALL_WRAPPERS for sys_fanotify_mark
Please note that you need the patch below in addition, otherwise the
syscall wrapper stuff won't work on those 32 bit architectures which enable
the wrappers.
When enabled the syscall wrapper defines always take long parameters and then
cast them to whatever is needed. This approach doesn't work for the 32 bit
case where the original syscall takes a long long parameter, since we would
lose the upper 32 bits.
So syscalls with 64 bit arguments are special cases wrt to syscall wrappers
and enp up in the ugliness below (see also sys_fallocate). In addition these
special cased syscall wrappers have the drawback that ftrace syscall tracing
doesn't work on them, since they don't get defined by using the usual macros.
Signed-off-by: Eric Paris <eparis@redhat.com>
Paul Mundt [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: select ANON_INODES.
fanotify references anon_inode_getfd(), which is only available with
ANON_INODES enabled. Presently this bails out with the following:
LD vmlinux
fs/built-in.o: In function `sys_fanotify_init':
(.text+0x26d1c): undefined reference to `anon_inode_getfd'
make: *** [vmlinux] Error 1
which is trivially corrected by adding an ANON_INODES select.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: send events using read
Send events to userspace by reading the file descriptor from fanotify_init().
One will get blocks of data which look like:
struct fanotify_event_metadata {
__u32 event_len;
__u32 vers;
__s32 fd;
__u64 mask;
__s64 pid;
__u64 cookie;
} __attribute__ ((packed));
Simple code to retrieve and deal with events is below
while ((len = read(fan_fd, buf, sizeof(buf))) > 0) {
struct fanotify_event_metadata *metadata;
metadata = (void *)buf;
while(FAN_EVENT_OK(metadata, len)) {
[PROCESS HERE!!]
if (metadata->fd >= 0 && close(metadata->fd) != 0)
goto fail;
metadata = FAN_EVENT_NEXT(metadata, len);
}
}
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: fanotify_mark syscall implementation
NAME
fanotify_mark - add, remove, or modify an fanotify mark on a
filesystem object
SYNOPSIS
int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
int dfd, const char *pathname)
DESCRIPTION
fanotify_mark() is used to add remove or modify a mark on a filesystem
object. Marks are used to indicate that the fanotify group is
interested in events which occur on that object. At this point in
time marks may only be added to files and directories.
fanotify_fd must be a file descriptor returned by fanotify_init()
The flags field must contain exactly one of the following:
FAN_MARK_ADD - or the bits in mask and ignored mask into the mark
FAN_MARK_REMOVE - bitwise remove the bits in mask and ignored mark
from the mark
The following values can be OR'd into the flags field:
FAN_MARK_DONT_FOLLOW - same meaning as O_NOFOLLOW as described in open(2)
FAN_MARK_ONLYDIR - same meaning as O_DIRECTORY as described in open(2)
dfd may be any of the following:
AT_FDCWD: the object will be lookup up based on pathname similar
to open(2)
file descriptor of a directory: if pathname is not NULL the
object to modify will be lookup up similar to openat(2)
file descriptor of the final object: if pathname is NULL the
object to modify will be the object referenced by dfd
The mask is the bitwise OR of the set of events of interest such as:
FAN_ACCESS - object was accessed (read)
FAN_MODIFY - object was modified (write)
FAN_CLOSE_WRITE - object was writable and was closed
FAN_CLOSE_NOWRITE - object was read only and was closed
FAN_OPEN - object was opened
FAN_EVENT_ON_CHILD - interested in objected that happen to
children. Only relavent when the object
is a directory
FAN_Q_OVERFLOW - event queue overflowed (not implemented)
RETURN VALUE
On success, this system call returns 0. On error, -1 is
returned, and errno is set to indicate the error.
ERRORS
EINVAL An invalid value was specified in flags.
EINVAL An invalid value was specified in mask.
EINVAL An invalid value was specified in ignored_mask.
EINVAL fanotify_fd is not a file descriptor as returned by
fanotify_init()
EBADF fanotify_fd is not a valid file descriptor
EBADF dfd is not a valid file descriptor and path is NULL.
ENOTDIR dfd is not a directory and path is not NULL
EACCESS no search permissions on some part of the path
ENENT file not found
ENOMEM Insufficient kernel memory is available.
CONFORMING TO
These system calls are Linux-specific.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: sys_fanotify_mark declartion
This patch simply declares the new sys_fanotify_mark syscall
int fanotify_mark(int fanotify_fd, unsigned int flags, u64_mask,
int dfd const char *pathname)
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: fanotify_init syscall implementation
NAME
fanotify_init - initialize an fanotify group
SYNOPSIS
int fanotify_init(unsigned int flags, unsigned int event_f_flags, int priority);
DESCRIPTION
fanotify_init() initializes a new fanotify instance and returns a file
descriptor associated with the new fanotify event queue.
The following values can be OR'd into the flags field:
FAN_NONBLOCK Set the O_NONBLOCK file status flag on the new open file description.
Using this flag saves extra calls to fcntl(2) to achieve the same
result.
FAN_CLOEXEC Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
See the description of the O_CLOEXEC flag in open(2) for reasons why
this may be useful.
The event_f_flags argument is unused and must be set to 0
The priority argument is unused and must be set to 0
RETURN VALUE
On success, this system call return a new file descriptor. On error, -1 is
returned, and errno is set to indicate the error.
ERRORS
EINVAL An invalid value was specified in flags.
EINVAL A non-zero valid was passed in event_f_flags or in priority
ENFILE The system limit on the total number of file descriptors has been reached.
ENOMEM Insufficient kernel memory is available.
CONFORMING TO
These system calls are Linux-specific.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify: fanotify_init syscall declaration
This patch defines a new syscall fanotify_init() of the form:
int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
unsigned int priority)
This syscall is used to create and fanotify group. This is very similar to
the inotify_init() syscall.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify: do not clone on merge unless needed
Currently if 2 events are going to be merged on the notication queue with
different masks the second event will be cloned and will replace the first
event. However if this notification queue is the only place referencing
the event in question there is no reason not to just update the event in
place. We can tell this if the event->refcnt == 1. Since we hold a
reference for each queue this event is on we know that when refcnt == 1
this is the only queue. The other concern is that it might be about to be
added to a new queue, but this can't be the case since fsnotify holds a
reference on the event until it is finished adding it to queues.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify: merge notification events with different masks
Instead of just merging fanotify events if they are exactly the same, merge
notification events with different masks. To do this we have to clone the
old event, update the mask in the new event with the new merged mask, and
put the new event in place of the old event.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify:drop notification if they exist in the outgoing queue
fanotify listeners get an open file descriptor to the object in question so
the ordering of operations is not as important as in other notification
systems. inotify will drop events if the last event in the event FIFO is
the same as the current event. This patch will drop fanotify events if
they are the same as another event anywhere in the event FIFO.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify: fscking all notification system
fanotify is a novel file notification system which bases notification on
giving userspace both an event type (open, close, read, write) and an open
file descriptor to the object in question. This should address a number of
races and problems with other notification systems like inotify and dnotify
and should allow the future implementation of blocking or access controlled
notification. These are useful for on access scanners or hierachical storage
management schemes.
This patch just implements the basics of the fsnotify functions.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Wu Fengguang [Mon, 8 Feb 2010 17:31:29 +0000 (12:31 -0500)]
fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict
sparc used the same value as FMODE_NONOTIFY so change FMODE_NONOTIFY to be
something unique.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
vfs: introduce FMODE_NONOTIFY
This is a new f_mode which can only be set by the kernel. It indicates
that the fd was opened by fanotify and should not cause future fanotify
events. This is needed to prevent fanotify livelock. An example of
obvious livelock is from fanotify close events.
Process A closes file1
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
notice a pattern?
The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
for fanotify. Thus when that file is used it will not generate future
events.
This patch simply defines the bit.
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()
All callers to fsnotify_find_mark_entry() except one take and
release inode->i_lock around the call. Take the lock inside
fsnotify_find_mark_entry() instead.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
dnotify: rename mark_entry to mark
nomenclature change. Used to call things 'entries' but now we just call
them 'marks.' Do those changes for dnotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
inotify: rename mark_entry to just mark
rename anything in inotify that deals with mark_entry to just be mark. It
makes a lot more sense.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
fsnotify: rename mark_entry to just mark
previously I used mark_entry when talking about marks on inodes. The
_entry is pretty useless. Just use "mark" instead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_mark
the _entry portion of fsnotify functions is useless. Drop it.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
fsnotify: rename fsnotify_mark_entry to just fsnotify_mark
The name is long and it serves no real purpose. So rename
fsnotify_mark_entry to just fsnotify_mark.
Signed-off-by: Eric Paris <eparis@redhat.com>
Andreas Gruenbacher [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
fsnotify: kill FSNOTIFY_EVENT_FILE
Some fsnotify operations send a struct file. This is more information than
we technically need. We instead send a struct path in all cases instead of
sometimes a path and sometimes a file.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
fsnotify: add flags to fsnotify_mark_entries
To differentiate between inode and vfsmount (or other future) types of
marks we add a flags field and set the inode bit on inode marks (the only
currently supported type of mark)
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: add vfsmount specific fields to the fsnotify_mark_entry union
vfsmount marks need mostly the same data as inode specific fields, but for
consistency and understandability we put that data in a vfsmount specific
struct inside a union with inode specific data.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: put inode specific fields in an fsnotify_mark in a union
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy. This patch just implements the
inode struct and the union.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: include vfsmount in should_send_event when appropriate
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases. We pass a vfsmount to this function so groups can make
their own determinations.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: mount point listeners list and global mask
currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes.) This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notification for an inode
accessed below a given monut point. This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.
The reason we want a second group list and mask is because the inode based
notification should_send_event support which makes each group look for a mark
on the given inode. With one nfsmount listener that means that every group would
have to take the inode->i_lock, look for their mark, not find one, and return
for every operation. By seperating vfsmount from inode listeners only when
there is a inode listener will the inode groups have to look for their
mark and take the inode lock. vfsmount listeners will have to grab the lock and
look for a mark but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: add groups to fsnotify_inode_groups when registering inode watch
Currently all fsnotify groups are added immediately to the
fsnotify_inode_groups list upon creation. This means, even groups with no
watches (common for audit) will be on the global tracking list and will
get checked for every event. This patch adds groups to the global list on
when the first inode mark is added to the group.
Signed-of-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: initialize the group->num_marks in a better place
Currently the comments say that group->num_marks is held because the group
is on the fsnotify_group list. This isn't strictly the case, we really
just hold the num_marks for the life of the group (any time group->refcnt
is != 0) This patch moves the initialization stuff and makes it clear when
it is really being held.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: rename fsnotify_groups to fsnotify_inode_groups
Simple renaming patch. fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: drop mask argument from fsnotify_alloc_group
Nothing uses the mask argument to fsnotify_alloc_group. This patch drops
that argument.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
Audit: only set group mask when something is being watched
Currently the audit watch group always sets a mask equal to all events it
might care about. We instead should only set the group mask if we are
actually watching inodes. This should be a perf win when audit watches are
compiled in.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group
fsnotify_obtain_group was intended to be able to find an already existing
group. Nothing uses that functionality. This just renames it to
fsnotify_alloc_group so it is clear what it is doing.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: fsnotify_obtain_group kzalloc cleanup
fsnotify_obtain_group uses kzalloc but then proceedes to set things to 0.
This patch just deletes those useless lines.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: remove group_num altogether
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added. I no longer think this is a
necessary thing to do and so we remove the group_num.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: lock annotation for event replacement
fsnotify_replace_event need to lock both the old and the new event. This
causes lockdep to get all pissed off since it dosn't know this is safe.
It's safe in this case since the new event is impossible to be reached from
other places in the kernel.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: replace an event on a list
fanotify would like to clone events already on its notification list, make
changes to the new event, and then replace the old event on the list with
the new event. This patch implements the replace functionality of that
process.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: clone existing events
fsnotify_clone_event will take an event, clone it, and return the cloned
event to the caller. Since events may be in use by multiple fsnotify
groups simultaneously certain event entries (such as the mask) cannot be
changed after the event was created. Since fanotify would like to merge
events happening on the same file it needs a new clean event to work with
so it can change any fields it wishes.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: per group notification queue merge types
inotify only wishes to merge a new event with the last event on the
notification fifo. fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together. This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code. This allows each use of fsnotify to provide
their own merge functionality.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: send struct file when sending events to parents when possible
fanotify needs a path in order to open an fd to the object which changed.
Currently notifications to inode's parents are done using only the inode.
For some parental notification we have the entire file, send that so
fanotify can use it.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: pass a file instead of an inode to open, read, and write
fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process. Close was the only operation that already was passing
a struct file to the notification hook. This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: include data in should_send calls
fanotify is going to need to look at file->private_data to know if an event
should be sent or not. This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: provide the data type to should_send_event
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener. Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Wed, 23 Dec 2009 04:16:33 +0000 (23:16 -0500)]
inotify: do not spam console without limit
inotify was supposed to have a dmesg printk ratelimitor which would cause
inotify to only emit one message per boot. The static bool was never set
so it kept firing messages. This patch correctly limits warnings in multiple
places.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:30:52 +0000 (20:30 -0500)]
inotify: remove inotify in kernel interface
nothing uses inotify in the kernel, drop it!
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:27:10 +0000 (20:27 -0500)]
inotify: do not reuse watch descriptors
Prior to 2.6.31 inotify would not reuse watch descriptors until all of
them had been used at least once. After the rewrite inotify would reuse
watch descriptors. The selinux utility 'restorecond' was found to have
problems when watch descriptors were reused. This patch reverts to the
pre inotify rewrite behavior to not reuse watch descriptors.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:07 +0000 (20:12 -0500)]
fsnotify: use kmem_cache_zalloc to simplify event initialization
fsnotify event initialization is done entry by entry with almost everything
set to either 0 or NULL. Use kmem_cache_zalloc and only initialize things
that need non-zero initialization. Also means we don't have to change
initialization entries based on the config options.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:06 +0000 (20:12 -0500)]
fsnotify: kzalloc fsnotify groups
Use kzalloc for fsnotify_groups so that none of the fields can leak any
information accidentally.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:06 +0000 (20:12 -0500)]
inotify: use container_of instead of casting
inotify_free_mark casts directly from an fsnotify_mark_entry to an
inotify_inode_mark_entry. This works, but should use container_of instead
for future proofing.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:06 +0000 (20:12 -0500)]
fsnotify: use fsnotify_create_event to allocate the q_overflow event
Currently fsnotify defines a static fsnotify event which is sent when a
group overflows its allotted queue length. This patch just allocates that
event from the event cache rather than defining it statically. There is no
known reason that the current implementation is wrong, but this makes sure the
event is initialized and created like any other.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:06 +0000 (20:12 -0500)]
Audit: audit watch init should not be before fsnotify init
Audit watch init and fsnotify init both use subsys_initcall() but since the
audit watch code is linked in before the fsnotify code the audit watch code
would be using the fsnotify srcu struct before it was initialized. This
patch fixes that problem by moving audit watch init to device_initcall() so
it happens after fsnotify is ready.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by : Sachin Sant <sachinp@in.ibm.com>
Eric Paris [Fri, 18 Dec 2009 01:12:06 +0000 (20:12 -0500)]
Audit: split audit watch Kconfig
Audit watch should depend on CONFIG_AUDIT_SYSCALL and should select
FSNOTIFY. This splits the spagetti like mixing of audit_watch and
audit_filter code so they can be configured seperately.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:06 +0000 (20:12 -0500)]
Audit: audit watches depend on fsnotify
CONFIG_AUDIT builds audit_watches which depend on fsnotify. Make
CONFIG_AUDIT select fsnotify.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:05 +0000 (20:12 -0500)]
audit: reimplement audit_trees using fsnotify rather than inotify
Simply switch audit_trees from using inotify to using fsnotify for it's
inode pinning and disappearing act information.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:05 +0000 (20:12 -0500)]
fsnotify: allow addition of duplicate fsnotify marks
This patch allows a task to add a second fsnotify mark to an inode for the
same group. This mark will be added to the end of the inode's list and
this will never be found by the stand fsnotify_find_mark() function. This
is useful if a user wants to add a new mark before removing the old one.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:05 +0000 (20:12 -0500)]
fsnotify: duplicate fsnotify_mark_entry data between 2 marks
Simple copy fsnotify information from one mark to another in preparation
for the second mark to replace the first.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:05 +0000 (20:12 -0500)]
audit: do not get and put just to free a watch
deleting audit watch rules is not currently done under audit_filter_mutex.
It was done this way because we could not hold the mutex during inotify
manipulation. Since we are using fsnotify we don't need to do the extra
get/put pair nor do we need the private list on which to store the parents
while they are about to be freed.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:04 +0000 (20:12 -0500)]
audit: redo audit watch locking and refcnt in light of fsnotify
fsnotify can handle mutexes to be held across all fsnotify operations since
it deals strickly in spinlocks. This can simplify and reduce some of the
audit_filter_mutex taking and dropping.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:04 +0000 (20:12 -0500)]
audit: convert audit watches to use fsnotify instead of inotify
Audit currently uses inotify to pin inodes in core and to detect when
watched inodes are deleted or unmounted. This patch uses fsnotify instead
of inotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:04 +0000 (20:12 -0500)]
Audit: clean up the audit_watch split
No real changes, just cleanup to the audit_watch split patch which we done
with minimal code changes for easy review. Now fix interfaces to make
things work better.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 01:12:04 +0000 (20:12 -0500)]
inotify: simplify the inotify idr handling
This patch moves all of the idr editing operations into their own idr
functions. It makes it easier to prove locking correctness and to to
understand the code flow.
Signed-off-by: Eric Paris <eparis@redhat.com>