Avi Kivity [Thu, 31 May 2007 08:56:54 +0000 (11:56 +0300)]
KVM: MMU: Fold fix_read_pf() into set_pte_common()
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 31 May 2007 08:45:18 +0000 (11:45 +0300)]
KVM: MMU: Pass the guest pde to set_pte_common
We will need the accessed bit (in addition to the dirty bit) and
also write access (for setting the dirty bit) in a future patch.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 30 May 2007 16:31:17 +0000 (19:31 +0300)]
KVM: MMU: Move set_pte_common() to pte width dependent code
In preparation of some modifications.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 30 May 2007 11:21:51 +0000 (14:21 +0300)]
KVM: MMU: Simplify fetch() a little bit
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 30 May 2007 09:34:53 +0000 (12:34 +0300)]
KVM: MMU: Use slab caches for shadow pages and their headers
Use slab caches instead of a simple custom list.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Tue, 29 May 2007 12:07:21 +0000 (15:07 +0300)]
KVM: Use symbolic constants instead of magic numbers
Signed-off-by: Avi Kivity <avi@qumranet.com>
Markus Rechberger [Sun, 27 May 2007 07:46:52 +0000 (10:46 +0300)]
KVM: Fix includes
KVM compilation fails for some .configs. This fixes it.
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 24 May 2007 08:17:33 +0000 (11:17 +0300)]
KVM: x86 emulator: implement wbinvd
Vista seems to trigger it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Jan Engelhardt [Wed, 23 May 2007 21:22:11 +0000 (14:22 -0700)]
Use menuconfig objects II - KVM/Virt
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Mon, 21 May 2007 04:28:09 +0000 (07:28 +0300)]
KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit
MSR_EFER.LME/LMA bits are automatically save/restored by VMX
hardware, KVM only needs to save NX/SCE bits at time of heavy
weight VM Exit. But clearing NX bits in host envirnment may
cause system hang if the host page table is using EXB bits,
thus we leave NX bits as it is. If Host NX=1 and guest NX=0, we
can do guest page table EXB bits check before inserting a shadow
pte (though no guest is expecting to see this kind of gp fault).
If host NX=0, we present guest no Execute-Disable feature to guest,
thus no host NX=0, guest NX=1 combination.
This patch reduces raw vmexit time by ~27%.
Me: fix compile warnings on i386.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Sun, 20 May 2007 07:50:08 +0000 (10:50 +0300)]
KVM: VMX: Cleanup redundant code in MSR set
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Thu, 17 May 2007 15:55:15 +0000 (18:55 +0300)]
KVM: VMX: Avoid saving and restoring msrs on lightweight vmexit
In a lightweight exit (where we exit and reenter the guest without
scheduling or exiting to userspace in between), we don't need various
msrs on the host, and avoiding shuffling them around reduces raw exit
time by 8%.
i386 compile fix by Daniel Hecken <dh@bahntechnik.de>.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Nitin A Kamble [Thu, 17 May 2007 12:50:34 +0000 (15:50 +0300)]
KVM: VMX: Handle #SS faults from real mode
Instructions with address size override prefix opcode 0x67
Cause the #SS fault with 0 error code in VM86 mode. Forward
them to the emulator.
Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 14 May 2007 17:41:13 +0000 (20:41 +0300)]
KVM: VMX: Use local labels in inline assembly
This makes oprofile dumps and disassebly easier to read.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 8 May 2007 08:34:07 +0000 (11:34 +0300)]
KVM: Fix vmx I/O bitmap initialization on highmem systems
kunmap() expects a struct page, not a virtual address. Fixes an oops loading
kvm-intel.ko on i386 with CONFIG_HIGHMEM.
Thanks to Michael Ivanov <deruhu@peterstar.ru> for reporting.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 7 May 2007 07:55:37 +0000 (10:55 +0300)]
KVM: Avoid corrupting tr in real mode
The real mode tr needs to be set to a specific tss so that I/O
instructions can function. Divert the new tr values to the real
mode save area from where they will be restored on transition to
protected mode.
This fixes some crashes on reboot when the bios accesses an I/O
instruction.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 6 May 2007 13:10:01 +0000 (16:10 +0300)]
KVM: VMX: Only reload guest msrs if they are already loaded
If we set an msr via an ioctl() instead of by handling a guest exit, we
have the host state loaded, so reloading the msrs would clobber host
state instead of guest state.
This fixes a host oops (and loss of a cpu) on a guest reboot.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 6 May 2007 12:50:58 +0000 (15:50 +0300)]
KVM: MMU: Store shadow page tables as kernel virtual addresses, not physical
Simpifies things a bit.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 6 May 2007 12:36:30 +0000 (15:36 +0300)]
KVM: MMU: Simplify kvm_mmu_free_page() a tiny bit
Signed-off-by: Avi Kivity <avi@qumranet.com>
Matthew Gregan [Sun, 6 May 2007 07:59:46 +0000 (10:59 +0300)]
KVM: Implement IA32_EBL_CR_POWERON msr
Attempting to boot the default 'bsd' kernel of OpenBSD 4.1 i386 in a guest
fails early in the kernel init inside p3_get_bus_clock while trying to read
the IA32_EBL_CR_POWERON MSR. KVM logs an 'unhandled MSR' message and the
guest kernel faults.
This patch is sufficient to allow OpenBSD to boot, after which it seems to
run fine. I'm not sure if this is the correct solution for dealing with
this particular MSR, but it works for me.
Signed-off-by: Matthew Gregan <kinetik@flim.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 2 May 2007 20:06:22 +0000 (23:06 +0300)]
KVM: Set cr0.mp for guests
This allows fwait instructions to be trapped when the guest fpu is not
loaded.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 2 May 2007 17:40:00 +0000 (20:40 +0300)]
KVM: Consolidate guest fpu activation and deactivation
Easier to keep track of where the fpu is this way.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 2 May 2007 14:57:40 +0000 (17:57 +0300)]
KVM: Rationalize exception bitmap usage
Everyone owns a piece of the exception bitmap, but they happily write to
the entire thing like there's no tomorrow. Centralize handling in
update_exception_bitmap() and have everyone call that.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 2 May 2007 14:33:43 +0000 (17:33 +0300)]
KVM: Move some more msr mangling into vmx_save_host_state()
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 2 May 2007 13:54:03 +0000 (16:54 +0300)]
KVM: Fix potential guest state leak into host
The lightweight vmexit path avoids saving and reloading certain host
state. However in certain cases lightweight vmexit handling can schedule()
which requires reloading the host state.
So we store the host state in the vcpu structure, and reloaded it if we
relinquish the vcpu.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 1 May 2007 15:24:38 +0000 (18:24 +0300)]
KVM: Increase mmu shadow cache to 1024 pages
This improves kbuild times by about 10%, bringing it within a respectable
25% of native.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 1 May 2007 13:53:31 +0000 (16:53 +0300)]
KVM: Update shadow pte on write to guest pte
A typical demand page/copy on write pattern is:
- page fault on vaddr
- kvm propagates fault to guest
- guest handles fault, updates pte
- kvm traps write, clears shadow pte, resumes guest
- guest returns to userspace, re-faults on same vaddr
- kvm installs shadow pte, resumes guest
- guest continues
So, three vmexits for a single guest page fault. But if instead of clearing
the page table entry, we update to correspond to the value that the guest
has just written, we eliminate the third vmexit.
This patch does exactly that, reducing kbuild time by about 10%.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 1 May 2007 13:44:05 +0000 (16:44 +0300)]
KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.
Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so we when we look up the page we find up to three additional
aliases for the page. Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.
Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header. All we need is
to ignore shadow pages from the wrong quadrants.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 1 May 2007 11:16:52 +0000 (14:16 +0300)]
KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()
Instead of calling two functions and repeating expensive checks, call one
function and provide it with before/after information.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 1 May 2007 08:32:28 +0000 (11:32 +0300)]
KVM: Be more careful restoring fs on lightweight vmexit
i386 wants fs for accessing the pda even on a lightweight exit, so ensure
we can always restore it. This fixes a regression on i386 introduced by
the lightweight vmexit patch.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 30 Apr 2007 14:05:38 +0000 (17:05 +0300)]
KVM: Reduce misfirings of the fork detector
The kvm mmu tries to detects forks by looking for repeated writes to a
page table. If it sees a fork, it unshadows the page table so the page
table copying can proceed at native speed instead of being emulated.
However, the detector also triggered on simple demand paging access patterns:
a linear walk of memory would of course cause repeated writes to the same
pagetable page, causing it to unshadow prematurely.
Fix by resetting the fork detector if we detect a demand fault.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 30 Apr 2007 13:15:58 +0000 (16:15 +0300)]
KVM: Unindent some code
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 30 Apr 2007 13:07:54 +0000 (16:07 +0300)]
KVM: Avoid saving and restoring some host CPU state on lightweight vmexit
Many msrs and the like will only be used by the host if we schedule() or
return to userspace. Therefore, we avoid saving them if we handle the
exit within the kernel, and if a reschedule is not requested.
Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of
fixes by me.
Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 30 Apr 2007 11:47:02 +0000 (14:47 +0300)]
KVM: Assume that writes smaller than 4 bytes are to non-pagetable pages
This allows us to remove write protection earlier than otherwise. Should
some mad OS choose to use byte writes to update pagetables, it will suffer
a performance hit, but still work correctly.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Anthony Liguori [Mon, 30 Apr 2007 06:48:11 +0000 (09:48 +0300)]
KVM: SVM: Allow direct guest access to PC debug port
The PC debug port is used for IO delay and does not require emulation.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
He, Qing [Mon, 30 Apr 2007 06:45:24 +0000 (09:45 +0300)]
KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs
This patch enables IO bitmaps control on vmx and unmask the 0x80 port to
avoid VMEXITs caused by accessing port 0x80. 0x80 is used as delays (see
include/asm/io.h), and handling VMEXITs on its access is unnecessary but
slows things down. This patch improves kernel build test at around
3%~5%.
Because every VM uses the same io bitmap, it is shared between
all VMs rather than a per-VM data structure.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Linus Torvalds [Sun, 15 Jul 2007 23:56:12 +0000 (16:56 -0700)]
Merge git://git.infradead.org/battery-2.6
* git://git.infradead.org/battery-2.6:
git-battery vs git-acpi
Power supply class and drivers: remove non obligatory return statements
pda_power: clean up irq, timer
MAINTAINERS: Add maintainers for power supply subsystem and drivers
Fixed up trivial conflict in drivers/w1/slaves/w1_ds2760.c manually
Linus Torvalds [Sun, 15 Jul 2007 23:51:54 +0000 (16:51 -0700)]
Merge /pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits)
[SCSI] ibmvscsi: convert to use the data buffer accessors
[SCSI] dc395x: convert to use the data buffer accessors
[SCSI] ncr53c8xx: convert to use the data buffer accessors
[SCSI] sym53c8xx: convert to use the data buffer accessors
[SCSI] ppa: coding police and printk levels
[SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
[SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
[SCSI] remove the dead CYBERSTORMIII_SCSI option
[SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA
[SCSI] Clean up scsi_add_lun a bit
[SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs
[SCSI] sni_53c710: Cleanup
[SCSI] qla4xxx: Fix underrun/overrun conditions
[SCSI] megaraid_mbox: use mutex instead of semaphore
[SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation.
[SCSI] qla2xxx: update version to 8.02.00-k1.
[SCSI] qla2xxx: add support for NPIV
[SCSI] stex: use resid for xfer len information
[SCSI] Add Brownie 1200U3P to blacklist
[SCSI] scsi.c: convert to use the data buffer accessors
...
Linus Torvalds [Sun, 15 Jul 2007 23:50:46 +0000 (16:50 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
[TCP]: Verify the presence of RETRANS bit when leaving FRTO
[IPV6]: Call inet6addr_chain notifiers on link down
[NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
[NET_SCHED]: act_api: qdisc internal reclassify support
[NET_SCHED]: sch_dsmark: act_api support
[NET_SCHED]: sch_atm: act_api support
[NET_SCHED]: sch_atm: Lindent
[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
[IPV4]: Cleanup call to __neigh_lookup()
[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
[NETFILTER]: nf_conntrack: UDPLITE support
[NETFILTER]: nf_conntrack: mark protocols __read_mostly
[NETFILTER]: x_tables: add connlimit match
[NETFILTER]: Lower *tables printk severity
[NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
[NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
[NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
[NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
[NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
[AF_IUCV]: Add lock when updating accept_q
...
Linus Torvalds [Sun, 15 Jul 2007 23:44:53 +0000 (16:44 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix a race condition bug in umount which caused a segfault
9p: re-enable mount time debug option
9p: cache meta-data when cache=loose
net/9p: set error to EREMOTEIO if trans->write returns zero
net/9p: change net/9p module name to 9pnet
9p: Reorganization of 9p file system code
Linus Torvalds [Sun, 15 Jul 2007 23:43:43 +0000 (16:43 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (37 commits)
[XFS] Fix lockdep annotations for xfs_lock_inodes
[LIB]: export radix_tree_preload()
[XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode
[XFS] Compat ioctl handler for handle operations
[XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1.
[XFS] Clean up function name handling in tracing code
[XFS] Quota inode has no parent.
[XFS] Concurrent Multi-File Data Streams
[XFS] Use uninitialized_var macro to stop warning about rtx
[XFS] XFS should not be looking at filp reference counts
[XFS] Use is_power_of_2 instead of open coding checks
[XFS] Reduce shouting by removing unnecessary macros from dir2 code.
[XFS] Simplify XFS min/max macros.
[XFS] Kill off xfs_count_bits
[XFS] Cancel transactions on xfs_itruncate_start error.
[XFS] Use do_div() on 64 bit types.
[XFS] Fix remount,readonly path to flush everything correctly.
[XFS] Cleanup inode extent size hint extraction
[XFS] Prevent ENOSPC from aborting transactions that need to succeed
[XFS] Prevent deadlock when flushing inodes on unmount
...
Al Viro [Sun, 15 Jul 2007 20:37:16 +0000 (21:37 +0100)]
make i2c-acorn tristate
It depends on tristate I2C and it's trivial to make modular. The
current Kconfig allows I2C=m, I2C_ACORN=y, which doesn't work at
all; alternatives are dependency on I2C=y and making I2C_ACORN
itself a tristate. The latter is the right thing to do...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Al Viro [Sun, 15 Jul 2007 20:01:32 +0000 (21:01 +0100)]
icside: devm_iounmap() needs linux/io.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Al Viro [Sun, 15 Jul 2007 20:01:22 +0000 (21:01 +0100)]
missing argument in bin_attribute ->read()/->write()
Fallout from commit
91a6902958f052358899f58683d44e36228d85c2 ('sysfs:
add parameter "struct bin_attribute *" ...')
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:01:12 +0000 (21:01 +0100)]
fallout from constified seq_operations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:01:02 +0000 (21:01 +0100)]
fallout from Auke's pci ->revision patch
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:00:51 +0000 (21:00 +0100)]
ax88796: dev_dbg() wants device, not platform device
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:00:41 +0000 (21:00 +0100)]
pass -msize-long to sparse on s390
s390 is the only 32bit with unsigned long for size_t (usual for those
is unsigned int). Tell sparse...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:00:31 +0000 (21:00 +0100)]
frv: missing __clear_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:00:21 +0000 (21:00 +0100)]
zd1211rw: too early inclusion of asm/unaligned.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:00:11 +0000 (21:00 +0100)]
fix return type of skb_checksum_complete()
It returns __sum16, not unsigned int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 20:00:01 +0000 (21:00 +0100)]
PDA_POWER depends on having request_irq()
... so all proud owners of s390-based PDAs will have to live without that one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 19:59:51 +0000 (20:59 +0100)]
ieee1394: forgotten dereference...
Going through the string and waiting for _pointer_ to become '\0'
is not what the authors meant...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Ben Collins <ben.collins@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 19:59:41 +0000 (20:59 +0100)]
the wrong variable checked after request_irq()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 19:59:31 +0000 (20:59 +0100)]
wrong order of arguments of ->readdir()
Shows how many people are testing coda - the bug had been there for 5 years
and results of stepping on it are not subtle.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 15 Jul 2007 19:59:22 +0000 (20:59 +0100)]
minimal fixes for drivers/usb/gadget/m66592-udc.c
still looks racy (and definitely leaks)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Sun, 15 Jul 2007 18:37:03 +0000 (22:37 +0400)]
git-battery vs git-acpi
drivers/w1/slaves/w1_ds2760.c:85: warning: initialization from incompatible pointer type
The ACPI guys changed the bin_attr APIs
(commit
91a6902958f052358899f58683d44e36228d85c2)
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Anton Vorontsov [Sun, 15 Jul 2007 01:18:25 +0000 (05:18 +0400)]
Power supply class and drivers: remove non obligatory return statements
Per Jeff Garzik request.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Anton Vorontsov <cbou@mail.ru>
Jeff Garzik [Sat, 14 Jul 2007 23:12:04 +0000 (19:12 -0400)]
pda_power: clean up irq, timer
Clean up pda_power interrupt handling:
Prior to this patch, the driver would pass information it needed
to the interrupt handler dev_id pointer, and then prompt forget it
ever did so, recreating that same information after a couple passes
through the timer-based state machine.
This patch removes the redundant checks by passing the
pda_power_supply[] pointer through the state machine. The current
code passed 'irq' through the state machine, as an index to recreate
the pointer, when we could more simply pass around the pointer itself.
This patch makes it easier to remove the 'irq' argument in the future,
in addition to cleaning up the driver today.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Anton Vorontsov [Sun, 15 Jul 2007 00:43:36 +0000 (04:43 +0400)]
MAINTAINERS: Add maintainers for power supply subsystem and drivers
Signed-off-by: Anton Vorontsov <cbou@mail.ru>
Acked-by: David Woodhouse <dwmw2@infradead.org>
FUJITA Tomonori [Fri, 25 May 2007 15:32:58 +0000 (00:32 +0900)]
[SCSI] ibmvscsi: convert to use the data buffer accessors
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Santiago Leon <santil@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
FUJITA Tomonori [Fri, 25 May 2007 17:07:09 +0000 (02:07 +0900)]
[SCSI] dc395x: convert to use the data buffer accessors
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jamie Lenehan <lenehan@twibble.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
FUJITA Tomonori [Mon, 14 May 2007 10:22:21 +0000 (19:22 +0900)]
[SCSI] ncr53c8xx: convert to use the data buffer accessors
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
FUJITA Tomonori [Fri, 25 May 2007 17:31:17 +0000 (02:31 +0900)]
[SCSI] sym53c8xx: convert to use the data buffer accessors
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Alan Cox [Mon, 9 Jul 2007 19:00:10 +0000 (12:00 -0700)]
[SCSI] ppa: coding police and printk levels
Add printk levels
Clean up some oddities of formatting
Fix goto labels
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Satyam Sharma [Mon, 9 Jul 2007 19:00:07 +0000 (12:00 -0700)]
[SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
drivers/scsi/aic7xxx_old.c:aic7xxx_slave_alloc() unnecessarily passes
GFP_ATOMIC (along with GFP_KERNEL) to kmalloc() from a context that is not
atomic. Remove the pointless GFP_ATOMIC.
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Satyam Sharma [Mon, 9 Jul 2007 19:00:07 +0000 (12:00 -0700)]
[SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
drivers/message/i2o/device.c:i2o_parm_field_get() unnecessarily passes
GFP_ATOMIC (along with GFP_KERNEL) to kmalloc() from a context that is not
atomic. Remove the pointless GFP_ATOMIC.
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Adrian Bunk [Mon, 9 Jul 2007 19:00:10 +0000 (12:00 -0700)]
[SCSI] remove the dead CYBERSTORMIII_SCSI option
Not converted to the 2.6 kconfig system and no code in the tree.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Ilpo Järvinen [Sun, 15 Jul 2007 07:19:29 +0000 (00:19 -0700)]
[TCP]: Verify the presence of RETRANS bit when leaving FRTO
For yet unknown reason, something cleared SACKED_RETRANS bit
underneath FRTO.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Sun, 15 Jul 2007 07:16:35 +0000 (00:16 -0700)]
[IPV6]: Call inet6addr_chain notifiers on link down
Currently if the link is brought down via ip link or ifconfig down,
the inet6addr_chain notifiers are not called even though all
the addresses are removed from the interface. This caused SCTP
to add duplicate addresses to it's list.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 07:03:05 +0000 (00:03 -0700)]
[NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 07:02:31 +0000 (00:02 -0700)]
[NET_SCHED]: act_api: qdisc internal reclassify support
The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
it to the qdisc, which could handle it internally or ignore it. With
NET_CLS_ACT however, tc_classify starts over at the first classifier
and never returns it to the qdisc. This makes it impossible to support
qdisc-internal reclassification, which in turn makes it impossible to
remove the old NET_CLS_POLICE code without breaking compatibility since
we have two qdiscs (CBQ and ATM) that support this.
This patch adds a tc_classify_compat function that handles
reclassification the old way and changes CBQ and ATM to use it.
This again is of course not fully backwards compatible with the previous
NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
compatibility *and* support qdisc internal reclassification with
NET_CLS_ACT, but this seems like the better choice over keeping the two
incompatible options around forever.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 07:02:10 +0000 (00:02 -0700)]
[NET_SCHED]: sch_dsmark: act_api support
Handle act_api classification results.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 07:01:49 +0000 (00:01 -0700)]
[NET_SCHED]: sch_atm: act_api support
Handle act_api classification results.
The ATM scheduler behaves slightly different than other schedulers
in that it only handles policer results for successful classifications,
this behaviour is retained for the act_api case.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 07:01:25 +0000 (00:01 -0700)]
[NET_SCHED]: sch_atm: Lindent
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Butskoy [Sun, 15 Jul 2007 06:53:08 +0000 (23:53 -0700)]
[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
From: Dmitry Butskoy <dmitry@butskoy.name>
Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747
Problem Description:
It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp
and raw sockets, both connected and unconnected.
There is a little typo in net/ipv6/icmp.c code, which prevents such messages
to be delivered to the errqueue of the correspond raw socket, when the socket
is CONNECTED. The typo is due to swap of local/remote addresses.
Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is
looked up usual way, it is something like:
sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);
where "daddr" is a destination address of the incoming packet (IOW our local
address), "saddr" is a source address of the incoming packet (the remote end).
But when the raw socket is looked up for some icmp error report, in
net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed
fragment of the "bad" packet, i.e. "daddr" is the original destination
address of that packet, "saddr" is our local address. Hence, for
icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr"
...
Steps to reproduce:
Create some raw socket, connect it to an address, and cause some error
situation: f.e. set ttl=1 where the remote address is more than 1 hop to reach.
Set IPV6_RECVERR .
Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN).
You should receive "time exceeded" icmp message (because of "ttl=1"), but the
socket do not receive it.
If you do not connect your raw socket, you will receive MSG_ERRQUEUE
successfully. (The reason is that for unconnected socket there are no actual
checks for local/remote addresses).
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 15 Jul 2007 06:47:04 +0000 (23:47 -0700)]
Merge /pub/scm/linux/kernel/git/herbert/crypto-2.6
Conflicts:
crypto/Kconfig
Jean Delvare [Sun, 15 Jul 2007 03:51:44 +0000 (20:51 -0700)]
[IPV4]: Cleanup call to __neigh_lookup()
Back in the times of Linux 2.2, negative values for the creat parameter
of __neigh_lookup() had a particular meaning, but no longer, so we
should pass 1 instead.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 03:49:26 +0000 (20:49 -0700)]
[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
As noticed by Ranko Zivojnovic <ranko@spidernet.net>, calling qdisc_run
from the timer handler can result in deadlock:
> CPU#0
>
> qdisc_watchdog() fires and gets dev->queue_lock
> qdisc_run()...qdisc_restart()...
> -> releases dev->queue_lock and enters dev_hard_start_xmit()
>
> CPU#1
>
> tc del qdisc dev ...
> qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
> -> grabs dev->queue_lock ...
>
> qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
> -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
> holding dev->queue_lock
>
> CPU#0
>
> dev_hard_start_xmit() returns ...
> -> wants to get dev->queue_lock(!)
>
> DEADLOCK!
The entire optimization is a bit questionable IMO, it moves potentially
large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ,
which kind of defeats the separation of them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Ranko Zivojnovic <ranko@spidernet.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 03:48:44 +0000 (20:48 -0700)]
[NETFILTER]: nf_conntrack: UDPLITE support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 03:48:19 +0000 (20:48 -0700)]
[NETFILTER]: nf_conntrack: mark protocols __read_mostly
Also remove two unnecessary EXPORT_SYMBOLs and move the
nf_conntrack_l3proto_ipv4 declaration to the correct file.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Sun, 15 Jul 2007 03:47:26 +0000 (20:47 -0700)]
[NETFILTER]: x_tables: add connlimit match
ipt_connlimit has been sitting in POM-NG for a long time.
Here is a new shiny xt_connlimit with:
* xtables'ified
* will request the layer3 module
(previously it hotdropped every packet when it was not loaded)
* fixed: there was a deadlock in case of an OOM condition
* support for any layer4 protocol (e.g. UDP/SCTP)
* using jhash, as suggested by Eric Dumazet
* ipv6 support
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 03:46:15 +0000 (20:46 -0700)]
[NETFILTER]: Lower *tables printk severity
Lower ip6tables, arptables and ebtables printk severity similar to
Dan Aloni's patch for iptables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:45:41 +0000 (20:45 -0700)]
[NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
The conntrack assigned to locally generated ICMP error is usually the one
assigned to the original packet which has caused the error. But if
the original packet is handled as invalid by nf_conntrack, no conntrack
is assigned to the original packet. Then nf_ct_attach() cannot assign
any conntrack to the ICMP error packet. In that case the current
nf_conntrack_icmp assigns appropriate conntrack to it. But the current
code mistakes the direction of the packet. As a result, NAT code mistakes
the address to be mangled.
To fix the bug, this changes nf_conntrack_icmp not to assign conntrack
to such ICMP error. Actually no address is necessary to be mangled
in this case.
Spotted by Jordan Russell.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:45:14 +0000 (20:45 -0700)]
[NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
nf_ct_get_tuple() requires the offset to transport header and that bothers
callers such as icmp[v6] l4proto modules. This introduces new function
to simplify them.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:44:50 +0000 (20:44 -0700)]
[NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple.
But they have to find the offset to transport protocol header before that.
Their processings are almost same as prepare() of l3proto modules.
This makes prepare() more generic to simplify icmp[v6] l4proto module
later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:44:23 +0000 (20:44 -0700)]
[NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 15 Jul 2007 02:07:52 +0000 (19:07 -0700)]
[NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
Add ethtool utility function to set or clear IPV6_CSUM feature flag.
Modify tg3.c and bnx2.c to use this function when doing ethtool -K
to change tx checksum.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Sun, 15 Jul 2007 02:04:25 +0000 (19:04 -0700)]
[AF_IUCV]: Add lock when updating accept_q
The accept_queue of an af_iucv socket will be corrupted, if
adding and deleting of entries in this queue occurs at the
same time (connect request from one client, while accept call
is processed for another client).
Solution: add locking when updating accept_q
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Sun, 15 Jul 2007 02:03:41 +0000 (19:03 -0700)]
[AF_IUCV]: Avoid deadlock between iucv_path_connect and tasklet.
An iucv deadlock may occur, where one CPU is spinning on the
iucv_table_lock for iucv_tasklet_fn(), while another CPU is holding
the iucv_table_lock for an iucv_path_connect() and is waiting for
the first CPU in an smp_call_function.
Solution: replace spin_lock in iucv_tasklet_fn by spin_trylock and
reschedule tasklet in case of non-granted lock.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jennifer Hunt [Sun, 15 Jul 2007 02:03:00 +0000 (19:03 -0700)]
[AF_IUCV]: Improve description of IUCV and AFIUCV configuration options.
Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Ursula Braun >braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Sun, 15 Jul 2007 02:00:59 +0000 (19:00 -0700)]
[INET_SOCK]: make net/ipv4/inet_timewait_sock.c:__inet_twsk_kill() static
This patch makes the needlessly global __inet_twsk_kill() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 15 Jul 2007 01:58:49 +0000 (18:58 -0700)]
Merge branch 'upstream-davem' of /linux/kernel/git/linville/wireless-2.6
Stephen Hemminger [Sun, 15 Jul 2007 01:57:19 +0000 (18:57 -0700)]
[TCP]: tcp probe add back ssthresh field
Sangtae noticed the ssthresh got missed.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 01:56:30 +0000 (18:56 -0700)]
[VLAN]: Fix memset length
Fix sizeof(ETH_ALEN) Introduced by my rtnl_link patches.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 01:55:06 +0000 (18:55 -0700)]
[NET]: Add macvlan driver
Add macvlan driver, which allows to create virtual ethernet devices
based on MAC address.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 01:53:28 +0000 (18:53 -0700)]
[VLAN]: Use multicast list synchronization helpers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 01:52:56 +0000 (18:52 -0700)]
[VLAN]: Fix promiscous/allmulti synchronization races
The set_multicast_list function may be called without holding the rtnl
mutex, resulting in races when changing the underlying device's promiscous
and allmulti state. Use the change_rx_mode hook, which is always invoked
under the rtnl.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 01:52:02 +0000 (18:52 -0700)]
[NET]: dev_mcast: add multicast list synchronization helpers
The method drivers currently use to synchronize multicast lists is not
very pretty:
- walk the multicast list
- search each entry on a copy of the previous list
- if new add to lower device
- walk the copy of the previous list
- search each entry on the current list
- if removed delete from lower device
- copy entire list
This patch adds a new field to struct dev_addr_list to store the
synchronization state and adds two helper functions for synchronization
and cleanup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 15 Jul 2007 01:51:31 +0000 (18:51 -0700)]
[NET]: Add net_device change_rx_mode callback
Currently the set_multicast_list (and set_rx_mode) callbacks are
responsible for configuring the device according to the IFF_PROMISC,
IFF_MULTICAST and IFF_ALLMULTI flags and the mc_list (and uc_list in
case of set_rx_mode).
These callbacks can be invoked from BH context without the rtnl_mutex
by dev_mc_add/dev_mc_delete, which makes reading the device flags and
promiscous/allmulti count racy. For real hardware drivers that just
commit all changes to the hardware this is not a real problem since
the stack guarantees to call them for every change, so at least the
final call will not race and commit the correct configuration to the
hardware.
For software devices that want to synchronize promiscous and multicast
state to an underlying device however this can cause corruption of the
underlying device's flags or promisc/allmulti counts.
When the software device is concurrently put in promiscous or allmulti
mode while set_multicast_list is invoked from bottem half context, the
device might synchronize the change to the underlying device without
holding the rtnl_mutex, which races with concurrent changes to the
underlying device.
Add a dev->change_rx_flags hook that is invoked when any of the flags
that affect rx filtering change (under the rtnl_mutex), which allows
drivers to perform synchronization immediately and only synchronize
the address lists in set_multicast_list/set_rx_mode.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>