Vernon Mauery [Tue, 18 May 2010 22:02:50 +0000 (19:02 -0300)]
Add support for Westmere to i7core_edac driver
This adds new PCI IDs for the Westmere's memory controller
devices and modifies the i7core_edac driver to be able to
probe both Nehalem and Westmere processors.
Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Tony Luck [Tue, 18 May 2010 13:53:25 +0000 (10:53 -0300)]
i7core_edac: don't free on success
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 18 May 2010 16:00:31 +0000 (13:00 -0300)]
i7core_edac: Add support for X5670
As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a
different register for one of the uncore PCI devices. Add support for
it.
These are the PCI IDs on this new chipset:
fe:00.0 0600: 8086:2c70 (rev 02)
fe:00.1 0600: 8086:2d81 (rev 02)
fe:02.0 0600: 8086:2d90 (rev 02)
fe:02.1 0600: 8086:2d91 (rev 02)
fe:02.2 0600: 8086:2d92 (rev 02)
fe:02.3 0600: 8086:2d93 (rev 02)
fe:02.4 0600: 8086:2d94 (rev 02)
fe:02.5 0600: 8086:2d95 (rev 02)
fe:03.0 0600: 8086:2d98 (rev 02)
fe:03.1 0600: 8086:2d99 (rev 02)
fe:03.2 0600: 8086:2d9a (rev 02)
fe:03.4 0600: 8086:2d9c (rev 02)
fe:04.0 0600: 8086:2da0 (rev 02)
fe:04.1 0600: 8086:2da1 (rev 02)
fe:04.2 0600: 8086:2da2 (rev 02)
fe:04.3 0600: 8086:2da3 (rev 02)
fe:05.0 0600: 8086:2da8 (rev 02)
fe:05.1 0600: 8086:2da9 (rev 02)
fe:05.2 0600: 8086:2daa (rev 02)
fe:05.3 0600: 8086:2dab (rev 02)
fe:06.0 0600: 8086:2db0 (rev 02)
fe:06.1 0600: 8086:2db1 (rev 02)
fe:06.2 0600: 8086:2db2 (rev 02)
fe:06.3 0600: 8086:2db3 (rev 02)
(as usual, the same PCI devices repeat at ff: bus)
The PCI device 8086:2c70 is shown as:
fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic
Non-core Registers (rev 02)
So, for this device to be recognized, it is only a matter of adding this
new PCI ID to the driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
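The PCI listing in the commit above follows a regular "bus:dev.fn class: vendor:device (rev NN)" shape. As a minimal sketch (not part of the driver), such lines can be turned into integer tuples, for example to compare the Westmere-EP IDs with those of another machine. The helper name parse_pci_line is hypothetical.

```python
import re

# Matches lines such as "fe:00.0 0600: 8086:2c70 (rev 02)".
PCI_LINE = re.compile(
    r"^(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\s+"
    r"(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?$"
)


def parse_pci_line(line):
    """Return (bus, dev, fn, vendor, device) as integers, or None."""
    m = PCI_LINE.match(line.strip().lower())
    if m is None:
        return None
    return (
        int(m.group("bus"), 16),
        int(m.group("dev"), 16),
        int(m.group("fn"), 16),
        int(m.group("vendor"), 16),
        int(m.group("device"), 16),
    )
```

Lines that do not match the expected shape yield None, so free-text lines in the listing can be skipped safely.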
Vernon Mauery [Fri, 16 Apr 2010 22:40:19 +0000 (19:40 -0300)]
Always call i7core_[ur]dimm_check_mc_ecc_err
This fixes an error in the function i7core_check_error().
Commit ca9c90ba09ca3c9799319f46a56f397afbf617c2, which converted the
driver to use double buffering, changed the logic. Before,
if mce_count was zero, it skipped over a couple of statements and
finished out with a call to the *check_mc_ecc_err function. The current
code checks to see if mce_count is 0 and then exits.
This change reverts the behavior back to the original where if there are
no errors to report, we skip to the end and call the *check_mc_ecc_err
function.
This fix allows the driver to work again on my Nehalem-based blades.
Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
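The control flow restored by the commit above can be sketched in Python as follows. This is an illustration only, not the driver code: check_error, report_event and check_mc_ecc_err are hypothetical stand-ins for the kernel functions, and the point is simply that an empty event list must not cause an early return before the ECC counter check.

```python
def check_error(pending_events, report_event, check_mc_ecc_err):
    """Drain pending MCE events, then always poll the ECC counters."""
    reported = 0
    # With no pending events the loop body never runs; there is
    # deliberately no early return before the counter check below.
    for event in pending_events:
        report_event(event)
        reported += 1
    check_mc_ecc_err()
    return reported
```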
Alexander Beregalov [Fri, 8 Jan 2010 02:27:30 +0000 (23:27 -0300)]
i7core_edac: fix memory leak of i7core_dev
Free already allocated i7core_dev.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Jiri Slaby [Wed, 9 Dec 2009 19:55:15 +0000 (16:55 -0300)]
EDAC: add __init to i7core_xeon_pci_fixup
It's called only from an __init function and is the only user
of pcibios_scan_specific_bus which will be marked as __devinit in
the next patch.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 14 Oct 2009 16:44:37 +0000 (13:44 -0300)]
i7core_edac: Fix wrong device id for channel 1 devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 14 Oct 2009 16:31:06 +0000 (13:31 -0300)]
i7core: add support for Lynnfield alternate address
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 14 Oct 2009 14:21:58 +0000 (11:21 -0300)]
i7core_edac: Add initial support for Lynnfield
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Stephen Rothwell [Fri, 4 Dec 2009 18:49:34 +0000 (16:49 -0200)]
i7core_edac: do not export static functions
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Randy Dunlap [Sun, 8 Nov 2009 03:36:40 +0000 (01:36 -0200)]
edac: fix i7core build
Fix build warning (missing header file) and
build error when CONFIG_SMP=n.
drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Alan Cox [Sun, 8 Nov 2009 03:34:27 +0000 (01:34 -0200)]
edac: i7core_edac produces undefined behaviour on 32bit
Fix the shifts up
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 14 Oct 2009 11:02:40 +0000 (08:02 -0300)]
i7core_edac: Use a more generic approach for probing PCI devices
Currently, only one set of PCI tables is allowed. This prevents using
the driver for other devices like Lynnfield, which have a different
set of PCI IDs.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 14 Oct 2009 09:07:07 +0000 (06:07 -0300)]
i7core_edac: PCI device is called NONCORE, instead of NOCORE
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 8 Oct 2009 16:11:08 +0000 (13:11 -0300)]
i7core_edac: Fix ringbuffer maxsize
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Mon, 5 Oct 2009 12:40:09 +0000 (09:40 -0300)]
i7core_edac: First store, then increment
Fix ringbuffer store logic.
While here, add a few comments to the code and remove the undesired
printk that could otherwise be called during NMI time.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
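The "first store, then increment" ordering from the commit above can be illustrated with a small single-producer ring. This is a sketch under stated assumptions: the class and field names are hypothetical, and Python cannot express the memory barriers that real kernel code running at NMI time would need. It only shows that the slot is written before the head index is published.

```python
class MceRing:
    """Fixed-size single-producer ring; names are illustrative only."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0      # next slot the producer writes
        self.tail = 0      # next slot the consumer reads
        self.overruns = 0

    def push(self, entry):
        nxt = (self.head + 1) % len(self.slots)
        if nxt == self.tail:
            # Full: count the drop instead of overwriting unread data.
            self.overruns += 1
            return False
        self.slots[self.head] = entry   # first store ...
        self.head = nxt                 # ... then increment
        return True

    def pop(self):
        if self.tail == self.head:
            return None
        entry = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        return entry
```

One slot is kept free to tell a full ring from an empty one, so a ring of size N holds at most N - 1 entries.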
Mauro Carvalho Chehab [Sun, 4 Oct 2009 14:54:56 +0000 (11:54 -0300)]
i7core_edac: Better parse "any" addrmask
Instead of accepting just "any", accept also "any\n"
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
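Values written to sysfs with echo arrive with a trailing newline, which is why "any\n" has to be accepted as well as "any". A minimal sketch of that kind of parsing follows. The function name, the -1 sentinel and the max_value parameter are assumptions made for illustration, not the driver's actual interface.

```python
def parse_addrmatch(value, max_value):
    """Return -1 for "any", else the integer; raise ValueError if invalid."""
    text = value.strip()       # drops the "\n" that echo appends
    if text == "any":
        return -1
    number = int(text, 10)
    if not 0 <= number < max_value:
        raise ValueError("value out of range: %d" % number)
    return number
```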
Mauro Carvalho Chehab [Sun, 4 Oct 2009 13:15:40 +0000 (10:15 -0300)]
i7core_edac: Use a lockless ringbuffer
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 25 Sep 2009 16:42:25 +0000 (13:42 -0300)]
edac: Create an unique instance for each kobj
Current code only works when there's just one memory
controller, since we need one kobj for each instance.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 24 Sep 2009 20:28:50 +0000 (17:28 -0300)]
Documentation/edac.txt: Reflect the sysfs changes at the document
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 24 Sep 2009 20:25:43 +0000 (17:25 -0300)]
i7core_edac: Convert UDIMM error counters into a proper sysfs group
Instead of displaying 3 values at the same var, break it into 3
different sysfs nodes:
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2
For registered dimms, however, the error counters are already being
displayed at:
/sys/devices/system/edac/mc/mc0/csrow*/ce_count
So, there's no need to add any extra sysfs nodes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
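With the counters split into one node each, userspace can read them individually. A minimal sketch, assuming each udimmN node contains a single decimal integer (the commit does not show the node contents); read_udimm_counts is a hypothetical helper:

```python
import os


def read_udimm_counts(mc_dir):
    """Read udimm0..udimm2 under <mc_dir>/all_channel_counts.

    Returns a dict such as {"udimm0": 1}; missing nodes are skipped.
    """
    group = os.path.join(mc_dir, "all_channel_counts")
    counts = {}
    for index in range(3):
        name = "udimm%d" % index
        try:
            with open(os.path.join(group, name)) as node:
                counts[name] = int(node.read().strip())
        except FileNotFoundError:
            continue
    return counts
```

On a machine with this driver loaded, it would be called as read_udimm_counts("/sys/devices/system/edac/mc/mc0").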
Mauro Carvalho Chehab [Thu, 24 Sep 2009 19:36:32 +0000 (16:36 -0300)]
edac: Don't create csrow entries on instance groups
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 24 Sep 2009 19:23:42 +0000 (16:23 -0300)]
edac: store/show methods for device groups weren't working
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 23 Sep 2009 21:56:47 +0000 (18:56 -0300)]
i7core_edac: Add support for sysfs addrmatch group
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 23 Sep 2009 19:26:09 +0000 (16:26 -0300)]
edac_core: Allow the creation of sysfs groups
Currently, all sysfs nodes are stored at /sys/.*/mc. (regex)
However, sometimes it is needed to create attribute groups.
This patch extends edac_core to allow groups creation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 24 Sep 2009 12:58:26 +0000 (09:58 -0300)]
i7core_edac: Avoid printing a warning when debug is disabled
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 24 Sep 2009 12:59:13 +0000 (09:59 -0300)]
i7core_edac: We need to use list_for_each_entry_safe to avoid errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sun, 6 Sep 2009 02:06:50 +0000 (23:06 -0300)]
i7core_edac: change remove module strategy
The old module removal strategy didn't work on devices with multiple
cores, since only one PCI device is used to open all MCs, due to the
nature of Nehalem.
Also, it was based on the pdev value. However, this doesn't point to the
PCI device used at mci->dev.
So, instead, unregister all devices at once, deleting them from the
device list.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 15:16:19 +0000 (12:16 -0300)]
i7core_edac: remove static counter for max sockets
The number of sockets is now fully dynamic. Get rid of this obsolete
var.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 15:15:20 +0000 (12:15 -0300)]
i7core_edac: at remove, don't remove all pci devices at once
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 08:10:31 +0000 (05:10 -0300)]
i7core_edac: Fix a bug when printing error counts with RDIMMs
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 08:10:15 +0000 (05:10 -0300)]
Documentation/edac.txt: Improve it to reflect the latest changes at the driver
Signed-off-by: Mauro Carvalho Chehab <mcheahb@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 07:12:02 +0000 (04:12 -0300)]
i7core_edac: a few fixes for multiple mc's
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 06:27:04 +0000 (03:27 -0300)]
i7core_edac: sanity check: print a warning if a mcelog is ignored
In theory, the other memory controller should handle it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 05:35:08 +0000 (02:35 -0300)]
i7core_edac: create one mc per socket/QPI
Instead of creating just one memory controller, create one per socket
(i.e. per QuickPath Interconnect).
This better reflects the Nehalem architecture.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 5 Sep 2009 03:52:11 +0000 (00:52 -0300)]
Dynamically allocate memory for PCI devices
Instead of using a static table assuming always 2 CPU sockets, allocate
space dynamically for Nehalem PCI devs.
This patch is part of a series of patches that changes i7core_edac to
allow more than 2 sockets and to properly report one memory controller
per socket.
Mauro Carvalho Chehab [Sat, 5 Sep 2009 03:47:21 +0000 (00:47 -0300)]
i7core: temporary workaround to allow it to compile against 2.6.30
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 3 Sep 2009 23:17:26 +0000 (20:17 -0300)]
i7core_edac: Improve corrected_error_counts output for RDIMM
Just cosmetics. Instead of showing something like:
socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0
Show:
socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 1, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0
This is more compact and easier to parse.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
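The one-line layout described in the commit above can be reproduced with a short formatter. This is only a sketch of the output shape; format_rdimm_counts is a hypothetical name, not a function from the driver.

```python
def format_rdimm_counts(socket, channel, counts):
    """Render one line such as 'socket 0, channel 2 RDIMM0: 1 RDIMM1: 0'."""
    dimms = " ".join(
        "RDIMM%d: %d" % (index, count) for index, count in enumerate(counts)
    )
    return "socket %d, channel %d %s" % (socket, channel, dimms)
```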
Keith Mannthey [Thu, 3 Sep 2009 03:05:05 +0000 (00:05 -0300)]
i7core_edac: Probe on Xeons eariler
On the Xeon 55XX series CPUs, the PCI devices are not exposed via ACPI, so
we must explicitly probe them to make them usable as Linux PCI devices.
This moves the detection of this state to before pci_register_driver is
called. Its present position was not working on my systems: the driver
would complain about not finding a specific device.
This patch allows the driver to load on my systems.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 3 Sep 2009 02:52:36 +0000 (23:52 -0300)]
i7core: Use registered memories per processor
Instead of assuming that the entire machine has either registered or
unregistered memories, decide it on a per-CPU-socket basis.
While here, fix a bug in i7core_mce_output_error(), where we were
using m->cpu directly as if it represented a socket. Instead, the
proper socket_id is given by cpu_data[m->cpu].phys_proc_id.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
Mauro Carvalho Chehab [Thu, 3 Sep 2009 02:49:59 +0000 (23:49 -0300)]
i7core_edac: Use Device 3 function 2 to report errors with RDIMM's
Nehalem and newer chipsets provide a special device that holds corrected memory
error counters for registered DIMMs. This device is only seen if
registered memories are plugged in.
After this patch, on a machine fully equipped with RDIMMs, the driver uses
Device 3 function 2 to count corrected errors instead of relying on mcelog.
For unregistered DIMMs, it keeps the old behavior, counting errors
via mcelog.
This patch was developed together with Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Keith Mannthey [Thu, 3 Sep 2009 02:46:59 +0000 (23:46 -0300)]
i7core_edac: Fix ecc enable shift
From: Keith Mannthey <kmannth@us.ibm.com>
Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c
This correctly identifies the state of the ECC at the machine.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
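The bit test described in the commit above is a one-line shift and mask. A minimal sketch operating on a raw MC_STATUS value that has already been read; the constant and function names are chosen here for illustration, and only the offset and bit position come from the commit message.

```python
MC_STATUS_OFFSET = 0x4C      # Dev 3 Fun 0, per the commit message
ECC_ENABLED_BIT = 4          # bit position, per the commit message


def ecc_enabled(mc_status):
    """Return True when bit 4 of a raw MC_STATUS value is set."""
    return bool((mc_status >> ECC_ENABLED_BIT) & 1)
```

A value with only a neighbouring bit set, such as 0x08, must report ECC as disabled, which is the kind of case an off-by-one shift gets wrong.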
Mauro Carvalho Chehab [Thu, 3 Sep 2009 02:43:33 +0000 (23:43 -0300)]
i7core_edac: Print an error message if pci register fails
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 6 Aug 2009 00:36:35 +0000 (21:36 -0300)]
i7core_edac: CodingSyle fixes/cleanups
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 6 Aug 2009 00:16:56 +0000 (21:16 -0300)]
Documentation/edac.txt: Add Nehalem specific EDAC characteristics
As Nehalem has a different binding to the EDAC API, and its own
error injection code, document it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 5 Aug 2009 23:27:15 +0000 (20:27 -0300)]
i7core_edac: fix error injection
There were two stupid error injection bugs introduced by a wrong
cut-and-paste: one at socket store, and another at the error inject
register. The latter was causing the code to not work at all.
While here, add debug messages to allow seeing what registers are being
set while sending error injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 5 Aug 2009 22:28:27 +0000 (19:28 -0300)]
i7core_edac: fix error codes for sysfs error injection interface
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 23 Jul 2009 00:45:50 +0000 (21:45 -0300)]
i7core_edac: some fixes at error injection code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Mon, 20 Jul 2009 21:48:18 +0000 (18:48 -0300)]
i7core_edac: Some cleanups at displayed info
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 18 Jul 2009 15:22:28 +0000 (12:22 -0300)]
i7core: remove some uneeded noisy debug messages
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 18 Jul 2009 15:20:04 +0000 (12:20 -0300)]
i7core: add socket info at the debug msg
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 18 Jul 2009 13:44:30 +0000 (10:44 -0300)]
i7core: better document i7core_get_active_channels()
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Sat, 18 Jul 2009 13:43:08 +0000 (10:43 -0300)]
i7core: fix get_devices routine for Xeon55xx
i7core_get_devices() was prepared to get just the first found device of each type.
Due to that, on Xeon 55xx, only socket 1 was retrieved.
Rework i7core_get_devices() to clean it up and to properly support Xeon 55xx.
While here, fix a small typo.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 17 Jul 2009 13:54:23 +0000 (10:54 -0300)]
i7core: enrich error information based on memory transaction type
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 17 Jul 2009 13:28:15 +0000 (10:28 -0300)]
i7core: check if the memory error is fatal or non-fatal
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 17 Jul 2009 03:09:10 +0000 (00:09 -0300)]
i7core: fix probing on Xeon55xx
Xeon55xx fails to probe with this error message:
EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init()
EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41
i7core_edac: probe of 0000:00:14.0 failed with error -22
This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has
PCI ID 8086:2c40.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 15 Jul 2009 22:53:24 +0000 (19:53 -0300)]
i7core_edac: some fixes at memory error parser
m->bank is not related to the memory bank but, instead, to the MCA Error
register bank. Fix it accordingly. While here, improve the comments for
Nehalem bank.
A later fix is needed, in order to get bank/rank information from MCA
error log.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 15 Jul 2009 22:01:08 +0000 (19:01 -0300)]
i7core_edac: decode mcelog error and send it via edac interface
Enriches mcelog error by using the encoded information at MCE status and
misc registers (IA32_MCx_STATUS, IA32_MCx_MISC).
Some fixes are still needed here, in order to properly fill the EDAC
fields.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 15 Jul 2009 12:02:32 +0000 (09:02 -0300)]
i7core_edac: maps all sockets as if ther are one MC controller
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Wed, 15 Jul 2009 09:56:23 +0000 (06:56 -0300)]
i7core_edac: add support for more than one MC socket
Some Nehalem architectures have more than one MC socket. Socket 0 is
located at bus 255.
Currently, it is using up to 2 sockets, but increasing it to a larger
number is just a matter of increasing MAX_SOCKETS definition.
This seems to be required to properly support Xeon 55xx.
Still needs testing with Xeon 55xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 10 Jul 2009 21:39:53 +0000 (18:39 -0300)]
i7core_edac: Add a code to probe Xeon 55xx bus
This code changes the detection procedure of i7core_edac. Instead of
directly probing for MC registers, it probes for another register found
on Nehalem. If found, it tries to pick the first MC PCI BUS. This should
work fine with Xeon 35xx, but, on Xeon 55xx, this is at bus 254 and 255
that are not properly detected by the non-legacy PCI methods.
The new detection code scans specifically at buses 254 and 255 for the
Xeon 55xx devices.
This code has not been tested yet. Once it works, a change to the code will
be needed, since i7core is not yet ready to work with 2 sets of
MCs.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Aristeu Rozanski [Fri, 10 Jul 2009 01:21:13 +0000 (22:21 -0300)]
pci: Add a probing code that seeks for an specific bus
This patch adds probing code that looks for a specific PCI bus. It
still needs testing, but it is hoped that this will help to identify the
memory controller with Xeon 55xx series.
Signed-off-by: Aristeu Sergio <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 10 Jul 2009 01:14:35 +0000 (22:14 -0300)]
i7core_edac: Adds write unlock to MC registers
The public Intel Xeon 5500 volume 2 datasheet describes, on page 53,
section 2.6.7, a register called MC_CFG_CONTROL that can lock/unlock the
Memory Controller configuration registers.
Add support for it in the hope that software error injection will
work. In my tests with Xeon 35xx, there's still something missing.
With a program that does sequential bit writes at dev 0.0, it sometimes
produces error injection after unlocking MC_CFG_CONTROL (and
sometimes it just locks up my testing machine).
I'll try later to discover by trial and error which register
solves this issue on Xeon 35xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 10 Jul 2009 01:06:41 +0000 (22:06 -0300)]
i7core_edac: Add edac_mce glue
Adds a glue code to allow i7core to work with mcelog. With the glue,
i7core registers itself on edac_mce. At mce, when an error is detected,
it calls all registered drivers (in this case, i7core), for EDAC error
handling.
TODO: It currently just prints the MCE error log using about the same
format as mce panic messages. The error message should be enhanced
with mcelog userspace info and converted into the proper EDAC format,
to feed the EDAC error counts.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Fri, 10 Jul 2009 01:04:30 +0000 (22:04 -0300)]
edac/Kconfig: edac_mce can't be module
Since mcelog is bool, edac_mce glue should also be bool; otherwise
it will not work.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Thu, 23 Jul 2009 09:57:45 +0000 (06:57 -0300)]
edac_mce: Add an interface driver to report mce errors via edac
edac_mce module is an interface module that gets mcelog data and
forwards to any registered edac module that expects to receive data via
mce.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:31 +0000 (22:48 -0300)]
i7core_edac: CodingStyle fixes
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:31 +0000 (22:48 -0300)]
i7core_edac: fill csrows edac sysfs info
csrows is still fake, since we can't identify its representation with
Nehalem registers.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:31 +0000 (22:48 -0300)]
i7core_edac: Memory info fixes and preparation for properly filling cswrow data
Now, memory size is properly displayed:
EDAC i7core: DOD Max limits: DIMMS: 2, 1-ranked, 8-banked
EDAC i7core: DOD Max rows x colums = 0x4000 x 0x400
EDAC i7core: Memory channel configuration:
EDAC i7core: Ch0 phy rd0, wr0 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: dimm 1 (0x00001288) 1024 Mb offset: 4, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch1 phy rd1, wr1 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch2 phy rd3, wr3 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
Still, as the way to retrieve csrows info is not known, it does a
mapping of what's available to csrows basic unit at edac core.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
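The 1024 figure in the log above is consistent with multiplying out the geometry, assuming each column address holds 8 bytes (a 64-bit data path) and that "Mb" in the log means megabytes. Both are assumptions made for this sketch, and dimm_size_mb is a hypothetical helper.

```python
def dimm_size_mb(numbank, numrank, numrow, numcol, bytes_per_column=8):
    """Size in MB, assuming each column address holds bytes_per_column bytes."""
    locations = numbank * numrank * numrow * numcol
    return (locations * bytes_per_column) >> 20
```

For the DIMMs in the log: 8 banks x 1 rank x 0x4000 rows x 0x400 columns gives 134217728 locations, and at 8 bytes each that is 1024 MB.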
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:30 +0000 (22:48 -0300)]
i7core_edac: Get more info about the memory DIMMs
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:30 +0000 (22:48 -0300)]
i7core_edac: Add more information about each active dimm
Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:30 +0000 (22:48 -0300)]
i7core_edac: Improve error handling
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:30 +0000 (22:48 -0300)]
i7core_edac: Properly fill struct csrow_info
Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:30 +0000 (22:48 -0300)]
i7core_edac: Add additional tests for error detection
Properly check the number of channels and improve probing error detection
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:29 +0000 (22:48 -0300)]
i7core_edac: Add a memory check routine, based on device 3 function 4
This function appears only on Xeon 5500 datasheet. Yet, testing with a
Xeon 3503 showed that this is also implemented on other Nehalem
processors.
At the first read, MC_TEST_ERR_RCV1 and MC_TEST_ERR_RCV0 can contain any
value. Modify CE error logic to update the error count only after the
second read.
An alternative approach would be to write to the rcv0 and rcv1
registers, but it seemed better to keep them untouched, since the BIOS might
assume that they are reserved for its own use.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
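The "count only after the second read" rule from the commit above amounts to treating the first reading as a baseline and accumulating deltas afterwards. A minimal sketch follows. The class name, the 15-bit counter width and the wrap-around handling are assumptions made for illustration, not details taken from the commit.

```python
class CorrectedErrorCounter:
    """Accumulates deltas of a free-running hardware counter."""

    def __init__(self, width_bits=15):
        self.mask = (1 << width_bits) - 1
        self.last = None       # None until the first read sets a baseline
        self.total = 0

    def update(self, raw):
        """Feed one raw register reading; return the newly counted errors."""
        raw &= self.mask
        if self.last is None:
            # First read: the register may hold any value, so only record it.
            self.last = raw
            return 0
        delta = (raw - self.last) & self.mask   # tolerate wrap-around
        self.last = raw
        self.total += delta
        return delta
```

Because the register is only read and never written, this approach leaves whatever the BIOS stored there untouched.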
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:29 +0000 (22:48 -0300)]
i7core_edac: need mci->edac_check, otherwise module removal doesn't work
There are some locking troubles with edac_core: if you don't declare an
edac_check, module may suffer from soft lock.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:29 +0000 (22:48 -0300)]
i7core_edac: A few fixes at error injection code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:29 +0000 (22:48 -0300)]
i7core_edac: Show read/write virtual/physical channel association
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:29 +0000 (22:48 -0300)]
i7core_edac: Registers all supported MC functions
Now, it will try to register on all supported Memory Controller
functions.
It should be noted that dev 3, function 2 is present only on chips with
Registered DIMMs, according to the datasheet. So, the driver doesn't
return -ENODEV if all functions but this one were successfully
registered and enabled:
EDAC i7core: Registered device 8086:2c18 fn=3 0
EDAC i7core: Registered device 8086:2c19 fn=3 1
EDAC i7core: Device not found: PCI ID 8086:2c1a (dev 3, func 2)
EDAC i7core: Registered device 8086:2c1c fn=3 4
EDAC i7core: Registered device 8086:2c20 fn=4 0
EDAC i7core: Registered device 8086:2c21 fn=4 1
EDAC i7core: Registered device 8086:2c22 fn=4 2
EDAC i7core: Registered device 8086:2c23 fn=4 3
EDAC i7core: Registered device 8086:2c28 fn=5 0
EDAC i7core: Registered device 8086:2c29 fn=5 1
EDAC i7core: Registered device 8086:2c2a fn=5 2
EDAC i7core: Registered device 8086:2c2b fn=5 3
EDAC i7core: Registered device 8086:2c30 fn=6 0
EDAC i7core: Registered device 8086:2c31 fn=6 1
EDAC i7core: Registered device 8086:2c32 fn=6 2
EDAC i7core: Registered device 8086:2c33 fn=6 3
EDAC i7core: Driver loaded.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:29 +0000 (22:48 -0300)]
i7core_edac: Add more status functions to EDAC driver
This patch was co-authored with Aristeu Rozanski.
Signed-off-by: Aristeu Sergio <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:48:28 +0000 (22:48 -0300)]
i7core_edac: Add error insertion code for Nehalem
Implements set_inject_error() with the low-level code needed to inject
memory errors at Nehalem, and adds some sysfs nodes to allow error injection
The next patch will add an API for error injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Tue, 23 Jun 2009 01:41:15 +0000 (22:41 -0300)]
i7core_edac: Add an EDAC memory controller driver for Nehalem chipsets
This driver is meant to support i7 core/i7core extreme desktop
processors and Xeon 35xx/55xx series with integrated memory controller.
It is likely that it can be expanded in the future to work with other
processor series based on the same Memory Controller design.
For now, it has just a few MCH status reads.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Linus Torvalds [Fri, 30 Apr 2010 03:02:05 +0000 (20:02 -0700)]
Linux 2.6.34-rc6
Linus Torvalds [Fri, 30 Apr 2010 03:01:42 +0000 (20:01 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: don't needlessly skip PAGE_USER test for Fsl booke
Linus Torvalds [Fri, 30 Apr 2010 02:49:34 +0000 (19:49 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: add a shrinker to background inode reclaim
Wufei [Wed, 28 Apr 2010 21:42:32 +0000 (17:42 -0400)]
kgdb: don't needlessly skip PAGE_USER test for Fsl booke
The bypassing of this test is a leftover from 2.4 vintage
kernels, and is no longer appropriate, or even used by KGDB.
Currently KGDB uses probe_kernel_write() for all access to
memory via the KGDB core, so it can simply be deleted.
This fixes CVE-2010-1446.
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Wufei <fei.wu@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Linus Torvalds [Fri, 30 Apr 2010 00:18:07 +0000 (17:18 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
exofs: Fix "add bdi backing to mount session" fall out
fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK
Linus Torvalds [Fri, 30 Apr 2010 00:17:35 +0000 (17:17 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 6061/1: PL061 GPIO: Bug fix - setting gpio for HIGH_LEVEL interrupt is not working.
ARM: 5957/1: ARM: RealView SD/MMC Card detection and write-protect using GPIOLIB
ARM: 6030/1: KS8695: enable console
ARM: 6060/1: PL061 GPIO: Setting gpio val after changing direction to OUT.
ARM: 6059/1: PL061 GPIO: Changing *_irq_chip_data with *_irq_data for real irqs.
ARM: 6023/1: update bcmring_defconfig to latest version and fix build error
ARM: fix build error in arch/arm/kernel/process.c
Linus Torvalds [Fri, 30 Apr 2010 00:16:36 +0000 (17:16 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/ps3: Update ps3_defconfig
powerpc/ps3: Update platform maintainer
powerpc/pseries: Flush lazy kernel mappings after unplug operations
powerpc/numa: Add form 1 NUMA affinity
powerpc/fsl-booke: Fix CONFIG_RELOCATABLE support on FSL Book-E ppc32
powerpc: 2.6.34 update of defconfigs for embedded 6xx/7xxx, 8xx, 8xxx
powerpc/mpc8xxx defconfigs - turn off SYSFS_DEPRECATED
powerpc/83xx: configure SIL SATA driver in 83xx-wide defconfig
powerpc/83xx: enable EPOLL syscall in defconfig
powerpc/83xx: add RTC drivers in 83xx defconfig
powerpc/fsl-cpm: Configure clock correctly for SCC
powerpc/fsl_booke: Correct test for MMU_FTR_BIG_PHYS
powerpc/85xx/86xx: Fix build w/ CONFIG_PCI=n
viresh kumar [Thu, 29 Apr 2010 11:22:52 +0000 (12:22 +0100)]
ARM: 6061/1: PL061 GPIO: Bug fix - setting gpio for HIGH_LEVEL interrupt is not working.
In the current implementation of PL061, setting the irq type to HIGH_LEVEL
does not work. This patch fixes the bug.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Acked-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Chinner [Wed, 28 Apr 2010 23:55:50 +0000 (09:55 +1000)]
xfs: add a shrinker to background inode reclaim
On low memory boxes or those with highmem, the kernel can OOM before the
background reclaims inodes via xfssyncd. Add a shrinker to run inode
reclaim so that inode reclaim is expedited when memory is low.
This is more complex than it needs to be because the VM folk don't
want a context added to the shrinker infrastructure. Hence we need
to add a global list of XFS mount structures so the shrinker can
traverse them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
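The global-list approach described in the xfs commit above can be sketched in
userspace C. This is only an illustration, not the XFS code: the names
demo_mount, demo_mount_register, demo_mount_unregister and demo_shrink are
hypothetical, and the locking a real kernel implementation would need around
the list is omitted. The point it shows is that a callback which receives no
context pointer has to reach every mount through a global list.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-filesystem mount structure. */
struct demo_mount {
    struct demo_mount *next;   /* link in the global list */
    int reclaimable;           /* inodes that could be freed */
};

/*
 * Global list head. The shrinker callback gets no context argument,
 * so this list is its only way to find the mounts. A real
 * implementation would protect it with a lock.
 */
static struct demo_mount *demo_mount_list;

/* Add a mount to the front of the global list (e.g. at mount time). */
void demo_mount_register(struct demo_mount *mp)
{
    mp->next = demo_mount_list;
    demo_mount_list = mp;
}

/* Remove a mount from the global list (e.g. at unmount time). */
void demo_mount_unregister(struct demo_mount *mp)
{
    struct demo_mount **pp;

    for (pp = &demo_mount_list; *pp; pp = &(*pp)->next) {
        if (*pp == mp) {
            *pp = mp->next;
            mp->next = NULL;
            return;
        }
    }
}

/*
 * Context-free shrinker: reclaim up to nr_to_scan inodes across all
 * registered mounts and return how many remain reclaimable.
 * Calling it with nr_to_scan == 0 only counts.
 */
int demo_shrink(int nr_to_scan)
{
    struct demo_mount *mp;
    int remaining = 0;

    for (mp = demo_mount_list; mp; mp = mp->next) {
        int n = mp->reclaimable < nr_to_scan ? mp->reclaimable : nr_to_scan;

        mp->reclaimable -= n;
        nr_to_scan -= n;
        remaining += mp->reclaimable;
    }
    return remaining;
}
```

If the callback carried a context pointer instead, each mount could register
its own shrinker and the global list would be unnecessary, which is the extra
complexity the commit message refers to.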
Boaz Harrosh [Thu, 29 Apr 2010 18:35:29 +0000 (20:35 +0200)]
exofs: Fix "add bdi backing to mount session" fall out
The patch "add bdi backing to mount session"
(b3d0ab7e60d1865bb6f6a79a77aaba22f2543236)
has a bug in the placement of the bdi member at
struct exofs_sb_info. The layout member must be kept
last.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
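The commit message above does not say why the layout member must stay last.
One plausible reason, stated here as an assumption, is that the member ends in
a variable-length device array that is over-allocated past the end of the
structure. The sketch below uses hypothetical names (demo_layout,
demo_sb_info, demo_sb_info_bad) rather than the real exofs definitions, and
shows how a member placed after such an array would overlap its extra slots.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout whose device array is sized at allocation time. */
struct demo_layout {
    unsigned num_devs;
    void *devs[1];               /* placeholder for one device; more follow */
};

/* Correct ordering: the variable-length member is last. */
struct demo_sb_info {
    int other_state;
    struct demo_layout layout;   /* keep last: devs[] extends past sizeof */
};

/* Buggy ordering: a member sits after the variable-length one. */
struct demo_sb_info_bad {
    struct demo_layout layout;
    int bdi;                     /* overlaps devs[1] and beyond */
};

/* Bytes needed for a demo_sb_info holding num_devs device slots. */
size_t demo_sb_info_size(unsigned num_devs)
{
    if (num_devs < 1)
        num_devs = 1;            /* never allocate less than the struct */
    return offsetof(struct demo_sb_info, layout.devs) +
           num_devs * sizeof(void *);
}

/* Allocate a zeroed demo_sb_info with room for num_devs device slots. */
struct demo_sb_info *demo_sb_info_alloc(unsigned num_devs)
{
    struct demo_sb_info *sbi = calloc(1, demo_sb_info_size(num_devs));

    if (sbi)
        sbi->layout.num_devs = num_devs;
    return sbi;
}

/* Returns 1 if device slot i would overlap bdi in the buggy ordering. */
int demo_dev_slot_clobbers_bdi(unsigned i)
{
    size_t slot = offsetof(struct demo_sb_info_bad, layout.devs) +
                  i * sizeof(void *);

    return slot + sizeof(void *) > offsetof(struct demo_sb_info_bad, bdi);
}
```

With the buggy ordering, slot 0 is safe but every further slot lands on top of
bdi, so a write to either one corrupts the other. Keeping the variable-length
member last avoids this.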
Jens Axboe [Thu, 29 Apr 2010 18:33:35 +0000 (20:33 +0200)]
fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK
When CONFIG_BLOCK is set, it ends up getting backing-dev.h included.
But for !CONFIG_BLOCK, it isn't so lucky. The proper thing to do is
include <linux/backing-dev.h> directly from the file it's used from,
so do that.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Thu, 29 Apr 2010 17:23:44 +0000 (10:23 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
nfs: fix some issues in nfs41_proc_reclaim_complete()
NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
NFS: Fix an unstable write data integrity race
nfs: testing for null instead of ERR_PTR()
NFS: rsize and wsize settings ignored on v4 mounts
NFSv4: Don't attempt an atomic open if the file is a mountpoint
SUNRPC: Fix a bug in rpcauth_prune_expired
Arnd Bergmann [Wed, 28 Apr 2010 12:36:41 +0000 (14:36 +0200)]
pktcdvd: improve BKL and compat_ioctl.c usage
The pktcdvd driver uses proper locking and does not need the BKL in the
ioctl and llseek functions of the character device, so kill both.
Moving the compat_ioctl handling from common code into the driver itself
fixes build problems when CONFIG_BLOCK is disabled.
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boaz Harrosh [Thu, 29 Apr 2010 10:38:00 +0000 (13:38 +0300)]
exofs: Fix "add bdi backing to mount session" fall out
Commit b3d0ab7e60d1865bb6f6a79a77aaba22f2543236 ("exofs: add bdi backing
to mount session") has a bug in the placement of the bdi member at
struct exofs_sb_info. The layout member must be kept last.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 29 Apr 2010 03:41:55 +0000 (20:41 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: Disable large pages on CPUs with Atom erratum AAE44
x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
x86, mrst: Conditionally register cpu hotplug notifier for apbt
Linus Torvalds [Thu, 29 Apr 2010 03:40:17 +0000 (20:40 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI: compute Address Space length rather than using _LEN
x86/PCI: never allocate PCI MMIO resources below BIOS_END
Al Viro [Thu, 29 Apr 2010 02:10:43 +0000 (03:10 +0100)]
nfs d_revalidate() is too trigger-happy with d_drop()
If a dentry found stale happens to be the root of a disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.
Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Tuckley [Wed, 24 Feb 2010 14:23:10 +0000 (15:23 +0100)]
ARM: 5957/1: ARM: RealView SD/MMC Card detection and write-protect using GPIOLIB
The switch to using GPIOLIB broke the sd/mmc card detection on the
RealView development boards if GPIO_PL061 was not selected.
This patch selects GPIO_PL061 if GPIOLIB is selected.
The sense of the return value from mmc_status has also changed
and is corrected.
Signed-off-by: Colin Tuckley <colin.tuckley@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>