x86/MCE: Correct TSC timestamping of error records
author Borislav Petkov <bp@suse.de>
Thu, 10 Nov 2016 13:10:53 +0000 (14:10 +0100)
committer Ingo Molnar <mingo@kernel.org>
Fri, 11 Nov 2016 07:08:24 +0000 (08:08 +0100)
commit 54467353a96577f840cd2348981417c559b21b4b
tree dc4a04116165c0f65dcce84ef5db9429fa67b51e
parent c09a8c40e0a0b4994925ac8eba91b85d76f440a3

The MCE code used to TSC-timestamp an error record only when the
timestamp was exact, i.e., when the error wasn't detected by polling.
This isn't the case anymore. So let's fix that:

We have a valid TSC timestamp in the error record only when it has been
a precise detection, i.e., either in the #MC handler or in one of the
interrupt handlers (thresholding, deferred, ...).

All other error records still have mce.time which contains the wall
time in order to be able to place the error record in time at least
approximately.

Also, this fixes another bug where machine_check_poll() would clear
mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
logging.
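
The intended behaviour can be illustrated with a minimal userspace
sketch. It is not the patch itself: struct sketch_mce, sketch_setup(),
sketch_poll(), sketch_read_tsc() and SKETCH_MCP_TIMESTAMP are
hypothetical stand-ins for struct mce, mce_setup(),
machine_check_poll(), the TSC read and MCP_TIMESTAMP, and the fake
counter only mimics a monotonically increasing TSC.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's MCP_TIMESTAMP poll flag. */
#define SKETCH_MCP_TIMESTAMP (1u << 0)

/* Hypothetical, heavily reduced stand-in for struct mce. */
struct sketch_mce {
	uint64_t tsc;	/* 0 means "no precise timestamp available" */
	uint64_t time;	/* wall time in seconds, always filled in */
};

/* Fake TSC: a counter that only ever increases (assumption). */
static uint64_t sketch_read_tsc(void)
{
	static uint64_t fake_tsc = 1000;

	fake_tsc += 100;
	return fake_tsc;
}

/* Common setup: every record gets the approximate wall time. */
static void sketch_setup(struct sketch_mce *m)
{
	m->tsc = 0;
	m->time = (uint64_t)time(NULL);
}

/*
 * Polling path: stamp the TSC only when the caller asked for a
 * precise timestamp, and never clear it behind the caller's back.
 */
static void sketch_poll(struct sketch_mce *m, unsigned int flags)
{
	sketch_setup(m);

	if (flags & SKETCH_MCP_TIMESTAMP)
		m->tsc = sketch_read_tsc();
}
```

In this sketch a plain poll, sketch_poll(&m, 0), leaves m.tsc at 0 and
only m.time is usable, while an interrupt-style caller passing
SKETCH_MCP_TIMESTAMP gets a non-zero m.tsc in addition to m.time.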

The proper fix would be to generate the timestamp only when it has
been requested and not always. But that would require a more thorough
code audit of all mce_gather_info()/mce_setup() users. Add a FIXME for
now.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony <tony.luck@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: kernel test robot <xiaolong.ye@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20161110131053.kybsijfs5venpjnf@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch/x86/kernel/cpu/mcheck/mce.c
arch/x86/kernel/cpu/mcheck/mce_intel.c