spi: mpc512x: improve throughput in the RX/TX func
change the MPC512x SPI controller's transmission routine to increase
throughput: allow the RX byte counter to "lag behind" the TX byte
counter while iterating over the transfer's data, only wait for the
remaining RX bytes at the very end of the transfer
this approach eliminates delays in the milliseconds range, transfer
times for e.g. 16MB of SPI flash data dropped from 31s to 9s, correct
operation was tested by continuously transferring and comparing data
from an SPI flash (more than 200GB in some 45 hours)
background information on the motivation:
one might assume that all the RX data should have been received when the
TX data was sent, given the fact that we are the SPI master and provide
all of the clock, but in practise there's a difference
the ISR is triggered when the TX FIFO became empty, while transmission
of the last item still occurs (from the TX hold and shift registers),
sampling RX data on the opposite clock edge compared to the TX data adds
another delay (half a bit period), and RX data needs to propagate from
the reception buffer to the RX FIFO depending on the specific SoC
implementation
to cut it short: a difference between TX and RX byte counters during
transmission is not just acceptable but should be considered the regular
case, only the very end of the transfer needs to make sure that all of
the RX data was received before deasserting the chip select and telling
the caller that transmission has completed
Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Mark Brown <broonie@linaro.org>