staging: lustre: ptlrpc: replay bulk request
author: wang di <di.wang@intel.com>
Thu, 27 Oct 2016 22:11:51 +0000 (18:11 -0400)
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sun, 30 Oct 2016 15:00:11 +0000 (11:00 -0400)
Even if the server has already received the bulk replay
request, the bulk transfer may have timed out. In that
case, replay the bulk request again, i.e. treat such a
replay the same as a replay request that got no reply
(see ptlrpc_replay_interpret()).

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6924
Reviewed-on: http://review.whamcloud.com/15793
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/staging/lustre/lustre/ptlrpc/client.c

index e4fbdd0d0720b37adc0a94d675ef5dde55f3c1d3..bda925ed5294a55f12176248467858102dca5ea0 100644
@@ -2762,8 +2762,15 @@ static int ptlrpc_replay_interpret(const struct lu_env *env,
 
        atomic_dec(&imp->imp_replay_inflight);
 
-       if (!ptlrpc_client_replied(req)) {
-               CERROR("request replay timed out, restarting recovery\n");
+       /*
+        * Note: if it is bulk replay (MDS-MDS replay), then even if
+        * server got the request, but bulk transfer timeout, let's
+        * replay the bulk req again
+        */
+       if (!ptlrpc_client_replied(req) ||
+           (req->rq_bulk &&
+            lustre_msg_get_status(req->rq_repmsg) == -ETIMEDOUT)) {
+               DEBUG_REQ(D_ERROR, req, "request replay timed out.\n");
                rc = -ETIMEDOUT;
                goto out;
        }
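
The condition added in the hunk above can be read in isolation as a small predicate. The following is only a user-space sketch of that logic, not the kernel code: struct replay_req and both function names are hypothetical stand-ins, and each field models one of the calls in the patch (ptlrpc_client_replied(), req->rq_bulk, lustre_msg_get_status()).

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the parts of struct ptlrpc_request that
 * the patched check looks at. None of these names exist in Lustre.
 */
struct replay_req {
	bool replied;      /* models ptlrpc_client_replied(req) */
	bool has_bulk;     /* models req->rq_bulk != NULL */
	int  reply_status; /* models lustre_msg_get_status(req->rq_repmsg) */
};

/*
 * Returns true when the replay should be treated as timed out:
 * either no reply arrived at all, or a reply arrived for a bulk
 * request but reports that the bulk transfer timed out.
 */
bool replay_timed_out(const struct replay_req *req)
{
	/* Checked first so reply_status is only read when a reply exists. */
	if (!req->replied)
		return true;

	return req->has_bulk && req->reply_status == -ETIMEDOUT;
}

/* Mirrors the "rc = -ETIMEDOUT; goto out;" branch of the patch. */
int replay_interpret_rc(const struct replay_req *req)
{
	return replay_timed_out(req) ? -ETIMEDOUT : 0;
}
```

As in the patch, the "no reply" test comes first, so the reply status is consulted only once a reply is known to exist. A non-bulk request whose reply carries -ETIMEDOUT is not caught by this check, which matches the original condition.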