{"id":274,"hash":"2ee0ce0868b6256af1fe2db87b4f9b94aafec253062b9e15850225300600a016","pattern":"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! when resuming training","full_message":"I saved a checkpoint while training on GPU. After reloading the checkpoint and continuing training, I get the following error:\n\nTraceback (most recent call last):\n  File \"main.py\", line 140, in <module>\n    train(model,optimizer,train_loader,val_loader,criteria=args.criterion,epoch=epoch,batch=batch)\n  File \"main.py\", line 71, in train\n    optimizer.step()\n  File \"/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py\", line 26, in decorate_context\n    return func(*args, **kwargs)\n  File \"/opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py\", line 106, in step\n    buf.mul_(momentum).add_(d_p, alpha=1 - dampening)\nRuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\n\nMy training code is as follows:\n\ndef train(model, optimizer, train_loader, val_loader, criteria, epoch=0, batch=0):\n    batch_count = batch\n    if criteria == 'l1':\n        criterion = L1_imp_Loss()\n    elif criteria == 'l2':\n        criterion = L2_imp_Loss()\n    if args.gpu and torch.cuda.is_available():\n        model.cuda()\n        criterion = criterion.cuda()\n\n    print(f'{datetime.datetime.now().time().replace(microsecond=0)} Starting to train..')\n\n    while epoch <= args.epochs - 1:\n        print(f'********{datetime.datetime.now().time().replace(microsecond=0)} Epoch#: {epoch+1} / {args.epochs}')\n        model.train()\n        interval_loss, total_loss = 0, 0\n        for i, (input, target) in enumerate(train_loader):\n            batch_count += 1\n            if args.gpu and torch.cuda.is_available():\n                input, target = input.cuda(), target.cuda()\n            input, target = input.float(), target.float()\n            pred = model(input)\n            loss = criterion(pred, target)\n            optimizer.zero_grad()\n            loss.backward()\n            optimizer.step()\n            ....\n\nThe checkpoint is saved after finishing each epoch:\n\ntorch.save({'epoch': epoch, 'batch': batch_count, 'model_state_dict': model.state_dict(), 'optimizer_state_dict': optimizer.state_dict(), 'loss': total_loss/len(train_loader), 'train_set': args.train_set, 'val_set': args.val_set, 'args': args}, f'{args.weights_dir}/FastDepth_Final.pth')\n\nI can't figure out why I get this error.\nargs.gpu == True, and I'm passing the model, all data, and the loss function to CUDA; somehow there is still a tensor on the CPU. Could anyone figure out what's wrong?\n\nThanks.","ecosystem":"pypi","package_name":"deep-learning","package_version":null,"solution":"There might be an issue with the device the parameters are on.\n\nIf you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects from those before the call.\n\nIn general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.","confidence":0.95,"source":"stackoverflow","source_url":"https://stackoverflow.com/questions/66091226/runtimeerror-expected-all-tensors-to-be-on-the-same-device-but-found-at-least","votes":116,"created_at":"2026-04-19T04:41:44.167930+00:00","updated_at":"2026-04-19T04:51:56.199328+00:00"}
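The ordering the solution describes can be sketched as a minimal runnable example. This is an assumption-laden illustration, not the asker's FastDepth setup: the `torch.nn.Linear` model, the hyperparameters, and the round-trip through a `state_dict` checkpoint are placeholders, and the script falls back to CPU when CUDA is unavailable. The key steps are: move the model to the target device first, then construct the optimizer from the moved parameters, then load the optimizer state.

```python
import torch

# Build a toy model/optimizer and take one SGD step so the momentum
# buffers (the tensors the traceback's buf.mul_ touches) get created.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

# Save a checkpoint like the question does (in-memory dict here).
ckpt = {'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict()}

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Resume in the recommended order: move the model to the device FIRST,
# then construct the optimizer from the moved parameters, then load state.
model = torch.nn.Linear(4, 2).to(device)
model.load_state_dict(ckpt['model_state_dict'])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
optimizer.load_state_dict(ckpt['optimizer_state_dict'])

# Defensive extra: push any loaded state tensors (e.g. momentum buffers)
# onto the same device as the parameters. Recent PyTorch versions already
# cast loaded state to the parameters' device inside load_state_dict.
for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.to(device)

# Resuming now works: every tensor SGD touches lives on `device`.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
```

With this ordering the optimizer holds references to the post-move parameter objects, so its momentum buffers and the gradients it consumes in `step()` are guaranteed to share one device.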