Labels: bug (Something isn't working)
Description
2025-04-13T15:36:49.5424673Z Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
2025-04-13T15:36:49.5424939Z Neighbor list info ...
2025-04-13T15:36:49.5425161Z update: every = 10 steps, delay = 0 steps, check = no
2025-04-13T15:36:49.5425394Z max neighbors/atom: 2000, page size: 100000
2025-04-13T15:36:49.5427367Z master list distance cutoff = 14
2025-04-13T15:36:49.5427584Z ghost atom cutoff = 14
2025-04-13T15:36:49.5427980Z binsize = 7, bins = 2 2 2
2025-04-13T15:36:49.5428205Z 1 neighbor lists, perpetual/occasional/extra = 1 0 0
2025-04-13T15:36:49.5428520Z (1) pair deepmd, perpetual
2025-04-13T15:36:49.5430631Z attributes: full, newton on
2025-04-13T15:36:49.5430875Z pair build: full/bin/atomonly
2025-04-13T15:36:49.5431090Z stencil: full/bin/3d
2025-04-13T15:36:49.5431291Z bin: standard
2025-04-13T15:36:49.5433231Z Setting up Verlet run ...
2025-04-13T15:36:49.5433485Z Unit style : metal
2025-04-13T15:36:49.5433638Z Current step : 0
2025-04-13T15:36:49.5433837Z Time step : 0.0005
2025-04-13T15:36:49.8125367Z ERROR on proc 0: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend error: The following operation failed in the TorchScript interpreter.
2025-04-13T15:36:49.8128848Z Traceback of TorchScript, serialized code (most recent call last):
2025-04-13T15:36:49.8131041Z File "code/__torch__/deepmd_gnn/mace.py", line 85, in forward_lower
2025-04-13T15:36:49.8131549Z else:
2025-04-13T15:36:49.8133884Z nlist0 = nlist
2025-04-13T15:36:49.8134218Z model_ret = (self).forward_lower_common(nloc, extended_coord, extended_atype, nlist0, mapping0, fparam, aparam, do_atomic_virial, comm_dict, )
2025-04-13T15:36:49.8136852Z ~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8137405Z model_predict = annotate(Dict[str, Tensor], {})
2025-04-13T15:36:49.8139507Z torch._set_item(model_predict, "atom_energy", model_ret["energy"])
2025-04-13T15:36:49.8139924Z File "code/__torch__/deepmd_gnn/mace.py", line 206, in forward_lower_common
2025-04-13T15:36:49.8140252Z mapping_ff = torch.add(_33, torch.reshape(_35, [-1]))
2025-04-13T15:36:49.8140588Z _36 = annotate(List[Optional[Tensor]], [mapping_ff])
2025-04-13T15:36:49.8142773Z _37 = torch.index(extended_coord_ff0, _36)
2025-04-13T15:36:49.8143138Z ~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8143368Z shifts_atoms = torch.sub(extended_coord_ff0, _37)
2025-04-13T15:36:49.8143643Z _38 = annotate(List[Optional[Tensor]], [torch.select(edge_index0, 0, 1)])
2025-04-13T15:36:49.8145740Z
2025-04-13T15:36:49.8147219Z Traceback of TorchScript, original code (most recent call last):
2025-04-13T15:36:49.8150531Z File "$PREFIX/lib/python3.12/site-packages/deepmd_gnn/mace.py", line 550, in forward_lower
2025-04-13T15:36:49.8150843Z )
2025-04-13T15:36:49.8151012Z
2025-04-13T15:36:49.8153852Z model_ret = self.forward_lower_common(
2025-04-13T15:36:49.8154142Z ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8154358Z nloc,
2025-04-13T15:36:49.8154560Z extended_coord,
2025-04-13T15:36:49.8157071Z File "$PREFIX/lib/python3.12/site-packages/deepmd_gnn/mace.py", line 653, in forward_lower_common
2025-04-13T15:36:49.8157538Z device=mapping.device,
2025-04-13T15:36:49.8158512Z ).unsqueeze(-1).expand(nf, nall).reshape(-1)
2025-04-13T15:36:49.8159474Z shifts_atoms = extended_coord_ff - extended_coord_ff[mapping_ff]
2025-04-13T15:36:49.8163080Z ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8163505Z shifts = shifts_atoms[edge_index[1]] - shifts_atoms[edge_index[0]]
2025-04-13T15:36:49.8164291Z edge_index = mapping_ff[edge_index]
2025-04-13T15:36:49.8165097Z RuntimeError: index 139971608205680 is out of bounds for dimension 0 with size 6661 (/home/conda/feedstock_root/build_artifacts/deepmd-kit_1740896715911/work/source/lmp/pair_deepmd.cpp:252)
2025-04-13T15:36:49.8167387Z Last command: run 1
2025-04-13T15:36:49.8167995Z --------------------------------------------------------------------------
2025-04-13T15:36:49.8168691Z MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
2025-04-13T15:36:49.8171254Z Proc: [[14161,1],0]
2025-04-13T15:36:49.8171750Z Errorcode: 1
2025-04-13T15:36:49.8171831Z
2025-04-13T15:36:49.8172055Z NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
2025-04-13T15:36:49.8172307Z You may or may not see output from other processes, depending on
2025-04-13T15:36:49.8174359Z exactly when Open MPI kills them.
2025-04-13T15:36:49.8174651Z --------------------------------------------------------------------------
2025-04-13T15:36:50.0867200Z --------------------------------------------------------------------------
2025-04-13T15:36:50.0868906Z prterun has exited due to process rank 0 with PID 0 on node f026eeb12269 calling
2025-04-13T15:36:50.0874438Z "abort". This may have caused other processes in the application to be
2025-04-13T15:36:50.0875069Z terminated by signals sent by prterun (as reported here).
2025-04-13T15:36:50.0875361Z --------------------------------------------------------------------------
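For context, the failing line in `deepmd_gnn/mace.py` is the advanced indexing `extended_coord_ff[mapping_ff]`. Below is a minimal sketch (shapes and variable names are assumptions for illustration, not code from the repository) of how PyTorch produces this class of error when an entry of the mapping tensor falls outside `[0, nall)`; in eager mode it typically surfaces as an `IndexError`, while the TorchScript interpreter reports it as a `RuntimeError`, as seen in the log.

```python
import torch

nall = 6661                               # size reported in the log's error message
extended_coord_ff = torch.zeros(nall, 3)  # stand-in for the extended coordinates

# A mapping whose entries all lie in [0, nall) indexes fine.
good_mapping = torch.arange(nall)
_ = extended_coord_ff[good_mapping]

# A single out-of-range entry (here the huge value from the log) triggers
# "index 139971608205680 is out of bounds for dimension 0 with size 6661".
bad_mapping = good_mapping.clone()
bad_mapping[0] = 139971608205680
try:
    _ = extended_coord_ff[bad_mapping]
except (IndexError, RuntimeError) as err:
    print(err)
```

The reported index is far larger than the 6661 extended atoms in the run, which is consistent with the mapping tensor passed down from the LAMMPS/DeePMD-kit side containing an invalid entry rather than the coordinate array being too small.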