edge_index out of bounds #58

@njzjz

Description

2025-04-13T15:36:49.5424673Z Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
2025-04-13T15:36:49.5424939Z Neighbor list info ...
2025-04-13T15:36:49.5425161Z   update: every = 10 steps, delay = 0 steps, check = no
2025-04-13T15:36:49.5425394Z   max neighbors/atom: 2000, page size: 100000
2025-04-13T15:36:49.5427367Z   master list distance cutoff = 14
2025-04-13T15:36:49.5427584Z   ghost atom cutoff = 14
2025-04-13T15:36:49.5427980Z   binsize = 7, bins = 2 2 2
2025-04-13T15:36:49.5428205Z   1 neighbor lists, perpetual/occasional/extra = 1 0 0
2025-04-13T15:36:49.5428520Z   (1) pair deepmd, perpetual
2025-04-13T15:36:49.5430631Z       attributes: full, newton on
2025-04-13T15:36:49.5430875Z       pair build: full/bin/atomonly
2025-04-13T15:36:49.5431090Z       stencil: full/bin/3d
2025-04-13T15:36:49.5431291Z       bin: standard
2025-04-13T15:36:49.5433231Z Setting up Verlet run ...
2025-04-13T15:36:49.5433485Z   Unit style    : metal
2025-04-13T15:36:49.5433638Z   Current step  : 0
2025-04-13T15:36:49.5433837Z   Time step     : 0.0005
2025-04-13T15:36:49.8125367Z ERROR on proc 0: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend error: The following operation failed in the TorchScript interpreter.
2025-04-13T15:36:49.8128848Z Traceback of TorchScript, serialized code (most recent call last):
2025-04-13T15:36:49.8131041Z   File "code/__torch__/deepmd_gnn/mace.py", line 85, in forward_lower
2025-04-13T15:36:49.8131549Z     else:
2025-04-13T15:36:49.8133884Z       nlist0 = nlist
2025-04-13T15:36:49.8134218Z     model_ret = (self).forward_lower_common(nloc, extended_coord, extended_atype, nlist0, mapping0, fparam, aparam, do_atomic_virial, comm_dict, )
2025-04-13T15:36:49.8136852Z                  ~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8137405Z     model_predict = annotate(Dict[str, Tensor], {})
2025-04-13T15:36:49.8139507Z     torch._set_item(model_predict, "atom_energy", model_ret["energy"])
2025-04-13T15:36:49.8139924Z   File "code/__torch__/deepmd_gnn/mace.py", line 206, in forward_lower_common
2025-04-13T15:36:49.8140252Z       mapping_ff = torch.add(_33, torch.reshape(_35, [-1]))
2025-04-13T15:36:49.8140588Z       _36 = annotate(List[Optional[Tensor]], [mapping_ff])
2025-04-13T15:36:49.8142773Z       _37 = torch.index(extended_coord_ff0, _36)
2025-04-13T15:36:49.8143138Z             ~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8143368Z       shifts_atoms = torch.sub(extended_coord_ff0, _37)
2025-04-13T15:36:49.8143643Z       _38 = annotate(List[Optional[Tensor]], [torch.select(edge_index0, 0, 1)])
2025-04-13T15:36:49.8145740Z 
2025-04-13T15:36:49.8147219Z Traceback of TorchScript, original code (most recent call last):
2025-04-13T15:36:49.8150531Z   File "$PREFIX/lib/python3.12/site-packages/deepmd_gnn/mace.py", line 550, in forward_lower
2025-04-13T15:36:49.8150843Z             )
2025-04-13T15:36:49.8151012Z     
2025-04-13T15:36:49.8153852Z         model_ret = self.forward_lower_common(
2025-04-13T15:36:49.8154142Z                     ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8154358Z             nloc,
2025-04-13T15:36:49.8154560Z             extended_coord,
2025-04-13T15:36:49.8157071Z   File "$PREFIX/lib/python3.12/site-packages/deepmd_gnn/mace.py", line 653, in forward_lower_common
2025-04-13T15:36:49.8157538Z                 device=mapping.device,
2025-04-13T15:36:49.8158512Z             ).unsqueeze(-1).expand(nf, nall).reshape(-1)
2025-04-13T15:36:49.8159474Z             shifts_atoms = extended_coord_ff - extended_coord_ff[mapping_ff]
2025-04-13T15:36:49.8163080Z                                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2025-04-13T15:36:49.8163505Z             shifts = shifts_atoms[edge_index[1]] - shifts_atoms[edge_index[0]]
2025-04-13T15:36:49.8164291Z             edge_index = mapping_ff[edge_index]
2025-04-13T15:36:49.8165097Z RuntimeError: index 139971608205680 is out of bounds for dimension 0 with size 6661 (/home/conda/feedstock_root/build_artifacts/deepmd-kit_1740896715911/work/source/lmp/pair_deepmd.cpp:252)
2025-04-13T15:36:49.8167387Z Last command: run             1
2025-04-13T15:36:49.8167995Z --------------------------------------------------------------------------
2025-04-13T15:36:49.8168691Z MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
2025-04-13T15:36:49.8171254Z   Proc: [[14161,1],0]
2025-04-13T15:36:49.8171750Z   Errorcode: 1
2025-04-13T15:36:49.8171831Z 
2025-04-13T15:36:49.8172055Z NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
2025-04-13T15:36:49.8172307Z You may or may not see output from other processes, depending on
2025-04-13T15:36:49.8174359Z exactly when Open MPI kills them.
2025-04-13T15:36:49.8174651Z --------------------------------------------------------------------------
2025-04-13T15:36:50.0867200Z --------------------------------------------------------------------------
2025-04-13T15:36:50.0868906Z prterun has exited due to process rank 0 with PID 0 on node f026eeb12269 calling
2025-04-13T15:36:50.0874438Z "abort". This may have caused other processes in the application to be
2025-04-13T15:36:50.0875069Z terminated by signals sent by prterun (as reported here).
2025-04-13T15:36:50.0875361Z --------------------------------------------------------------------------
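The failing line is the gather `extended_coord_ff[mapping_ff]`. Judging from the serialized code (`mapping_ff = torch.add(_33, torch.reshape(_35, [-1]))` followed by `.expand(nf, nall).reshape(-1)`), `mapping_ff` appears to be the flattened per-frame `mapping` plus a per-frame offset of `f * nall`. That reading is an assumption, because the full source line is cut off in the log. The reported index, 139971608205680, is far larger than the 6661 rows of the flattened extended coordinates. It looks more like a 64-bit memory address than an atom index, which may mean that the `mapping` received from LAMMPS is uninitialized or corrupted. This is an unconfirmed guess that the log alone cannot settle.

The plain-Python sketch below (no PyTorch) reproduces the assumed index arithmetic and adds an explicit bounds check, so that an invalid mapping fails with a readable message instead of deep inside TorchScript. `flatten_mapping` and `atom_shifts` are hypothetical helpers written for illustration. They are not deepmd_gnn or DeePMD-kit functions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def flatten_mapping(mapping: Sequence[Sequence[int]], nall: int) -> List[int]:
    """Hypothetical helper: convert a per-frame mapping of shape (nf, nall)
    into indices into the flattened (nf * nall) coordinate array by adding
    a frame offset of f * nall (assumed to match how `mapping_ff` is built)."""
    flat: List[int] = []
    for f, frame in enumerate(mapping):
        if len(frame) != nall:
            raise ValueError(f"frame {f} has {len(frame)} entries, expected {nall}")
        for j, local in enumerate(frame):
            # Every entry must refer to an atom inside the same frame;
            # otherwise the later gather reads outside the coordinate array.
            if not 0 <= local < nall:
                raise IndexError(f"mapping[{f}][{j}] = {local} is outside [0, {nall})")
            flat.append(f * nall + local)
    return flat


def atom_shifts(coords: Sequence[Vec3], mapping_ff: Sequence[int]) -> List[Vec3]:
    """Hypothetical helper: periodic-image shift of each extended atom relative
    to the atom it maps to, i.e. coords - coords[mapping_ff] in the traceback."""
    return [
        (coords[i][0] - coords[k][0], coords[i][1] - coords[k][1], coords[i][2] - coords[k][2])
        for i, k in enumerate(mapping_ff)
    ]


if __name__ == "__main__":
    # Two frames with nall = 3: offsets 0 and 3 are added per frame.
    print(flatten_mapping([[0, 1, 1], [0, 1, 0]], 3))  # [0, 1, 1, 3, 4, 3]

    # Atom 2 is a ghost image of atom 1 shifted by +10 along x.
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
    print(atom_shifts(coords, flatten_mapping([[0, 1, 1]], 3)))

    # A garbage entry like the one in the log is rejected up front.
    try:
        flatten_mapping([[0, 139971608205680, 1]], 3)
    except IndexError as err:
        print("invalid mapping:", err)
```

Checking the mapping like this on the caller side, before the model runs, would only localize the problem. It would not explain why the values are out of range in the first place.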

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
