Hello, I recently tried to reproduce your wonderful work but ran into a small problem. When I followed the instructions in your README to set up the environment with the following commands:
```
conda create -n infercept python=3.10
conda activate infercept
# clone your repository to infercept
cd infercept/
pip install -e .
```

However, according to this issue, the auto-installed torch==2.6.0 and triton==3.2.0 cannot run. So I changed requirements.txt to:
```
ninja # For faster builds.
psutil
ray >= 2.5.1
pandas # Required for Ray data.
pyarrow # Required for Ray data.
sentencepiece # Required for LLaMA tokenizer.
numpy
torch == 2.0.1
transformers >= 4.33.1 # Required for Code Llama.
xformers >= 0.0.22
fastapi
uvicorn[standard]
pydantic < 2 # Required for OpenAI server.
gurobipy
rich
deepspeed == 0.12.3
deepspeed-kernels
```

which pins torch to version 2.0.1 (I just wanted to give it a try).
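Concretely, the rebuild looked roughly like this (a sketch of my steps; `--no-build-isolation` is there so that the C++ extensions compile against the torch already installed in the env, and the last line is just a sanity check of what actually got installed):

```
# Install the pinned dependencies first, then rebuild the package
# against them; a stale build/ directory from the earlier attempt
# may need removing first.
pip install -r requirements.txt
rm -rf build/
pip install -e . --no-build-isolation

# Sanity check: confirm the torch version and CUDA build in the env.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```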
After rebuilding the environment, I tried to use your AsyncLLMEngine class. However, another error occurs during initialization of the engine:
```
ERROR 03-24 08:41:49 async_llm_engine.py:296] Failed to initialize async LLM engine: /root/InferCept/vllm/attention_ops.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1021throwNullDataPtrErrorEv
Traceback (most recent call last):
File "bench_infercept.py", line 107, in <module>
llm_servers = setup_infercept(infercept_config)
File "/root/evaluation/infercept/setup_infercept.py", line 19, in setup_infercept
servers = [
File "/root/evaluation/infercept/setup_infercept.py", line 21, in <listcomp>
AsyncLLMEngine.from_engine_args(infercept_config.engine_args)
File "/root/InferCept/vllm/engine/async_llm_engine.py", line 564, in from_engine_args
engine = cls(engine_args.worker_use_ray,
File "/root/InferCept/vllm/engine/async_llm_engine.py", line 297, in __init__
raise e
File "/root/InferCept/vllm/engine/async_llm_engine.py", line 294, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/root/InferCept/vllm/engine/async_llm_engine.py", line 334, in _init_engine
return ray.get(ray.remote(num_cpus=0)(self._engine_class(*args, **kwargs)).remote())
File "/root/InferCept/vllm/engine/llm_engine.py", line 112, in __init__
self._init_workers_ray(placement_group)
File "/root/InferCept/vllm/engine/llm_engine.py", line 173, in _init_workers_ray
from vllm.worker.worker import Worker # pylint: disable=import-outside-toplevel
File "/root/InferCept/vllm/worker/worker.py", line 10, in <module>
from vllm.model_executor import get_model, InputMetadata, set_random_seed
File "/root/InferCept/vllm/model_executor/__init__.py", line 2, in <module>
from vllm.model_executor.model_loader import get_model
File "/root/InferCept/vllm/model_executor/model_loader.py", line 10, in <module>
from vllm.model_executor.models import * # pylint: disable=wildcard-import
File "/root/InferCept/vllm/model_executor/models/__init__.py", line 1, in <module>
from vllm.model_executor.models.aquila import AquilaForCausalLM
File "/root/InferCept/vllm/model_executor/models/aquila.py", line 35, in <module>
from vllm.model_executor.layers.attention import PagedAttentionWithRoPE
File "/root/InferCept/vllm/model_executor/layers/attention.py", line 10, in <module>
from vllm import attention_ops
ImportError: /root/InferCept/vllm/attention_ops.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1021throwNullDataPtrErrorEv
```

It seems the problem is that the vLLM build doesn't exactly match the torch version in the environment. However, I cannot find the exact torch version anywhere in your repo, so could I trouble you for a complete version of requirements.txt? Thanks.
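For what it's worth, the undefined symbol demangles to `c10::throwNullDataPtrError()`, which as far as I can tell does not exist in torch 2.0.1's libc10, so the attention_ops.so was presumably built against a different (newer) torch. A quick way to check whether the torch in the env exports the symbol (a sketch; `c++filt` and `nm` come with binutils):

```
# Demangle the missing symbol.
c++filt _ZN3c1021throwNullDataPtrErrorEv
# -> c10::throwNullDataPtrError()

# Check whether the torch installed in this env exports it.
nm -D "$(python -c 'import torch, os; print(os.path.join(os.path.dirname(torch.__file__), "lib", "libc10.so"))')" \
  | grep throwNullDataPtr || echo "symbol not exported by this torch"
```

(I also notice the extension file is a cpython-38 build while the env was created with python=3.10, which makes me suspect it is a stale artifact from an earlier build rather than the one produced by `pip install -e .`.)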