Implement dynamic dtype parameter selection based on model precision #57

@maryamtahhan

Description

Currently, the vLLM server is started with --dtype=bfloat16 hardcoded in multiple places. We need to implement logic that automatically selects the appropriate dtype based on the model's native precision.

Proposed Logic

If model is FP16:
  --dtype=bfloat16
Else:
  --dtype=auto
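The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `select_dtype` and `dtype_flag` and the set of FP16 spellings are hypothetical and do not come from the repository.

```python
# Hypothetical helper names; the alias set is an assumption about how
# a model's precision might be spelled in configuration.
FP16_ALIASES = {"fp16", "float16", "half", "torch.float16"}


def select_dtype(model_precision: str) -> str:
    """Return the --dtype value according to the proposed logic."""
    normalized = model_precision.strip().lower()
    # FP16 models keep the current bfloat16 behavior; everything else
    # defers to vLLM's own selection via "auto".
    return "bfloat16" if normalized in FP16_ALIASES else "auto"


def dtype_flag(model_precision: str) -> str:
    """Format the full command-line flag for the vLLM server."""
    return f"--dtype={select_dtype(model_precision)}"


if __name__ == "__main__":
    for precision in ("FP16", "bfloat16", "float32"):
        print(precision, "->", dtype_flag(precision))
```

Normalizing the input before comparison keeps the check tolerant of casing and whitespace differences between workload definitions.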

Implementation Locations

The dtype parameter is currently hardcoded in:

  1. Ansible playbooks:

    • automation/test-execution/ansible/roles/vllm_server/tasks/start-embedding.yml (lines 89, 117)
    • automation/test-execution/ansible/start-vllm-server.yml (lines 58, 84)
  2. Workload configurations:

    • automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml (multiple workload definitions)

Required Changes

  1. Add model precision detection logic to determine if a model is FP16
  2. Update vLLM server startup tasks to conditionally set dtype based on model precision
  3. Update workload configurations to support dynamic dtype selection
  4. Update any hardcoded --dtype=bfloat16 references to use the new logic
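One possible approach to the detection step in item 1 is sketched below. It assumes the model is available as a local directory containing a Hugging Face style `config.json` that records the native precision under a `torch_dtype` key (or `dtype` in some newer configs). This issue does not specify a detection method, so the approach and the helper names `detect_native_precision` and `is_fp16` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def detect_native_precision(model_dir: str) -> Optional[str]:
    """Read the declared precision from <model_dir>/config.json.

    Assumption: the precision is stored under "torch_dtype" (or "dtype").
    Returns None when the file or the key is missing.
    """
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    value = config.get("torch_dtype") or config.get("dtype")
    return str(value).lower() if value is not None else None


def is_fp16(model_dir: str) -> bool:
    """True only when the config explicitly declares float16."""
    return detect_native_precision(model_dir) in {"float16", "half"}
```

When the precision cannot be determined, `is_fp16` returns `False`, so the proposed logic would fall through to `--dtype=auto`. Whether that is the right default for unknown models is a design decision for the implementation.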

Documentation

Documentation has been updated to reflect this change:

  • docs/methodology/testing-phases.md - All three phases now document the conditional dtype logic
  • models/llm-models/model-matrix.yaml - Common parameters updated
  • automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml - Comments updated

Benefits

  • Improved model compatibility across different precision types
  • Better performance for non-FP16 models by using native precision
  • Maintains current bfloat16 behavior for FP16 models
