WIP Add Robust Multi-GPU and Cross-Device Support for PyIsolate Benchmarks #2
Conversation
…ome you can still pick cpu only testing
… array writable before converting to a tensor
…ages are used; codebase ruff clean
…ding xpu for Intel GPUs)
… loop to always include CPU in the results, regardless of GPU presence.
Hi @billybasass, thanks for this PR! I've completed a review of your multi-GPU support implementation.

The Good: Your multi-vendor GPU support (NVIDIA/AMD/Intel) is excellent, and the device-aware tensor serialization is well-architected. The comprehensive […]

Critical Issue: Unfortunately, the PR currently breaks the core library by adding mandatory […]

Required Changes: […]

Next Steps: The GPU features themselves are valuable and well-implemented. Once the import issues are resolved to maintain backward compatibility, this will […]

I've prepared detailed feedback documents that I'll share separately. Please let me know if you need any clarification on the architectural changes needed […]
… added back pre commit config, and clean venv handling
…d environment setup is now handled in main(). run_benchmarks() is now device-agnostic and will not throw UnboundLocalError
Thank you, Kevin. Sorry for these mistakes; after trying some of the suggested changes I had to revert to my first plan, and it got a little messy. I have made the changes you asked for here and in your email. Here is the list of changes; let me know if I missed anything or need to add more. Changes were retested on Intel and NVIDIA systems with multiple GPUs.

PyIsolate Benchmark & Core Library: Summary of Changes
PR Review Summary - v3

Status: ✅ Approved for Core Team Review (Domain Expert Validation Required)

For the Contributor

Good work, @billybasass! You've addressed the critical architectural issues from the previous review. The core library import problems are resolved, and the GPU functionality is now properly separated into an optional module. The core team will validate the benchmarking components to ensure they align with the project's specific technical requirements.

Technical Summary
Review Highlights

Architectural Changes:
Domain Expert Review Required:
Code Quality:
Next Steps

The core team will validate the benchmarking implementation. Your multi-GPU architectural work provides a foundation for the core team to build upon. Thank you!
guill
left a comment
A number of changes requested. Put into broad categories, they are generally related to:
- Type hint warnings that have been suppressed but should probably just be fixed to not need suppression.
- Silently falling back to CPU-only rather than failing a test if we e.g. are trying to test CUDA but don't have CUDA installed.
- Various one-off changes.
…irectly uses test_instance.manager.
…tel XPU and AMD ROCm GPUs, not just CUDA. Fixed unresolved import for ExampleExtensionBase by importing from example.shared.
Thank you for the great reply, @guill. Some of the unneeded things were left over from when I tried a different approach you suggested; I thought I had removed them (I've been working while under the weather), so thank you for finding them. In more than one place you asked why I was using type: ignore: it was because some things were annotated and some were not, so Python treats them as dynamic and no error is raised. I have fixed this. You asked which ruff version I used; I updated to the latest before running. Regarding the hardcoded PyTorch version for Intel GPUs: the pin to PyTorch 2.7 or later should apply only to Intel GPUs, and it is needed because Intel was not fully integrated with PyTorch until 2.7. I asked Bob Duffy from Intel about this, and we will probably have problems if we allow lower versions. Here is a summary of the changes I made after checking every line you pointed out.
… decoupling (comfy decoupling part 1)

## PR Comfy-Org#3 Review Comments Addressed (38/38)

### Hardcoded Paths Removed (Comfy-Org#2, Comfy-Org#3, #13, #14, #36)
- Remove /home/johnj references from tests and documentation
- Use COMFYUI_ROOT env var with $HOME/ComfyUI fallback in conftest.py
- Delete SETUP_GUIDE.md; consolidate in README.md
- Add comfy_hello_world/node-venvs/ to .gitignore

### Error Messages & Documentation (Comfy-Org#1, #5, #7, #8)
- Improve site-packages error with host_prefix, valid_parents, candidates
- Restore editable install comment explaining -e flag behavior
- Add 50-line architecture docstring to shared.py documenting three-layer serialization (Layer 1: _prepare_for_rpc, Layer 2: serialize_for_isolation, Layer 3: RPC envelope)
- Clarify zero-copy: CUDA IPC on Linux keeps tensors on-device; otherwise CPU shared memory fallback

### Fail-Loud Pattern (Comfy-Org#4, #21, #22, #23, #24, #25, #26, #27, #29, #33, #35)
- Restore try/except in client.py with fail-loud logging
- Replace hasattr duck-typing with exact type_name matching for CLIP/VAE
- Delete unused move_tensors_to_device function (VRAM leak risk)
- Remove ALL truncation/defensive code; raise RuntimeError when objects lack _instance_id instead of silently returning opaque repr
- Remove module allowlists entirely (ALLOWED_MODULES patterns eliminated)
- _tensor_to_cuda now passthrough; no auto-migration of tensors to CUDA

### Code Clarity (#18, #19, #20, #28, #30, #31, #32, #34)
- Remove "ComfyUI" hardcoded assumption; derive preferred_root from snapshot
- Remove original_register_route replacement logic (simplified)
- Add 5-line comment explaining .replace(".", "_x_") for Python identifiers
- Restore singleton bypass comment (object.__new__ to prevent recursion)
- Confirm handling_call_id IS read for parent_call_id tracking
- Rename _tensor_to_cpu → _prepare_for_rpc (12 occurrences)
- Add RPC message protocol comments (RPCRequest/RPCCallback/RPCResponse)
- Add else clause raising ValueError for unknown RPC request kinds

### Code Quality (#15, #16, #17, #37, #38)
- Move get_torch_ecosystem_packages() to _internal/torch_utils.py
- Propagate only host env vars (no defaults added) - already correct
- Ensure ruff/mypy pass with zero errors
- Compare BOTH fingerprint AND full descriptor before skipping install

## New Features

### Adapter Decoupling Architecture
- NEW: pyisolate/interfaces.py - IsolationAdapter protocol + SerializerRegistryProtocol
- NEW: pyisolate/_internal/bootstrap.py - Child process bootstrap resolving "config before path" paradox
- NEW: pyisolate/_internal/loader.py - Entry point discovery for adapters (pyisolate.adapters group)
- NEW: pyisolate/_internal/serialization_registry.py - Dynamic serializer registry with O(1) lookup
- host.py: build_extension_snapshot() integrates adapter metadata
- client.py: bootstrap_child() applies snapshot before heavy imports

### Test Coverage ~40% → 96%
- NEW: 23 test files added
  - tests/test_bootstrap.py, test_bootstrap_additional.py
  - tests/test_loader.py, test_loader_additional.py
  - tests/test_serialization_registry.py
  - tests/test_model_serialization.py, test_model_serialization_additional.py
  - tests/test_shared_unit.py, test_shared_additional.py, test_shared_rpc_flows.py
  - tests/test_host_unit.py, test_host_additional.py, test_host_public.py, test_host_internal_ext.py, test_host_integration.py
  - tests/test_client_entrypoint_extra.py
  - tests/test_remaining_coverage.py (comprehensive edge cases)
  - tests/test_fail_loud.py
- Reorganize integration tests to tests/integration/

### Type Annotations
- Add type hints to AttrDict, AttributeContainer, AsyncRPC
- Add __all__ export list to host.py
- Use dict[str, Any] instead of Dict for modern Python
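The fail-loud pattern named in the commit message (raise instead of silently returning an opaque repr when an object lacks `_instance_id`) can be sketched roughly as follows; the function name and return shape are illustrative, not the PR's actual API:

```python
# Illustrative sketch of the fail-loud serialization check; serialize_proxy
# and the returned dict shape are hypothetical names, not pyisolate's API.
def serialize_proxy(obj):
    """Serialize a host object as a proxy reference, or fail loudly."""
    instance_id = getattr(obj, "_instance_id", None)
    if instance_id is None:
        # Fail loud: silently falling back to repr(obj) would hide the
        # real bug (an object crossing the RPC boundary unregistered).
        raise RuntimeError(
            f"Cannot serialize {type(obj).__name__}: object has no _instance_id"
        )
    return {"kind": "proxy", "instance_id": instance_id}
```

The point of the pattern is that a missing `_instance_id` is a programming error on the host side, so surfacing it immediately beats shipping an unusable string to the child process.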
This PR adds cross-vendor GPU support to the PyIsolate benchmarking suite, enabling seamless benchmarking on NVIDIA (CUDA), AMD (ROCm), and Intel (XPU/oneAPI) GPUs, as well as on the CPU. It also improves device selection, serialization, and reporting for all supported platforms.
Key Features & Changes
Multi-GPU Device Selection:
Detects all available GPUs (NVIDIA, AMD, Intel) in the system.
Prompts the user to select which GPU to use if more than one is present.
Prints the name of the selected device at the start of each benchmark run.
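The detection/selection flow above can be sketched as below. This is a minimal illustration, not the PR's exact code: it relies on `torch.cuda` surfacing both CUDA and ROCm builds and on the `torch.xpu` module present in recent PyTorch releases.

```python
# Hypothetical sketch of multi-vendor GPU detection and selection.
import torch

def detect_gpus() -> list[tuple[str, str]]:
    """Return (device_string, device_name) for every visible GPU."""
    devices = []
    # NVIDIA CUDA and AMD ROCm builds both report through torch.cuda.
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            devices.append((f"cuda:{i}", torch.cuda.get_device_name(i)))
    # Intel XPU support lives under torch.xpu in recent PyTorch versions.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        for i in range(torch.xpu.device_count()):
            devices.append((f"xpu:{i}", torch.xpu.get_device_name(i)))
    return devices

def select_device() -> str:
    """Fall back to CPU, auto-pick a lone GPU, or prompt when several exist."""
    gpus = detect_gpus()
    if not gpus:
        return "cpu"
    if len(gpus) == 1:
        return gpus[0][0]
    for idx, (dev, name) in enumerate(gpus):
        print(f"[{idx}] {dev}: {name}")
    choice = int(input("Select GPU index: "))
    return gpus[choice][0]
```

Printing the selected device name at the start of a run then reduces to one `print(select_device())` call in the benchmark entry point.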
Device-Aware Tensor Serialization:
CPU and CUDA: Uses PyTorch’s native (zero-copy) serialization for inter-process communication.
Intel XPU: Implements a fallback serialization for XPU tensors:
XPU tensors are moved to CPU, serialized as a buffer (with shape and dtype), sent through the multiprocessing queue, and reconstructed as XPU tensors on the receiving side.
This avoids PyCapsule/pickle errors and ensures all tests run on Intel GPUs, even though the transfer is not zero-copy.
All benchmarks (including shared-torch and RPC) now run on Intel GPUs.
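The XPU fallback described above (CPU staging, raw buffer plus shape and dtype, reconstruction on the far side) can be sketched as follows. Function names are illustrative, not the PR's actual helpers; copying through `bytearray` also keeps the buffer writable before tensor conversion.

```python
# Hypothetical sketch of the buffer-based fallback for XPU tensors.
import torch

def pack_xpu_tensor(t: torch.Tensor) -> tuple[bytes, tuple, str]:
    """Serialize a tensor as (raw bytes, shape, dtype name) for an mp.Queue."""
    cpu = t.detach().to("cpu").contiguous()
    dtype_name = str(cpu.dtype).removeprefix("torch.")  # e.g. "float32"
    return cpu.numpy().tobytes(), tuple(cpu.shape), dtype_name

def unpack_xpu_tensor(buf: bytes, shape: tuple, dtype_name: str,
                      device: str = "xpu") -> torch.Tensor:
    """Rebuild the tensor from raw bytes and move it to the target device."""
    dtype = getattr(torch, dtype_name)
    # bytearray makes a writable copy, avoiding the read-only buffer warning.
    t = torch.frombuffer(bytearray(buf), dtype=dtype).reshape(shape)
    return t.to(device)
```

As the PR notes, this round-trip is not zero-copy (the tensor is staged through host memory), but it avoids the PyCapsule/pickle failures of sending XPU tensors through a multiprocessing queue directly.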
Benchmark Script Enhancements:
Benchmarks are run on all available backends (CPU, CUDA, XPU).
Status messages and results include the backend and device name for clarity and reproducibility.
Extension Venv and Dependency Handling:
Extension venvs now install the correct PyTorch wheel and index for each backend.
Ensures correct torch version and index for Intel GPUs, and removes invalid version specifiers.
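The per-backend wheel selection can be sketched as a simple mapping from detected backend to pip arguments. The index URLs below are illustrative of the approach rather than the PR's exact values; the `torch>=2.7` pin for Intel XPU follows the requirement stated elsewhere in this thread.

```python
# Hypothetical backend → PyTorch wheel index mapping for extension venvs.
PYTORCH_INDEXES = {
    "cuda": "https://download.pytorch.org/whl/cu121",
    "rocm": "https://download.pytorch.org/whl/rocm6.0",
    "xpu": "https://download.pytorch.org/whl/xpu",
    "cpu": "https://download.pytorch.org/whl/cpu",
}

def torch_install_args(backend: str) -> list[str]:
    """Build the pip arguments used when provisioning an extension venv."""
    # Intel XPU requires torch>=2.7, where XPU support is fully integrated;
    # other backends take an unpinned spec (no invalid version specifiers).
    spec = "torch>=2.7" if backend == "xpu" else "torch"
    return [spec, "--index-url", PYTORCH_INDEXES[backend]]
```

The venv setup code would then pass these arguments straight to `pip install` for the chosen backend.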
Linter and Syntax Fixes:
Fixed all linter and syntax errors, including stray else: statements and indentation issues.
Codebase is now linter-clean and CI-ready.
Motivation
Enable robust benchmarking and extension support across all major GPU vendors.
Ensure that all tests run on all supported hardware, with clear device selection and reporting.
Provide a future-proof serialization mechanism for device tensors, especially for Intel XPU, until native support is available in PyTorch.
Testing
Benchmarks were run on systems with NVIDIA, AMD, and Intel GPUs.
All tests pass, and results are correctly reported for each device.
No PyCapsule/pickle errors on Intel GPUs.
Closes Issues
Fixes device selection and reporting for multi-GPU systems.
Fixes XPU tensor serialization for multiprocessing.
Fixes linter and syntax errors in the codebase.
Ready for review!