Hybrid execution time predictor to solve extrapolation issues #52

yl3469 wants to merge 1 commit into microsoft:main
Conversation
@microsoft-github-policy-service agree

@nitinkedia7 please review this PR, it introduces an important fix
nitinkedia7 left a comment
Hi @yl3469, the hybrid predictor really helps when the simulator is used for predictions beyond the min and max datapoints. Most of my comments are about organizing the code. The core logic looks fine. It would be great if you can fix these. Thanks!
```python
@dataclass
class HybridForestExecutionTimePredictorConfig(BaseExecutionTimePredictorConfig):
```
Is it possible for HybridForestExecutionTimePredictorConfig to extend both RandomForrestExecutionTimePredictorConfig and LinearRegressionExecutionTimePredictorConfig?
That way we would only need to specify the extra parameters, which mainly deal with when to switch between models.
If this is not possible, one way is to write the parameters in the order below:

```python
# Random Forrest parameters
...
# Linear Regression parameters
...
# Hybrid parameters
...
```
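A minimal sketch of the multiple-inheritance idea, assuming the two parent configs are dataclasses whose fields all have defaults (field names here are illustrative stand-ins, not the real Vidur fields):

```python
from dataclasses import dataclass


# Illustrative stand-ins for the existing configs; the real classes
# live in Vidur's config.py and carry many more fields.
@dataclass
class RandomForrestExecutionTimePredictorConfig:
    num_estimators: int = 100


@dataclass
class LinearRegressionExecutionTimePredictorConfig:
    fit_intercept: bool = True


# Python dataclasses support multiple inheritance as long as field
# ordering rules are respected (all fields have defaults here).
@dataclass
class HybridForestExecutionTimePredictorConfig(
    RandomForrestExecutionTimePredictorConfig,
    LinearRegressionExecutionTimePredictorConfig,
):
    # Only the hybrid-specific parameters need to be declared.
    boundary_method: str = "auto"
    percentile_range: tuple = (1, 99)


cfg = HybridForestExecutionTimePredictorConfig()
```

The child config then inherits `num_estimators` and `fit_intercept` automatically, so only the model-switching parameters are written out once.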
```python
    default_factory=lambda: [(1, 99)],
    metadata={"help": "Percentile range to use if boundary_method='percentile'."},
)
# rf_params : dict = field(
```

Please remove unused parameters.
```python
@property
def event_type(self):
    return self._event_type
pass
```
```python
ExecutionTimePredictorRegistry.register(
    ExecutionTimePredictorType.HYBRID_FOREST, HybridForestExecutionTimePredictor
```
The current naming HYBRID_FOREST doesn't mention the use of Polynomial Regression. A better name would be HYBRID_RF_PR.
```python
    Parameters
    ----------
    boundary_method : str, default='auto'
```
Please move the descriptions inline into config.py itself to have a single source of truth.
```python
    Whether to fit an intercept in the LinearRegression model.
    """
    fit_intercept = True
```
We do not need each parameter repeated here and again in the arguments of __init__. This class could accept an object of HybridForestExecutionTimePredictorConfig itself and store it in self._config.
Similarly, repeating the default values of the parameters here and again in the __init__ function creates three sources that might start to mismatch. Please keep the default values of the parameters in config.py only.
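A sketch of the config-object pattern being suggested (class and field names are assumptions): the predictor stores the config and reads parameters through it, so defaults live only in config.py.

```python
from dataclasses import dataclass


@dataclass
class HybridForestExecutionTimePredictorConfig:
    # Single source of truth for defaults (hypothetical fields).
    fit_intercept: bool = True
    num_estimators: int = 100


class HybridForestExecutionTimePredictor:
    def __init__(self, config: HybridForestExecutionTimePredictorConfig):
        # No duplicated parameter list; everything comes from the config.
        self._config = config

    @property
    def fit_intercept(self) -> bool:
        return self._config.fit_intercept


predictor = HybridForestExecutionTimePredictor(
    HybridForestExecutionTimePredictorConfig()
)
```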
```python
    Returns self.
    """
    # Initialize models
    self.rf_model_ = RandomForestRegressor(
```
The codebase uses the convention self._rf_model instead of self.rf_model_. Please follow this in your other variable naming too.
```python
self.rf_model_.fit(X, y)

# For lower boundary extrapolation
if isinstance(X, np.ndarray):
```
Is there a reason to use only the bottom 20% of the data and the portion above the 60th percentile? What if all the data is used here?
If partial data is indeed required, then let's have either:
- 1 hyperparameter, e.g. the lower fit uses < 50th percentile and the upper fit uses > 50th percentile, or
- 2 hyperparameters, one cutoff for each boundary fit.
In any case, please extract these constants into HybridForestExecutionTimePredictorConfig.
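The two-hyperparameter option could look like this (field names `lower_fit_percentile` and `upper_fit_percentile` are hypothetical, chosen to replace the hard-coded 20/60):

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class HybridForestExecutionTimePredictorConfig:
    # Cutoffs for the boundary fits instead of hard-coded constants.
    lower_fit_percentile: float = field(
        default=20.0,
        metadata={"help": "Fit the lower boundary model on points at or below this percentile."},
    )
    upper_fit_percentile: float = field(
        default=60.0,
        metadata={"help": "Fit the upper boundary model on points at or above this percentile."},
    )


cfg = HybridForestExecutionTimePredictorConfig()
X_feature = np.arange(100.0)
mask_lower = X_feature <= np.percentile(X_feature, cfg.lower_fit_percentile)
mask_upper = X_feature >= np.percentile(X_feature, cfg.upper_fit_percentile)
```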
```python
# For lower boundary extrapolation
if isinstance(X, np.ndarray):
    mask_lower = X_feature <= np.percentile(X_feature, 20)  # Use bottom 20% of data
```
These if/else blocks are very repetitive and can be condensed:
```suggestion
# For lower boundary extrapolation
mask_lower = X_feature <= np.percentile(X_feature, 20)  # Use bottom 20% of data
X_lower = X[mask_lower]
y_lower = y[mask_lower] if isinstance(X, np.ndarray) else y[mask_lower.to_list()]
self._lr_model_lower.fit(X_lower, y_lower)

# For upper boundary extrapolation
mask_upper = X_feature >= np.percentile(X_feature, 60)  # Use data above the 60th percentile
X_upper = X[mask_upper]
y_upper = y[mask_upper] if isinstance(X, np.ndarray) else y[mask_upper.to_list()]
self._lr_model_upper.fit(X_upper, y_upper)
```
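The branching could also be removed entirely by normalizing the inputs once up front. A sketch (function and helper names are illustrative; assumes X and y arrive as numpy arrays or pandas objects exposing `to_numpy`):

```python
import numpy as np


def _to_numpy(a):
    # Accept numpy arrays or pandas Series/DataFrames uniformly.
    return a.to_numpy() if hasattr(a, "to_numpy") else np.asarray(a)


def boundary_slices(X, y, X_feature, lower_pct=20, upper_pct=60):
    """Return the (X, y) subsets used to fit the lower and upper boundary models."""
    X, y, X_feature = _to_numpy(X), _to_numpy(y), _to_numpy(X_feature)
    mask_lower = X_feature <= np.percentile(X_feature, lower_pct)
    mask_upper = X_feature >= np.percentile(X_feature, upper_pct)
    # Boolean masks now work uniformly; no per-slice isinstance branching.
    return (X[mask_lower], y[mask_lower]), (X[mask_upper], y[mask_upper])
```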
```python
config_str = str(self.to_dict())

if df is None:
    # import pdb; pdb.set_trace()
```
Please remove these debugging comments and print statements from this file and the other files edited in this PR.
Hi @yl3469, I spent quite a bit of time with this PR because the core issue it aims to solve (extrapolating to, say, higher context lengths) makes Vidur usable for points it hasn't been profiled at. However, I found several critical issues with this implementation:
This PR also remains compatible with the codebase after the recent big PR #56. However, the implementation needs to be made much more performant and needs a few new parameters.
The current default random forest predictor overfits the training set and cannot generalize batch execution time predictions to unseen token counts or batch sizes. For example, it reuses the time observed for the largest num_tokens for any extrapolation beyond it.
This causes inaccurate end-to-end latency, especially for long-context sequences.
Prefill Latency (TTFT): (comparison plot)
E2E Latency: (comparison plot)
We verified that using a polynomial fit for extrapolation along with a random forest for interpolation yields the best MAPE results and brings them closest to the real system.
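The routing logic described above can be sketched in a few numpy-only lines. This is a simplification, not the PR's code: `np.interp` stands in for the random forest interpolator, the feature is one-dimensional, and the class name is invented for illustration.

```python
import numpy as np


class HybridPredictorSketch:
    """In-range interpolation plus polynomial extrapolation out of range.

    np.interp stands in here for the random forest; the PR uses a
    RandomForestRegressor for the in-range predictions.
    """

    def __init__(self, degree=2):
        self._degree = degree

    def fit(self, x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        order = np.argsort(x)
        self._x, self._y = x[order], y[order]
        self._lo, self._hi = self._x[0], self._x[-1]
        # A low-degree polynomial captures the global trend, so it
        # extrapolates sensibly beyond the profiled range.
        self._poly = np.polynomial.Polynomial.fit(self._x, self._y, self._degree)
        return self

    def predict(self, x):
        x = np.asarray(x, float)
        in_range = (x >= self._lo) & (x <= self._hi)
        out = np.empty_like(x)
        out[in_range] = np.interp(x[in_range], self._x, self._y)  # interpolate
        out[~in_range] = self._poly(x[~in_range])  # extrapolate
        return out
```

The in-range model stays faithful to the profiled datapoints, while queries outside [min, max] fall through to the polynomial instead of being clamped to the boundary value.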