Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | master | #6606 | +/- |
|---|---|---|---|
| Coverage | 93.41% | 93.02% | -0.40% |
| Files | 868 | 873 | +5 |
| Lines | 275540 | 275619 | +79 |
| Hits | 257391 | 256385 | -1006 |
| Misses | 18149 | 19234 | +1085 |
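As a quick cross-check of the table, Codecov's coverage figure is hits divided by tracked lines. The sketch below recomputes both columns in C++, assuming there are no partially covered lines (the report lists none). `coverage_percent`, `totals_consistent` and `check_report` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Coverage in percent, assuming no partially covered lines: hits / tracked lines.
double coverage_percent(std::int64_t hits, std::int64_t lines)
{
    return 100.0 * static_cast<double>(hits) / static_cast<double>(lines);
}

// With zero partials, every tracked line is either a hit or a miss.
bool totals_consistent(std::int64_t hits, std::int64_t misses, std::int64_t lines)
{
    return hits + misses == lines;
}

// Cross-check the master and #6606 columns of the table.
void check_report()
{
    assert(totals_consistent(257391, 18149, 275540)); // master
    assert(totals_consistent(256385, 19234, 275619)); // #6606
    double before = coverage_percent(257391, 275540); // about 93.41
    double after = coverage_percent(256385, 275619);  // about 93.02
    assert(before > 93.40 && before < 93.42);
    assert(after > 93.01 && after < 93.03);
}
```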
Pull request overview
This PR adds LoongArch-optimized implementations for several activation/math layers (SELU, GELU, ELU, Erf) and extends the LoongArch LSX/LASX math helper headers with new vector routines needed by those layers.
Changes:
- Add LoongArch layer implementations for SELU, GELU, ELU, and Erf with LSX/LASX vectorized fast paths plus scalar fallbacks.
- Extend lsx_mathfun.h with erf_ps and elu_ps, and extend lasx_mathfun.h with elu_ps for LASX.
- Enable packing support in the new LoongArch layer constructors when LSX is available.
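The "vectorized fast path plus scalar fallback" structure, together with the select-by-mask idea behind an elu_ps helper, can be sketched portably. This is not the PR's code: a hypothetical `Lane4` array stands in for an LSX `__m128` register, and `std::exp` stands in for `exp_ps`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for one 4-lane LSX register (__m128).
using Lane4 = std::array<float, 4>;

// Branchless ELU on one 4-lane block, in the spirit of an elu_ps helper:
// evaluate both branches for every lane, then pick per lane with a mask.
Lane4 elu4(const Lane4& x, float alpha)
{
    Lane4 out{};
    for (std::size_t k = 0; k < 4; k++)
    {
        float neg = alpha * (std::exp(x[k]) - 1.f); // value used where x <= 0
        float pos = x[k];                            // value used where x > 0
        bool le_mask = x[k] <= 0.f;                  // per-lane select mask
        out[k] = le_mask ? neg : pos;
    }
    return out;
}

// In-place ELU: 4-wide main loop, then a scalar fallback for the remainder.
void elu_inplace(float* ptr, int size, float alpha)
{
    int i = 0;
    for (; i + 3 < size; i += 4)
    {
        Lane4 p = {ptr[0], ptr[1], ptr[2], ptr[3]};
        Lane4 r = elu4(p, alpha);
        for (std::size_t k = 0; k < 4; k++)
            ptr[k] = r[k];
        ptr += 4;
    }
    for (; i < size; i++)
    {
        if (*ptr < 0.f)
            *ptr = alpha * (std::exp(*ptr) - 1.f);
        ptr++;
    }
}

// Usage check: six elements exercise both the 4-wide loop and a 2-element tail.
void check_elu()
{
    float buf[6] = {-1.f, 0.f, 2.f, -0.5f, 3.f, -2.f};
    elu_inplace(buf, 6, 1.f);
    assert(std::fabs(buf[0] - (std::exp(-1.f) - 1.f)) < 1e-6f);
    assert(buf[1] == 0.f);
    assert(buf[2] == 2.f);
    assert(buf[4] == 3.f);                                      // tail, positive: unchanged
    assert(std::fabs(buf[5] - (std::exp(-2.f) - 1.f)) < 1e-6f); // tail, negative
}
```

Evaluating both branches and selecting is what lets the vector body avoid per-lane branches; the scalar tail can branch freely because it handles one element at a time.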
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/layer/loongarch/selu_loongarch.h | Declares SELU_loongarch layer specialization. |
| src/layer/loongarch/selu_loongarch.cpp | Implements LSX-vectorized SELU forward. |
| src/layer/loongarch/gelu_loongarch.h | Declares GELU_loongarch layer specialization. |
| src/layer/loongarch/gelu_loongarch.cpp | Implements LSX-vectorized GELU (fast and non-fast) forward. |
| src/layer/loongarch/erf_loongarch.h | Declares Erf_loongarch layer specialization. |
| src/layer/loongarch/erf_loongarch.cpp | Implements LSX-vectorized erf forward. |
| src/layer/loongarch/elu_loongarch.h | Declares ELU_loongarch layer specialization. |
| src/layer/loongarch/elu_loongarch.cpp | Implements LASX/LSX-vectorized ELU forward. |
| src/layer/loongarch/lsx_mathfun.h | Adds erf_ps and elu_ps LSX vector helpers. |
| src/layer/loongarch/lasx_mathfun.h | Adds elu_ps LASX vector helper. |
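The body of the new erf_ps helper is not shown on this page. For readers checking accuracy, here is a scalar sketch of one common erf approximation, Abramowitz and Stegun formula 7.1.26; whether erf_ps uses these coefficients is an assumption, not something the PR confirms. `erf_approx` and `max_abs_error` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Scalar erf via Abramowitz and Stegun 7.1.26 (|error| <= 1.5e-7 in exact
// arithmetic). Assumed stand-in only: erf_ps may use a different approximation.
float erf_approx(float x)
{
    const float p = 0.3275911f;
    const float a1 = 0.254829592f;
    const float a2 = -0.284496736f;
    const float a3 = 1.421413741f;
    const float a4 = -1.453152027f;
    const float a5 = 1.061405429f;

    // erf is odd: evaluate on |x| and restore the sign at the end.
    float sign = x < 0.f ? -1.f : 1.f;
    float ax = std::fabs(x);

    float t = 1.f / (1.f + p * ax);
    // Horner form of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    float poly = t * (a1 + t * (a2 + t * (a3 + t * (a4 + t * a5))));
    return sign * (1.f - poly * std::exp(-ax * ax));
}

// Largest |erf_approx(x) - std::erf(x)| over [lo, hi], sampled at steps + 1 points.
float max_abs_error(float lo, float hi, int steps)
{
    float worst = 0.f;
    for (int s = 0; s <= steps; s++)
    {
        float x = lo + (hi - lo) * s / steps;
        worst = std::fmax(worst, std::fabs(erf_approx(x) - std::erf(x)));
    }
    return worst;
}

// Sanity check: odd symmetry holds exactly, accuracy is within float noise.
void check_erf_approx()
{
    assert(erf_approx(-1.3f) == -erf_approx(1.3f));
    assert(max_abs_error(-4.f, 4.f, 800) < 1e-5f);
}
```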
Code under review in selu_loongarch.cpp (LSX loop):
__m128 _nps = exp_ps(_p);
_nps = __lsx_vfsub_s(_nps, _one);
_nps = __lsx_vfmul_s(_nps, _alphaxlambda);

_p = __lsx_vfmul_s(_p, _lambda);

_p = (__m128)__lsx_vbitsel_v((__m128i)_p, (__m128i)_nps, (__m128i)_lemask);
__lsx_vst(_p, ptr, 0);
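Lane by lane, this sequence computes both SELU branches and then keeps one per lane with `__lsx_vbitsel_v`, which takes bits from `_nps` wherever `_lemask` (the lanes where x <= 0) is set. The scalar sketch below mirrors that order and checks it against the usual branchy definition. The constants are the standard SELU values, used only for illustration, since the reviewed code reads `_lambda` and `_alphaxlambda` from variables rather than literals; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard SELU constants, for illustration only (assumed defaults).
const float kSeluAlpha = 1.6732632f;
const float kSeluLambda = 1.0507010f;

// Per-lane mirror of the reviewed LSX sequence:
//   nps = (exp(x) - 1) * (alpha * lambda)
//   p   = x * lambda
//   out = x <= 0 ? nps : p   (the __lsx_vbitsel_v step driven by _lemask)
float selu_lane(float x, float alpha, float lambda)
{
    float alphaxlambda = alpha * lambda; // hoisted out of the loop in vector code
    float nps = (std::exp(x) - 1.f) * alphaxlambda;
    float p = x * lambda;
    bool lemask = x <= 0.f;
    return lemask ? nps : p;
}

// Textbook branchy SELU, used as the reference.
float selu_reference(float x, float alpha, float lambda)
{
    if (x > 0.f)
        return lambda * x;
    return lambda * alpha * (std::exp(x) - 1.f);
}

// For large positive x, exp(x) overflows to +inf in the unused branch; the
// select discards it, so the output stays finite.
void check_selu()
{
    const float xs[] = {-3.f, -0.5f, 0.f, 0.25f, 2.f, 100.f};
    for (float x : xs)
        assert(std::fabs(selu_lane(x, kSeluAlpha, kSeluLambda) -
                         selu_reference(x, kSeluAlpha, kSeluLambda)) < 1e-6f);
    assert(std::isfinite(selu_lane(100.f, kSeluAlpha, kSeluLambda)));
}
```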
Code under review in gelu_loongarch.cpp (LSX erf path):
__m128 _half = (__m128)__lsx_vreplfr2vr_s(0.5f);
__m128 _one = (__m128)__lsx_vreplfr2vr_s(1.f);
__m128 _inv_sqrt2 = (__m128)__lsx_vreplfr2vr_s(0.70710678f);
for (; i + 3 < size; i += 4)
{
    __builtin_prefetch(ptr + 16);
    __m128 _p = (__m128)__lsx_vld(ptr, 0);

    __m128 _blob = __lsx_vfmul_s(_inv_sqrt2, _p);
    _blob = erf_ps(_blob);
    _blob = __lsx_vfadd_s(_one, _blob);
    _blob = __lsx_vfmul_s(_half, __lsx_vfmul_s(_blob, _p));
    __lsx_vst(_blob, ptr, 0);
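This path evaluates 0.5 * x * (1 + erf(x / sqrt(2))) but groups the products as 0.5 * ((1 + erf) * x), while the scalar fallback computes (0.5 * x) * (1 + erf). Because scaling by 0.5 is exact in binary floating point (barring underflow or overflow), the two groupings give identical results for the same erf value. The sketch below checks this, with `std::erf` standing in for erf_ps as an assumption; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Per-element mirror of the reviewed LSX erf path (std::erf stands in for erf_ps):
//   blob = inv_sqrt2 * x; blob = erf(blob); blob = 1 + blob; out = 0.5 * (blob * x)
float gelu_vector_order(float x)
{
    const float half = 0.5f;
    const float one = 1.f;
    const float inv_sqrt2 = 0.70710678f; // 1 / sqrt(2)
    float blob = inv_sqrt2 * x;
    blob = std::erf(blob);
    blob = one + blob;
    return half * (blob * x);
}

// The scalar fallback's non-fast branch, grouped as (0.5 * x) * (1 + erf).
float gelu_tail_order(float x)
{
    return 0.5f * x * (1.0f + std::erf(0.70710678f * x));
}

void check_gelu_orders()
{
    const float xs[] = {-3.f, -1.f, -0.1f, 0.f, 0.5f, 1.f, 2.5f};
    for (float x : xs)
        assert(gelu_vector_order(x) == gelu_tail_order(x)); // 0.5 scaling is exact
    assert(std::fabs(gelu_vector_order(1.f) - 0.8413447f) < 1e-5f);
}
```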
Code under review in gelu_loongarch.cpp (scalar fallback):
if (fast_gelu)
{
    *ptr = 0.5f * *ptr * (1.0f + tanhf(0.79788452f * (*ptr + 0.044715f * *ptr * *ptr * *ptr)));
}
else
{
    *ptr = 0.5f * *ptr * (1.0f + erff(0.70710678f * *ptr));
}
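The scalar fallback keeps both GELU variants: the tanh form selected by fast_gelu (0.79788452 is sqrt(2/pi)) and the exact erf form. A small harness that measures how far the two drift apart can help when judging whether the fast path is acceptable. The sampling range and the 1e-3 bound below are illustrative choices, not values from the PR, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// The two scalar branches from the reviewed fallback, as standalone functions.
float gelu_fast(float x)
{
    // 0.79788452f ~= sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(0.79788452f * (x + 0.044715f * x * x * x)));
}

float gelu_exact(float x)
{
    // 0.70710678f ~= 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(0.70710678f * x));
}

// Largest |gelu_fast(x) - gelu_exact(x)| over [lo, hi], sampled at steps + 1 points.
float max_gelu_gap(float lo, float hi, int steps)
{
    float worst = 0.f;
    for (int s = 0; s <= steps; s++)
    {
        float x = lo + (hi - lo) * s / steps;
        worst = std::fmax(worst, std::fabs(gelu_fast(x) - gelu_exact(x)));
    }
    return worst;
}

void check_gelu_gap()
{
    assert(max_gelu_gap(-6.f, 6.f, 1200) < 1e-3f);
}
```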
No description provided.