
support loongarch elu erf gelu selu #6606

Open

futz12 wants to merge 2 commits into Tencent:master from futz12:some-activation-opt-on-loongarch

Conversation

Contributor

@futz12 futz12 commented Mar 17, 2026

No description provided.

@codecov-commenter

codecov-commenter commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.02%. Comparing base (7237643) to head (28c568d).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6606      +/-   ##
==========================================
- Coverage   93.41%   93.02%   -0.40%     
==========================================
  Files         868      873       +5     
  Lines      275540   275619      +79     
==========================================
- Hits       257391   256385    -1006     
- Misses      18149    19234    +1085     


@tencent-adm
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot AI left a comment


Pull request overview

This PR adds LoongArch-optimized implementations for several activation/math layers (SELU, GELU, ELU, Erf) and extends the LoongArch LSX/LASX math helper headers with new vector routines needed by those layers.

Changes:

  • Add LoongArch layer implementations for SELU, GELU, ELU, and Erf with LSX/LASX vectorized fast paths plus scalar fallbacks (scalar reference formulas are sketched after this list).
  • Extend lsx_mathfun.h with erf_ps and elu_ps, and extend lasx_mathfun.h with elu_ps for LASX.
  • Enable packing support in the new LoongArch layer constructors when LSX is available.
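
For orientation, here is a minimal scalar sketch of the activations the new vector paths implement. This is illustrative C++, not code from the PR; the function and parameter names are assumptions.

#include <cmath>

// Scalar reference for the activations added in this PR (sketch only; the PR
// implements these with LSX/LASX intrinsics and the exp_ps/erf_ps helpers).
static float elu_ref(float x, float alpha)
{
    return x > 0.f ? x : alpha * (std::exp(x) - 1.f);
}

static float selu_ref(float x, float alpha, float lambda)
{
    return x > 0.f ? lambda * x : lambda * alpha * (std::exp(x) - 1.f);
}

static float gelu_ref(float x)
{
    return 0.5f * x * (1.f + std::erf(x * 0.70710678f)); // 0.70710678f ~= 1/sqrt(2)
}

static float erf_ref(float x)
{
    return std::erf(x);
}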

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

File Description
src/layer/loongarch/selu_loongarch.h Declares SELU_loongarch layer specialization.
src/layer/loongarch/selu_loongarch.cpp Implements LSX-vectorized SELU forward.
src/layer/loongarch/gelu_loongarch.h Declares GELU_loongarch layer specialization.
src/layer/loongarch/gelu_loongarch.cpp Implements LSX-vectorized GELU (fast and non-fast) forward.
src/layer/loongarch/erf_loongarch.h Declares Erf_loongarch layer specialization.
src/layer/loongarch/erf_loongarch.cpp Implements LSX-vectorized erf forward.
src/layer/loongarch/elu_loongarch.h Declares ELU_loongarch layer specialization.
src/layer/loongarch/elu_loongarch.cpp Implements LASX/LSX-vectorized ELU forward.
src/layer/loongarch/lsx_mathfun.h Adds erf_ps and elu_ps LSX vector helpers.
src/layer/loongarch/lasx_mathfun.h Adds elu_ps LASX vector helper.
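
Each of the new headers follows ncnn's usual layer-specialization pattern; a sketch of what elu_loongarch.h plausibly looks like, based on that convention rather than the literal file contents:

#ifndef LAYER_ELU_LOONGARCH_H
#define LAYER_ELU_LOONGARCH_H

#include "elu.h"

namespace ncnn {

class ELU_loongarch : public ELU
{
public:
    // per the overview, the constructor enables packing when LSX is available
    ELU_loongarch();

    virtual int forward_inplace(Mat& bottom_top_blob, const Option& opt) const;
};

} // namespace ncnn

#endif // LAYER_ELU_LOONGARCH_H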


Comment on lines +47 to +54
// SELU negative branch: alpha * lambda * (exp(x) - 1)
__m128 _nps = exp_ps(_p);
_nps = __lsx_vfsub_s(_nps, _one);
_nps = __lsx_vfmul_s(_nps, _alphaxlambda);

// SELU positive branch: lambda * x
_p = __lsx_vfmul_s(_p, _lambda);

// pick the negative branch wherever _lemask (x <= 0) is set
_p = (__m128)__lsx_vbitsel_v((__m128i)_p, (__m128i)_nps, (__m128i)_lemask);
__lsx_vst(_p, ptr, 0);
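
The _lemask driving the __lsx_vbitsel_v blend is built outside this excerpt; it is presumably an all-ones lane mask where x <= 0, produced with an LSX float compare along these lines (an assumption about the surrounding code, not a quote from it):

// hypothetical setup for the excerpt above
__m128 _zero = (__m128)__lsx_vreplgr2vr_w(0);
__m128i _lemask = __lsx_vfcmp_cle_s(_p, _zero); // lanes where x <= 0 become all ones
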
Comment on lines +62 to +74
__m128 _half = (__m128)__lsx_vreplfr2vr_s(0.5f);
__m128 _one = (__m128)__lsx_vreplfr2vr_s(1.f);
__m128 _inv_sqrt2 = (__m128)__lsx_vreplfr2vr_s(0.70710678f);
for (; i + 3 < size; i += 4)
{
    __builtin_prefetch(ptr + 16);
    __m128 _p = (__m128)__lsx_vld(ptr, 0);

    // exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    __m128 _blob = __lsx_vfmul_s(_inv_sqrt2, _p);
    _blob = erf_ps(_blob);
    _blob = __lsx_vfadd_s(_one, _blob);
    _blob = __lsx_vfmul_s(_half, __lsx_vfmul_s(_blob, _p));
    __lsx_vst(_blob, ptr, 0);
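
A quick way to sanity-check a helper like erf_ps against libm on LoongArch hardware is a lane-by-lane comparison; a minimal sketch, assuming an LSX-enabled toolchain and the erf_ps helper this PR adds to lsx_mathfun.h:

#include <stdio.h>
#include <math.h>
#include <lsxintrin.h>
// erf_ps comes from src/layer/loongarch/lsx_mathfun.h in this PR
#include "lsx_mathfun.h"

int main()
{
    float in[4] = {-2.f, -0.5f, 0.5f, 2.f};
    float out[4];
    __m128 v = (__m128)__lsx_vld(in, 0);
    v = erf_ps(v);
    __lsx_vst(v, out, 0);
    for (int i = 0; i < 4; i++)
        printf("erf_ps(% .2f) = % .6f   erff = % .6f\n", in[i], out[i], erff(in[i]));
    return 0;
}
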
Comment on lines +82 to +89
if (fast_gelu)
{
    *ptr = 0.5f * *ptr * (1.0f + tanhf(0.79788452f * (*ptr + 0.044715f * *ptr * *ptr * *ptr)));
}
else
{
    *ptr = 0.5f * *ptr * (1.0f + erff(0.70710678f * *ptr));
}
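
The 0.79788452f constant in the fast path is sqrt(2/pi): the tanh form is the standard fast-GELU approximation of the exact erf form. A host-side comparison of the two scalar formulas (illustrative only; runs anywhere with a C++ compiler):

#include <stdio.h>
#include <math.h>

int main()
{
    // fast GELU:  0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    // exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    for (float x = -3.f; x <= 3.f; x += 1.f)
    {
        float fast = 0.5f * x * (1.0f + tanhf(0.79788452f * (x + 0.044715f * x * x * x)));
        float exact = 0.5f * x * (1.0f + erff(0.70710678f * x));
        printf("x=% .1f  fast=% .6f  exact=% .6f  diff=% .2e\n", x, fast, exact, fast - exact);
    }
    return 0;
}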
