
support loongarch elu erf gelu selu #6606

Open

futz12 wants to merge 2 commits into Tencent:master from futz12:some-activation-opt-on-loongarch

Conversation

Contributor

@futz12 futz12 commented Mar 17, 2026

No description provided.

@codecov-commenter

codecov-commenter commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.02%. Comparing base (7237643) to head (28c568d).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6606      +/-   ##
==========================================
- Coverage   93.41%   93.02%   -0.40%     
==========================================
  Files         868      873       +5     
  Lines      275540   275619      +79     
==========================================
- Hits       257391   256385    -1006     
- Misses      18149    19234    +1085     


@tencent-adm
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot AI left a comment


Pull request overview

This PR adds LoongArch-optimized implementations for several activation/math layers (SELU, GELU, ELU, Erf) and extends the LoongArch LSX/LASX math helper headers with new vector routines needed by those layers.

Changes:

  • Add LoongArch layer implementations for SELU, GELU, ELU, and Erf with LSX/LASX vectorized fast paths plus scalar fallbacks (scalar reference formulas are sketched after this list).
  • Extend lsx_mathfun.h with erf_ps and elu_ps, and extend lasx_mathfun.h with elu_ps for LASX.
  • Enable packing support in the new LoongArch layer constructors when LSX is available.
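
For orientation, here is a minimal scalar sketch of the activations the new vector paths implement. This is illustrative C++, not code from the PR; the function and parameter names are assumptions.

#include <cmath>

// Scalar reference for the activations added in this PR (sketch only; the PR
// implements these with LSX/LASX intrinsics and the exp_ps/erf_ps helpers).
static float elu_ref(float x, float alpha)
{
    return x > 0.f ? x : alpha * (std::exp(x) - 1.f);
}

static float selu_ref(float x, float alpha, float lambda)
{
    return x > 0.f ? lambda * x : lambda * alpha * (std::exp(x) - 1.f);
}

static float gelu_ref(float x)
{
    return 0.5f * x * (1.f + std::erf(x * 0.70710678f)); // 0.70710678f ~= 1/sqrt(2)
}

static float erf_ref(float x)
{
    return std::erf(x);
}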

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

File Description
src/layer/loongarch/selu_loongarch.h Declares SELU_loongarch layer specialization.
src/layer/loongarch/selu_loongarch.cpp Implements LSX-vectorized SELU forward.
src/layer/loongarch/gelu_loongarch.h Declares GELU_loongarch layer specialization.
src/layer/loongarch/gelu_loongarch.cpp Implements LSX-vectorized GELU (fast and non-fast) forward.
src/layer/loongarch/erf_loongarch.h Declares Erf_loongarch layer specialization.
src/layer/loongarch/erf_loongarch.cpp Implements LSX-vectorized erf forward.
src/layer/loongarch/elu_loongarch.h Declares ELU_loongarch layer specialization.
src/layer/loongarch/elu_loongarch.cpp Implements LASX/LSX-vectorized ELU forward.
src/layer/loongarch/lsx_mathfun.h Adds erf_ps and elu_ps LSX vector helpers.
src/layer/loongarch/lasx_mathfun.h Adds elu_ps LASX vector helper.
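
Each of the new headers follows ncnn's usual layer-specialization pattern; a sketch of what elu_loongarch.h plausibly looks like, based on that convention rather than the literal file contents:

#ifndef LAYER_ELU_LOONGARCH_H
#define LAYER_ELU_LOONGARCH_H

#include "elu.h"

namespace ncnn {

class ELU_loongarch : public ELU
{
public:
    // per the overview, the constructor enables packing when LSX is available
    ELU_loongarch();

    virtual int forward_inplace(Mat& bottom_top_blob, const Option& opt) const;
};

} // namespace ncnn

#endif // LAYER_ELU_LOONGARCH_H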


Comment on lines +47 to +54
// SELU negative branch: alpha * lambda * (exp(x) - 1)
__m128 _nps = exp_ps(_p);
_nps = __lsx_vfsub_s(_nps, _one);
_nps = __lsx_vfmul_s(_nps, _alphaxlambda);

// SELU positive branch: lambda * x
_p = __lsx_vfmul_s(_p, _lambda);

// pick the negative branch wherever _lemask (x <= 0) is set
_p = (__m128)__lsx_vbitsel_v((__m128i)_p, (__m128i)_nps, (__m128i)_lemask);
__lsx_vst(_p, ptr, 0);
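
The _lemask driving the __lsx_vbitsel_v blend is built outside this excerpt; it is presumably an all-ones lane mask where x <= 0, produced with an LSX float compare along these lines (an assumption about the surrounding code, not a quote from it):

// hypothetical setup for the excerpt above
__m128 _zero = (__m128)__lsx_vreplgr2vr_w(0);
__m128i _lemask = __lsx_vfcmp_cle_s(_p, _zero); // lanes where x <= 0 become all ones
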
Comment on lines +62 to +74
__m128 _half = (__m128)__lsx_vreplfr2vr_s(0.5f);
__m128 _one = (__m128)__lsx_vreplfr2vr_s(1.f);
__m128 _inv_sqrt2 = (__m128)__lsx_vreplfr2vr_s(0.70710678f);
for (; i + 3 < size; i += 4)
{
    __builtin_prefetch(ptr + 16);
    __m128 _p = (__m128)__lsx_vld(ptr, 0);

    // exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    __m128 _blob = __lsx_vfmul_s(_inv_sqrt2, _p);
    _blob = erf_ps(_blob);
    _blob = __lsx_vfadd_s(_one, _blob);
    _blob = __lsx_vfmul_s(_half, __lsx_vfmul_s(_blob, _p));
    __lsx_vst(_blob, ptr, 0);
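
A quick way to sanity-check a helper like erf_ps against libm on LoongArch hardware is a lane-by-lane comparison; a minimal sketch, assuming an LSX-enabled toolchain and the erf_ps helper this PR adds to lsx_mathfun.h:

#include <stdio.h>
#include <math.h>
#include <lsxintrin.h>
// erf_ps comes from src/layer/loongarch/lsx_mathfun.h in this PR
#include "lsx_mathfun.h"

int main()
{
    float in[4] = {-2.f, -0.5f, 0.5f, 2.f};
    float out[4];
    __m128 v = (__m128)__lsx_vld(in, 0);
    v = erf_ps(v);
    __lsx_vst(v, out, 0);
    for (int i = 0; i < 4; i++)
        printf("erf_ps(% .2f) = % .6f   erff = % .6f\n", in[i], out[i], erff(in[i]));
    return 0;
}
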
Comment on lines +82 to +89
if (fast_gelu)
{
    *ptr = 0.5f * *ptr * (1.0f + tanhf(0.79788452f * (*ptr + 0.044715f * *ptr * *ptr * *ptr)));
}
else
{
    *ptr = 0.5f * *ptr * (1.0f + erff(0.70710678f * *ptr));
}
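
The 0.79788452f constant in the fast path is sqrt(2/pi): the tanh form is the standard fast-GELU approximation of the exact erf form. A host-side comparison of the two scalar formulas (illustrative only; runs anywhere with a C++ compiler):

#include <stdio.h>
#include <math.h>

int main()
{
    // fast GELU:  0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    // exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    for (float x = -3.f; x <= 3.f; x += 1.f)
    {
        float fast = 0.5f * x * (1.0f + tanhf(0.79788452f * (x + 0.044715f * x * x * x)));
        float exact = 0.5f * x * (1.0f + erff(0.70710678f * x));
        printf("x=% .1f  fast=% .6f  exact=% .6f  diff=% .2e\n", x, fast, exact, fast - exact);
    }
    return 0;
}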
