
NMSIS: support the terapines libdsp and libnn function libraries.#65

Closed
kaishaoshao wants to merge 103 commits into Nuclei-Software:master from Terapines:feature/terapines_libml

Conversation

@kaishaoshao kaishaoshao commented Dec 24, 2025

No description provided.

dongyongtao and others added 30 commits August 7, 2025 18:53
Signed-off-by: dongyongtao <dongyongtao@nucleisys.com>
RT-Thread/ThreadX/FreeRTOS/UCOSII support is updated

see https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc

The stack grows downwards (towards lower addresses) and the stack pointer shall be aligned to a 128-bit boundary upon procedure entry.

The ILP32E calling convention is designed to be usable with the RV32E ISA. This calling convention is the same as the integer calling convention, except for the following differences. The stack pointer need only be aligned to a 32-bit boundary.
…t 16 bytes aligned

This ensures that sp is 16-byte aligned when a C function is called during the task switch process
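The alignment rule quoted above can be sketched as a small helper. This is only an illustration under stated assumptions, not the code from the RTOS ports: `align_stack_top` and the `STACK_ALIGN_*` macros are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the Nuclei SDK ports. */
#define STACK_ALIGN_DEFAULT 16u /* 128-bit boundary for the standard calling conventions */
#define STACK_ALIGN_ILP32E   4u /* 32-bit boundary for ILP32E (RV32E) */

/* Round an initial top-of-stack address down to the required alignment.
 * The stack grows downwards, so rounding down stays inside the stack area. */
static uintptr_t align_stack_top(uintptr_t top, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two only */
    return top & ~(align - 1);
}
```

A port would apply such a helper once when building a task's initial stack frame, so that sp already satisfies the ABI when the first C function is entered.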

see 1c24496
… enabled

Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: dongyongtao <dongyongtao@nucleisys.com>
…eRTOS SMP

The previous implementation of `vPortRecursiveLock` was unsafe on systems with weak memory ordering (like RISC-V), leading to potential race conditions. This commit corrects the implementation by introducing necessary memory barriers and optimizes the spin-wait loop.

**Correctness Fixes:**

*   **Acquire Barrier**: An acquire memory barrier (`__RWMB()`) is added immediately after a successful atomic swap (`__AMOSWAP_W`). This is critical to prevent the compiler or CPU from reordering memory operations from within the critical section to before the lock is actually acquired. Without this, the lock provides no protection.

*   **Release Barrier**: A release memory barrier (`__RWMB()`) is added before the lock variable is cleared. This ensures that all memory writes within the critical section are globally visible *before* the lock is released. This prevents other cores from acquiring the lock and seeing stale data.

**Performance and Logic Improvements:**

*   **Test-and-Test-and-Set (TTS)**: The lock acquisition logic has been restructured into a more efficient TTS pattern. The code now spins on a cheap, non-atomic read (`while (*pxSpinLock == 0)`) and only attempts the expensive atomic swap when the lock appears to be free. This significantly reduces bus contention and improves system performance when multiple cores are contending for a lock.

*   **NOP in Spin Loop**: A `__NOP()` has been added to the spin-wait loop. This can help reduce power consumption and pipeline pressure on some CPU architectures during tight spins.

*   **Improved Readability**: Added comments to clarify the logic for recursive locking, lock acquisition, and the purpose of the memory barriers.
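The acquire/release pairing and the test-and-test-and-set loop described above can be sketched with portable C11 atomics. This is a host-side illustration, not the port's implementation: the port uses `__AMOSWAP_W` and `__RWMB()`, while the `sketch_*` names and the use of `<stdatomic.h>` are assumptions made here. The inner loop is written to wait while the lock word is non-zero, which is also an assumption about the intended polarity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical type and function names, for illustration only. */
typedef _Atomic uint32_t sketch_spinlock_t;

static void sketch_spin_lock(sketch_spinlock_t *lock)
{
    for (;;) {
        /* Test: spin on a cheap relaxed load while the lock is held. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0) {
            /* A port might place a NOP or pause hint here. */
        }
        /* Test-and-set: the swap returns the previous value.
         * Acquire ordering keeps critical-section accesses after this point. */
        if (atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0) {
            return;
        }
    }
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    /* The lock must be held by the caller. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    /* Release ordering publishes critical-section writes before the lock clears. */
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The relaxed load generates no bus-locking traffic, so contending cores only issue the atomic swap once the lock looks free.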

Signed-off-by: Huaqi Fang <578567190@qq.com>
The `ucOwnedByCore` and `ucRecursionCountByLock` arrays are used to
manage recursive spinlocks and are accessed by multiple cores.

Without the `volatile` keyword, the compiler might optimize away memory
accesses and cache the array values in registers. This could lead to
a core reading a stale value, causing incorrect lock behavior, race
conditions, or deadlocks in a multi-core environment.

Marking these arrays as `volatile` ensures that every access reads from
or writes to main memory, guaranteeing the correct and up-to-date
state is observed by all cores.
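As a rough sketch of the bookkeeping this commit message refers to, the following shows recursive ownership tracking with `volatile` state. The single-lock layout and the `sketch_*` names are assumptions made for illustration and do not reproduce the port's `ucOwnedByCore` / `ucRecursionCountByLock` arrays. Note that in C, `volatile` only forces the compiler to perform every access; atomicity and inter-core ordering still come from the spinlock and its barriers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NO_OWNER 0xFFu /* hypothetical marker: no core holds the lock */

/* volatile: every read and write is actually performed, never cached in a register. */
static volatile uint8_t sketch_owner = SKETCH_NO_OWNER;
static volatile uint8_t sketch_depth = 0;

/* Returns 1 if core_id holds the lock after the call (first entry or re-entry),
 * 0 if another core currently owns it. */
static int sketch_recursive_enter(uint8_t core_id)
{
    if (sketch_owner == core_id) {
        sketch_depth++;            /* re-entry by the owning core */
        return 1;
    }
    if (sketch_owner != SKETCH_NO_OWNER) {
        return 0;                  /* held by a different core */
    }
    /* A real port would acquire the hardware spinlock before claiming ownership. */
    sketch_owner = core_id;
    sketch_depth = 1;
    return 1;
}

static void sketch_recursive_exit(uint8_t core_id)
{
    assert(sketch_owner == core_id && sketch_depth > 0);
    if (--sketch_depth == 0) {
        sketch_owner = SKETCH_NO_OWNER; /* last exit releases ownership */
    }
}
```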

Signed-off-by: Huaqi Fang <578567190@qq.com>
n300e is rv32emac; n300e is added to the build system, npk, and docs

Signed-off-by: Huaqi Fang <578567190@qq.com>
…anges

Signed-off-by: Huaqi Fang <578567190@qq.com>
This application demonstrates how to switch from machine mode to user mode on Nuclei RISC-V processors.
It showcases the usage of PMP (Physical Memory Protection) configuration, ECLIC (Enhanced Core-Local Interrupt Controller)
interrupt handling, and SysTimer functionality.
- Add safety recommendation for changing MTH:
  - Disable all interrupts before modifying MTH
  - Perform fence operation after changing MTH
  - Re-enable interrupts after the changes

- Similar recommendation added for setting STH
Setting MTH may not take effect immediately, so it is necessary to disable and re-enable interrupts around a critical MTH set routine

- Add code to disable interrupts before setting BASEPRI in vPortRaiseBASEPRI, ulPortRaiseBASEPRI, and vPortSetBASEPRI functions
- This change ensures proper synchronization and prevents potential race conditions when modifying the BASEPRI register
…rt functions

In rare cases a race can occur: the ECLIC MTH has been set, but an interrupt is still taken; MTH is then modified by other tasks that are switched in; and on return to the previous vPortEnterCritical the following assert fails: configASSERT((__ECLIC_GetMth() & portMTH_MASK) == uxMaxSysCallMTH);

- Add MSTATUS_MIE save and restore in vPortRaiseBASEPRI, ulPortRaiseBASEPRI, and vPortSetBASEPRI functions
- Ensure interrupts are disabled before setting MTH to prevent potential race conditions
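The save, set, restore sequence described in these bullets can be sketched as follows. To keep the example runnable on a host, the `mstatus` CSR and the ECLIC threshold are modelled by plain variables; all `mock_*` and `sketch_*` names are hypothetical, and real port code would use the NMSIS CSR and ECLIC accessors instead.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MSTATUS_MIE 0x8u /* MIE is bit 3 of mstatus */

/* Mock registers so the sketch runs on a host; real code accesses the CSR and ECLIC. */
static uint32_t mock_mstatus = SKETCH_MSTATUS_MIE;
static uint8_t  mock_mth = 0;

/* Clear MIE and return the previous mstatus value (models a CSR read-and-clear). */
static uint32_t mock_irq_save(void)
{
    uint32_t old = mock_mstatus;
    mock_mstatus &= ~SKETCH_MSTATUS_MIE;
    return old;
}

/* Restore only the MIE bit from a previously saved mstatus value. */
static void mock_irq_restore(uint32_t saved)
{
    mock_mstatus = (mock_mstatus & ~SKETCH_MSTATUS_MIE) | (saved & SKETCH_MSTATUS_MIE);
}

/* Set the threshold with interrupts masked; returns the previous threshold. */
static uint8_t sketch_raise_mth(uint8_t new_mth)
{
    uint32_t saved = mock_irq_save();
    assert((mock_mstatus & SKETCH_MSTATUS_MIE) == 0); /* interrupts are masked here */
    uint8_t old_mth = mock_mth;
    mock_mth = new_mth;
    /* Real code would issue a fence here so the write takes effect
     * before MIE is restored. */
    mock_irq_restore(saved);
    return old_mth;
}
```

Restoring the saved MIE bit, rather than unconditionally enabling interrupts, keeps the helper safe to call from a context where interrupts were already disabled.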
The sPMP and sMPU entries currently supported are limited to 16

- Add conditional compilation to limit SPMP and SMPU entry numbers to 16
- Preserve original logic for cases where CFG_PMP_ENTRY_NUM <= 16
- Improve compatibility with systems having more than 16 PMP entries
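The conditional compilation described above might look like the sketch below. `CFG_PMP_ENTRY_NUM` is the name used in the commit message; `SKETCH_SPMP_ENTRY_NUM` and the fallback value of 32 are assumptions added so the fragment compiles on its own.

```c
#include <assert.h>

/* CFG_PMP_ENTRY_NUM normally comes from the SoC configuration header;
 * the fallback value here is an assumption for this standalone sketch. */
#ifndef CFG_PMP_ENTRY_NUM
#define CFG_PMP_ENTRY_NUM 32
#endif

/* SKETCH_SPMP_ENTRY_NUM is a hypothetical name, not the macro used in the SDK. */
#if CFG_PMP_ENTRY_NUM > 16
#define SKETCH_SPMP_ENTRY_NUM 16                /* cap at the supported maximum */
#else
#define SKETCH_SPMP_ENTRY_NUM CFG_PMP_ENTRY_NUM /* original behaviour preserved */
#endif

static_assert(SKETCH_SPMP_ENTRY_NUM <= 16, "sPMP/sMPU entries are capped at 16");
```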
…king

- Add support for using MSTATUS.MIE instead of ECLIC.MTH for interrupt masking when configMAX_SYSCALL_INTERRUPT_PRIORITY >= 255
- Update port.c and portmacro.h to handle both interrupt masking methods
- Modify FreeRTOSConfig.h files to set configMAX_SYSCALL_INTERRUPT_PRIORITY to 255
- Add comments to explain the behavior of configMAX_SYSCALL_INTERRUPT_PRIORITY
- Remove unnecessary kernel debug and assertion code
- Eliminate redundant critical section management functions
- Simplify interrupt enable/disable macros
- Remove unused variables and functions related to max syscall priority
- Optimize task switching and tick handling
- The original portable code was modified from FreeRTOS; this port now uses only MIE for interrupt masking
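The selection rule from the first bullets above (before the later simplification to MIE-only masking) can be sketched like this. `configMAX_SYSCALL_INTERRUPT_PRIORITY` is the real FreeRTOS configuration macro; the `SKETCH_*` and `sketch_*` names and the default of 255 are assumptions for illustration.

```c
#include <assert.h>

/* Normally defined in FreeRTOSConfig.h; the default here is an assumption. */
#ifndef configMAX_SYSCALL_INTERRUPT_PRIORITY
#define configMAX_SYSCALL_INTERRUPT_PRIORITY 255
#endif

/* Compile-time choice: 1 selects MSTATUS.MIE masking, 0 selects ECLIC.MTH masking. */
#if configMAX_SYSCALL_INTERRUPT_PRIORITY >= 255
#define SKETCH_MASK_WITH_MIE 1
#else
#define SKETCH_MASK_WITH_MIE 0
#endif

enum sketch_mask_method { SKETCH_USE_MTH = 0, SKETCH_USE_MIE = 1 };

/* Run-time form of the same rule, convenient for testing on a host. */
static enum sketch_mask_method sketch_mask_method_for(int max_syscall_priority)
{
    assert(max_syscall_priority >= 0);
    return (max_syscall_priority >= 255) ? SKETCH_USE_MIE : SKETCH_USE_MTH;
}
```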

Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: Huaqi Fang <578567190@qq.com>
Now supports more than 16 PMP entries, up to 64

Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: Huaqi Fang <578567190@qq.com>
- Enhance NMSIS to support more PMP entries and enable __LD/__SD macro for rv32
- Add comments for ECLIC threshold MTH recommendations
- Update FreeRTOS demo to use MSTATUS.MIE for interrupt masking
- Fix FreeRTOS task stack alignment and optimize SMP spinlock implementation
- Introduce new interrupt masking feature for FreeRTOS
- Limit sPMP/sMPU entry numbers for evalsoc
- Add demo_eclic_umode nsdk_cli configuration for CI

Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: qiujiandong <qiujiandong@nucleisys.com>
Signed-off-by: qiujiandong <qiujiandong@nucleisys.com>
fanghuaqi and others added 25 commits November 11, 2025 11:42
Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: Huaqi Fang <578567190@qq.com>
This requires Nuclei Studio and CPU Model >= 2025.10

Signed-off-by: Huaqi Fang <578567190@qq.com>
Signed-off-by: Huaqi Fang <578567190@qq.com>
…ne stalls on branch misprediction

Signed-off-by: Huaqi Fang <578567190@qq.com>
Still not working; still being debugged

Signed-off-by: Huaqi Fang <578567190@qq.com>
Still not working; this only adds the porting code

Signed-off-by: Huaqi Fang <578567190@qq.com>
exception

ux900_best_config_2c_ku060_50M_3274b1812_2f700b650_202402261123.bit

**** ThreadX SMP Linux Demonstration **** (c) 1996-2020 Microsoft Corporation

           thread 0 events sent                65221, thread 0 cpu 1
           thread 1 messages sent:         696771070, thread 1 cpu 0
           thread 2 messages received:     696771515, thread 2 cpu 0
           thread 3 obtained semaphore:       244577, thread 3 cpu 0
           thread 4 obtained semaphore:       244576, thread 4 cpu 0
           thread 5 events received:           65221, thread 5 cpu 0
           thread 6 mutex obtained:           244577, thread 6 cpu 0
           thread 7 mutex obtained:           244577, thread 7 cpu 0

**** ThreadX SMP Linux Demonstration **** (c) 1996-2020 Microsoft Corporation

           thread 0 events sent                65222, thread 0 cpu 0
           thread 1 messages sent:         696781710, thread 1 cpu 1
           thread 2 messages received:     696782221, thread 2 cpu 1
           thread 3 obtained semaphore:       244580, thread 3 cpu 1
           thread 4 obtained semaphore:       244580, thread 4 cpu 1
           thread 5 events received:           65222, thread 5 cpu 1
           thread 6 mutex obtained:           244581, thread 6 cpu 1
           thread 7 mutex obtained:           244580, thread 7 cpu 1

ux900_best_config_4c_vcu118_50M_42f9d913d_2f700b650_202402261855.bit

this is not working for smpx4

Nuclei SDK Build Time: Dec  5 2025, 11:47:02
Download Mode: SRAM
CPU Frequency 50322472 Hz
CPU HartID: 0
**** ThreadX SMP Linux Demonstration **** (c) 1996-2020 Microsoft Corporation

           thread 0 events sent                    1, thread 0 cpu 0
           thread 1 messages sent:               719, thread 1 cpu 1
           thread 2 messages received:          1077, thread 2 cpu 3
           thread 3 obtained semaphore:            2, thread 3 cpu 3
           thread 4 obtained semaphore:            1, thread 4 cpu 2
           thread 5 events received:               1, thread 5 cpu 1
           thread 6 mutex obtained:                2, thread 6 cpu 2
           thread 7 mutex obtained:                2, thread 7 cpu 1

**** ThreadX SMP Linux Demonstration **** (c) 1996-2020 Microsoft Corporation

           thread 0 events sent                    2, thread 0 cpu 3
           thread 1 messages sent:             11049, thread 1 cpu 2
           thread 2 messages received:         11409, thread 2 cpu 1
           thread 3 obtained semaphore:            5, thread 3 cpu 0
           thread 4 obtained semaphore:            5, thread 4 cpu 1
2          thread 5 events received:      U        20 thr2ad M CAUS2:   t r  P 0xa:012f08
 8tex MTVaL  ::0x0
, tr t2: 0x000100, d 7  utexthread 7 :utex obt ined         e       5, thr0ad 7:cpu 2,
                                                                                      t
                                                                                       : 0x4, t5: 0x2, t6: 0xa0010870
a0: 0x2, a1: 0xa0010748, a2: 0x8, a3: 0x31000, a4: 0x1, a5: 0x18031008, a6: 0xa0010b10, a7: 0xf
cause: 0x38000002, epc: 0xa0012f88
msubm: 0x80
**** ThreadX SMP Linux Demonstration **** (c) 1996-2020 Microsoft Corporation

           thread 0 events sent                    3, thread 0 cpu 0
           thread 1 messages sent:             20129, thread 1 cpu 1
           thread 2 messages received:         20462, thread 2 cpu 3
           thread 3 obtained semaphore:            9, thread 3 cpu 1
           thread 4 obtained semaphore:            9, thread 4 cpu 1
           thread 5 events received:               3, thread 5 cpu 3
           thread 6 mutex obtained:                6, thread 6 cpu 1
           thread 7 mutex obtained:                5, thread 7 cpu 2

**** ThreadX SMP Linux Demonstration **** (c) 1996-2020 Microsoft Corporation

           thread 0 events sent                    4, thread 0 cpu 3
MCAUSE : 0x38000002
MDCAUSE: 0x0
MEPC   : 0xa0010000
MTVAL  : 0x0
HARTID : 3
MCAUSE : 0x30000002
MDCAUSE: 0x0
MEPC   : 0xa0010af0
MTVAL  : 0x0
HARTID : 3
ra: 0xa0010af0, tp: 0xa00106d0, t0: 0xdeadbeef, t1: 0xdeadbeef, t2: 0xdeadbeef, t3: 0xdeadbeef, t4: 0xdeadbeef, t5: 0xdeadbeef, t6: 0xdeadbeef

Signed-off-by: Huaqi Fang <578567190@qq.com>
tested on SMPx4
ux900k_smp4_ecc-rv64imafdcb_zfh_dsp-i64d64ic64dc64l2c2048s2G-pa32_plic_eclic_ecc_pf1_pmp8-vcu118_50M_17651beacb_407314831_202511241738_v4.4.1.bit

Still not working

Signed-off-by: Huaqi Fang <578567190@qq.com>
…dler

mcause and msubm need to be saved and restored to make sure the interrupt status is correct

Thread 1 "riscv.cpu.0" hit Breakpoint 1, eclic_msip_handler () at ../../../OS/ThreadX/ports/nuclei/gcc/context.S:195
195         mret
1: /x ($mintstatus >> 24) & 0xF = 0xf
2: /x ($msubm >> 6) & 0x3 = 0x1
3: /x ($msubm >> 8) & 0x3 = 0x0
(gdb) si
_tx_thread_system_return () at ../../../OS/ThreadX/ports/nuclei/tx_port.h:313
313             __RWMB();
1: /x ($mintstatus >> 24) & 0xF = 0x0
2: /x ($msubm >> 6) & 0x3 = 0x0
3: /x ($msubm >> 8) & 0x3 = 0x0
Both mcause and msubm should be saved and restored during idle task emulation
The `volatile` qualifier is unnecessary for the `vec_base` local
variable.

Signed-off-by: qiujiandong <qiujiandong@nucleisys.com>
…libdsp and libnn function libraries.

Related commit id: f481c68
@kaishaoshao kaishaoshao changed the title NMSIS: Synchronize the NMSIS develop branch to support the terapines libdsp and libnn function libraries. NMSIS: support the terapines libdsp and libnn function libraries. Dec 24, 2025
@kaishaoshao kaishaoshao reopened this Dec 24, 2025
@kaishaoshao
Author

Sorry, due to network issues I had to submit the PR again. Duplicate of #66; closing this one because of the wrong base branch.

4 participants