
Add support for NPU integration (Tenstorrent) #371

@gnh1201

Summary

We plan to begin implementing and testing several uses of NPUs (neural processing units) within the WelsonJS ecosystem. This work is motivated by performance limitations observed in real-world usage and by architectural requirements that favor on-device or on-premise computation.

Template Matching

WelsonJS includes an image-based event triggering feature (referred to as “Screen Time”) that relies on template matching to detect predefined visual patterns on the screen. While the current implementation has been optimized to run efficiently on CPUs, the computational cost increases linearly as the number of templates grows.

In practice, when the template set becomes large, general-purpose CPUs struggle to maintain acceptable performance. This forces a trade-off: the matching interval must be increased, which in turn weakens the real-time responsiveness of the system. This limitation becomes particularly problematic in scenarios that require frequent screen analysis or near real-time event detection.
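The trade-off can be illustrated with a back-of-the-envelope calculation. The sketch below assumes templates are matched one after another at an equal cost each; the function name and the 5 ms per-template figure are hypothetical placeholders, not measurements from WelsonJS.

```python
def min_matching_interval_ms(num_templates, cost_per_template_ms):
    """Smallest interval in which one pass over all templates can finish.

    Assumes sequential matching with the same cost for every template,
    so the total cost grows linearly with the number of templates.
    """
    return num_templates * cost_per_template_ms


# Hypothetical per-template cost of 5 ms (illustrative only).
for n in (10, 100, 1000):
    print(n, "templates ->", min_matching_interval_ms(n, 5.0), "ms per pass")
```

Under these assumed numbers, growing the template set by a factor of ten also stretches the shortest possible matching interval by a factor of ten, which is the loss of responsiveness described above.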

NPUs could address this bottleneck. Template matching is built on normalized cross-correlation (NCC) and zero-mean normalized cross-correlation (ZNCC), both of which reduce to large numbers of independent multiply-accumulate operations over image patches, the kind of dense arithmetic that NPUs are designed to run in parallel. We plan to evaluate experimentally whether NPUs meaningfully improve matching throughput and latency while preserving real-time responsiveness.
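For reference, ZNCC-based matching can be sketched in plain Python as follows. This is a minimal CPU-only illustration, not the WelsonJS implementation and not NPU code; the names `zncc` and `match_template` are hypothetical, and images are assumed to be grayscale 2D lists.

```python
import math


def zncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D patches."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    if len(p) != len(t):
        raise ValueError("patch and template must have the same size")
    n = len(t)
    mean_p = sum(p) / n
    mean_t = sum(t) / n
    # Numerator: sum of products of mean-subtracted values (multiply-accumulate).
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    # Denominator: product of the two patch norms after mean subtraction.
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    # A flat patch has zero variance; treat it as "no correlation".
    return num / den if den > 0 else 0.0


def match_template(image, template):
    """Slide the template over the image; return (best_score, (row, col))."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)  # ZNCC scores lie in [-1, 1]
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = zncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


# Usage: a 2x2 template embedded in a 4x4 image at row 1, column 2.
template = [[1, 9], [7, 3]]
image = [
    [0, 0, 0, 0],
    [0, 0, 1, 9],
    [0, 0, 7, 3],
    [0, 0, 0, 0],
]
score, position = match_template(image, template)
print(score, position)
```

Each candidate position is scored independently, and each score is a sum of products, so the work per template is uniform and parallel. Running this for every template in the set is what makes the total cost scale with the template count.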

On-premise LLM

Another area of interest is the deployment of large language models in environments where reliance on third-party or resale-based LLM APIs is not acceptable. In some cases, organizational, regulatory, or security requirements mandate the use of self-owned and self-operated hardware.

In such on-premise scenarios, NPUs may play a key role as accelerators for LLM inference, helping to reduce dependency on external services while maintaining practical performance and cost efficiency. We intend to explore how NPUs can be integrated into this type of architecture and assess their feasibility in real deployments.

Next Steps

Based on the topics above, we plan to implement and test at least one concrete use case involving NPU acceleration within WelsonJS. The goal is to validate both the technical feasibility and the practical performance benefits.

For testing purposes, hardware has been sponsored through the Tenstorrent Korea OSS Developer Program. We would like to express our sincere appreciation for their support.

Metadata

Labels: collaboration (Related to a collaboration), enhancement (New feature or request)
