RK-Transformers Documentation

RK-Transformers is a runtime library that integrates Hugging Face transformers and sentence-transformers with Rockchip’s RKNN Neural Processing Units (NPUs). It enables efficient deployment of transformer models on edge devices powered by Rockchip SoCs (RK3588, RK3576, etc.).

Key Features

Model Export & Conversion

Automatic ONNX Export: Converts Hugging Face models to ONNX with input detection
RKNN Optimization: Exports to RKNN format with configurable optimization levels (0-3)
Quantization: INT8 (w8a8) quantization with calibration dataset support
Push to Hub: Direct integration with Hugging Face Hub for model versioning
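
To make the quantization feature concrete, here is a minimal sketch of what symmetric INT8 quantization with a calibration set means in general. It is not RK-Transformers or RKNN code: the function names (calibrate_scale, quantize, dequantize) and the max-absolute-value calibration rule are assumptions chosen for illustration, and the actual toolkit may use a different scheme.

```python
# Illustrative only: generic symmetric per-tensor INT8 quantization.
# None of these helpers belong to RK-Transformers or RKNN.

def calibrate_scale(calibration_batches, num_bits=8):
    """Derive one scale from the largest absolute value seen during calibration."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(x) for batch in calibration_batches for x in batch)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to signed integers, clamping anything outside the calibrated range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Recover approximate floats from the integer representation."""
    return [q * scale for q in qvalues]


# Hypothetical calibration data standing in for real activations.
calibration_batches = [[-1.27, 0.5, 0.0], [0.25, 1.0, -0.75]]
scale = calibrate_scale(calibration_batches)

quantized = quantize([1.27, 0.5, 2.0, -5.0], scale)
restored = dequantize(quantized, scale)
```

Values outside the range covered by the calibration data (2.0 and -5.0 here) are clamped, which is why a representative calibration dataset matters for accuracy.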

High-Performance Inference

NPU Acceleration: Leverage Rockchip’s hardware NPU for 10-20x speedup
Multi-Core Support: Automatic core selection and load balancing across NPU cores
Memory Efficient: Optimized for edge devices with limited RAM
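
As a rough picture of what load balancing across NPU cores can look like, the sketch below picks the core with the fewest in-flight requests. This is an assumption-laden illustration, not the library's scheduler: the CoreBalancer class, its methods, and the core count of 3 are all hypothetical.

```python
# Illustrative only: a least-loaded core picker.
# CoreBalancer is a hypothetical helper, not an RK-Transformers class.

class CoreBalancer:
    def __init__(self, num_cores):
        # Number of requests currently running on each core.
        self.in_flight = [0] * num_cores

    def acquire(self):
        """Return the index of the least-loaded core and mark it busy."""
        core = min(range(len(self.in_flight)), key=self.in_flight.__getitem__)
        self.in_flight[core] += 1
        return core

    def release(self, core):
        """Mark one request on the given core as finished."""
        if self.in_flight[core] > 0:
            self.in_flight[core] -= 1


balancer = CoreBalancer(num_cores=3)  # assumed core count for the example
assigned = [balancer.acquire() for _ in range(3)]
balancer.release(1)
next_core = balancer.acquire()
```

Ties go to the lowest-numbered core, so an idle device fills cores in order before doubling up on any of them.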

Framework Integration

Sentence Transformers: Drop-in replacement with RKSentenceTransformer and RKCrossEncoder
Transformers API: Compatible with standard Hugging Face pipelines

Quick Links

User Guide

API Reference

Development

Local Development

Community & Support

GitHub Repository: emapco/rk-transformers
Issue Tracker: Report a bug
Hugging Face Hub: rk-transformers models

License

This project is licensed under the Apache License 2.0.