RK-Transformers Documentation

RK-Transformers is a runtime library that integrates Hugging Face transformers and sentence-transformers with Rockchip’s RKNN Neural Processing Units (NPUs). It enables efficient, straightforward deployment of transformer models on edge devices powered by Rockchip SoCs such as the RK3588 and RK3576.

Supported Python versions: 3.10-3.12

Key Features

Model Export & Conversion

  • Automatic ONNX Export: Converts Hugging Face models to ONNX and detects the model inputs automatically

  • RKNN Optimization: Exports to RKNN format with configurable optimization levels (0-3)

  • Quantization: INT8 (w8a8) quantization with calibration dataset support

  • Push to Hub: Direct integration with Hugging Face Hub for model versioning
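
To make the INT8 (w8a8) quantization bullet above more concrete, here is a minimal sketch of symmetric per-tensor quantization with a scale derived from calibration samples. It illustrates the general technique only and is not the algorithm used by RKNN or by this library. The function names calibrate_scale, quantize, and dequantize are hypothetical and exist only for this example.

```python
from typing import Iterable, List


def calibrate_scale(samples: Iterable[List[float]]) -> float:
    """Derive a symmetric per-tensor scale from calibration samples (illustrative)."""
    max_abs = max((abs(v) for sample in samples for v in sample), default=0.0)
    # Map the largest observed magnitude to 127; fall back to 1.0 for all-zero data.
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Round each value to the nearest step and clamp to the signed 8-bit range."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map 8-bit integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    calibration = [[0.5, -1.27], [0.25, 1.0]]
    scale = calibrate_scale(calibration)
    # 5.0 lies outside the calibrated range, so it saturates at 127.
    print(quantize([1.27, -1.27, 0.0, 5.0], scale))
```

Values outside the range seen during calibration saturate at the 8-bit limits, which is why a calibration dataset that is representative of real inputs matters.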

High-Performance Inference

  • NPU Acceleration: Uses Rockchip’s hardware NPU for a 10-20x speedup

  • Multi-Core Support: Automatic core selection and load balancing across NPU cores

  • Memory Efficient: Optimized for edge devices with limited RAM
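
One way to picture the load balancing mentioned in the multi-core bullet above is a least-loaded dispatcher, sketched below. This is an assumption-laden illustration, not the library's actual scheduling logic: the CoreBalancer class and its acquire and release methods are hypothetical, and the sketch only tracks counters rather than talking to any hardware.

```python
from typing import List


class CoreBalancer:
    """Hypothetical dispatcher that tracks in-flight requests per NPU core."""

    def __init__(self, num_cores: int) -> None:
        if num_cores < 1:
            raise ValueError("num_cores must be at least 1")
        self.in_flight: List[int] = [0] * num_cores

    def acquire(self) -> int:
        """Pick the core with the fewest in-flight requests; lowest index wins ties."""
        core = min(range(len(self.in_flight)), key=self.in_flight.__getitem__)
        self.in_flight[core] += 1
        return core

    def release(self, core: int) -> None:
        """Mark one request on the given core as finished."""
        if self.in_flight[core] == 0:
            raise ValueError(f"core {core} has no in-flight requests")
        self.in_flight[core] -= 1


if __name__ == "__main__":
    balancer = CoreBalancer(num_cores=3)
    # Four requests spread across three cores, wrapping back to core 0.
    print([balancer.acquire() for _ in range(4)])
```

A caller would pair each acquire with a release once inference on that core finishes, so the counters reflect the current load.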

Framework Integration

  • Sentence Transformers: RKSentenceTransformer and RKCrossEncoder serve as drop-in replacements

  • Transformers API: Compatible with standard Hugging Face pipelines

Community & Support

License

This project is licensed under the Apache License 2.0.
