# NPU Core Configuration

Rockchip SoCs with multiple NPU cores support flexible core-allocation strategies through the `core_mask` parameter. Choosing the right core mask can optimize performance based on your workload and system conditions.
> **Note:** `core_mask` is specified at inference time, not during export.
## Available Core Masks

| Value | Description | Use Case |
|---|---|---|
| `"auto"` | Automatic mode - selects idle cores dynamically | Recommended: best for most scenarios |
| `"0"` | NPU Core 0 only | Fixed core assignment |
| `"1"` | NPU Core 1 only | Fixed core assignment |
| `"2"` | NPU Core 2 only | Fixed core assignment (RK3588 only) |
| `"0_1"` | NPU Cores 0 and 1 simultaneously | Parallel execution across 2 cores for larger models |
| `"0_1_2"` | NPU Cores 0, 1, and 2 simultaneously | Maximum parallelism (RK3588 only) for demanding models |
| `"all"` | All available NPU cores | Equivalent to `"0_1_2"` on RK3588 and `"0_1"` on RK3576 |
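To make the availability rules above concrete, here is a minimal sketch that checks a core mask against a platform before inference. The `PLATFORM_CORES` table and `validate_core_mask` helper are illustrative only, not part of rk-transformers:

```python
# Hypothetical helper (not part of rk-transformers): encodes the
# core-availability rules from the table above in plain Python.
PLATFORM_CORES = {
    "rk3588": {0, 1, 2},
    "rk3576": {0, 1},
    "rk3566": {0},
    "rk3568": {0},
}

def validate_core_mask(platform: str, core_mask: str) -> bool:
    """Return True if core_mask only references cores the platform has."""
    available = PLATFORM_CORES[platform]
    if core_mask in ("auto", "all"):
        return True  # resolved by the runtime to whatever cores exist
    requested = {int(c) for c in core_mask.split("_")}
    return requested <= available

print(validate_core_mask("rk3588", "0_1_2"))  # True
print(validate_core_mask("rk3576", "2"))      # False: RK3576 has no core 2
```

Running a check like this before `from_pretrained` gives a clearer error than a runtime failure inside the NPU driver.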
## Platform-Specific Notes

| Platform | Available Cores | Recommended Default |
|---|---|---|
| RK3588 | 0, 1, 2 (3 cores) | `"auto"` |
| RK3576 | 0, 1 (2 cores) | `"auto"` |
| RK3566/RK3568 | 0 (1 core) | `"0"` |
> **Warning:** Attempting to use unavailable cores (e.g., `"2"` on RK3576) may result in a runtime error.
## Usage Examples

### RK-Transformers API

```python
from rktransformers import RKModelForFeatureExtraction

model = RKModelForFeatureExtraction.from_pretrained(
    "rk-transformers/all-MiniLM-L6-v2",
    core_mask="all"
)
```
### Sentence Transformers Integration

```python
from rktransformers import RKSentenceTransformer

model = RKSentenceTransformer(
    "rk-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        "platform": "rk3588",
        "core_mask": "auto",
    },
)
```
### CrossEncoder Integration

```python
from rktransformers import RKCrossEncoder

model = RKCrossEncoder(
    "rk-transformers/ms-marco-MiniLM-L12-v2",
    model_kwargs={
        "platform": "rk3588",
        "core_mask": "auto",
    },
)
```
## Performance Considerations

### Single Core vs Multi-Core

**Single core (`"0"`, `"1"`, `"2"`):**

- Lower power consumption
- Predictable latency
- Good for lightweight models
- Useful when cores are allocated to different tasks

**Multi-core (`"0_1"`, `"0_1_2"`, `"all"`):**

- Higher throughput
- Better for large models
- Potentially higher latency due to synchronization overhead
- Higher power consumption
### Auto vs Manual Selection

**Auto mode (`"auto"`):**

Pros:

- The RKNN runtime provides automatic load balancing
- Adapts to system load
- No manual tuning needed

Cons:

- Less predictable core assignment
- May not be optimal for all scenarios

**Manual mode (specific cores):**

Pros:

- Predictable behavior
- Fine-grained control
- Better for multi-model/multi-instance deployments

Cons:

- Requires manual tuning
- Does not adapt to changing conditions
## Best Practices

1. **Start with `"auto"`**: begin with automatic mode and measure performance.
2. **Benchmark different configurations**: test various core masks for your specific workload.
3. **Consider power constraints**: use fewer cores if power consumption is a concern.
4. **Monitor core utilization**: check which cores are busy before assigning them manually. ajokela/rktop can help monitor NPU core usage.
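For practice 4, core utilization can also be read programmatically. Rockchip NPU drivers typically expose a load figure via debugfs; the path and line format below are assumptions that may differ by kernel version, so verify them on your board (or use a tool such as ajokela/rktop instead):

```python
import re

# Assumed debugfs node on Rockchip boards; confirm it exists on your kernel.
LOAD_PATH = "/sys/kernel/debug/rknpu/load"

def parse_npu_load(text: str) -> dict:
    """Parse per-core percentages from a line shaped like
    'NPU load:  Core0:  5%, Core1: 60%, Core2:  0%,' (assumed format)."""
    return {int(c): int(p) for c, p in re.findall(r"Core(\d+):\s*(\d+)%", text)}

def least_busy_core(text: str) -> int:
    """Return the index of the core with the lowest reported load."""
    load = parse_npu_load(text)
    return min(load, key=load.get)

# Parsing a sample string (reading LOAD_PATH requires root on-device):
sample = "NPU load:  Core0:  5%, Core1: 60%, Core2:  0%,"
print(parse_npu_load(sample))   # {0: 5, 1: 60, 2: 0}
print(least_busy_core(sample))  # 2
```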
## Multi-Model Deployment

When running multiple models simultaneously, pin each model to its own core:

```python
from rktransformers import (
    RKModelForFeatureExtraction,
    RKModelForMultipleChoice,
    RKModelForSequenceClassification,
)

# Model 1 on core 0
model1 = RKModelForFeatureExtraction.from_pretrained(
    "model1",
    core_mask="0"
)

# Model 2 on core 1
model2 = RKModelForSequenceClassification.from_pretrained(
    "model2",
    core_mask="1"
)

# Model 3 on core 2
model3 = RKModelForMultipleChoice.from_pretrained(
    "model3",
    core_mask="2"
)
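A related pattern is load-balancing several instances of the *same* model, one pinned to each core, and rotating requests across them. A minimal sketch; the `StubModel` class below is a stand-in for any model instance pinned via `core_mask`, not a real rk-transformers class:

```python
from itertools import cycle

class StubModel:
    """Stand-in for a model instance pinned to one NPU core via core_mask."""
    def __init__(self, core_mask: str):
        self.core_mask = core_mask

    def encode(self, text: str):
        # A real model would return embeddings; we return the core used.
        return (self.core_mask, text)

# One replica per core; round-robin incoming requests across them.
replicas = cycle([StubModel(c) for c in ("0", "1", "2")])
results = [next(replicas).encode(f"req{i}") for i in range(4)]
print(results)  # req3 wraps back around to core "0"
```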
## Troubleshooting

### Performance Issues

If performance is not as expected:

- Try different core mask configurations
- Ensure no other NPU-intensive tasks are running
- Check CPU and memory usage (either may be the bottleneck)
- Verify the model is properly quantized for the platform
- Monitor the NPU temperature (thermal throttling may occur)
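The first step above, trying different core masks, is easy to script. A minimal sketch: the `mean_latency` helper is illustrative, and the commented loop assumes a model object whose `encode` method runs inference (as with `RKSentenceTransformer`):

```python
import time

def mean_latency(encode, batch, runs: int = 10) -> float:
    """Average seconds per call for an encode callable (e.g. model.encode)."""
    encode(batch)  # warm-up: the first call may include one-time setup
    start = time.perf_counter()
    for _ in range(runs):
        encode(batch)
    return (time.perf_counter() - start) / runs

# Usage sketch: re-create the model with each candidate mask and compare.
# for mask in ("auto", "0", "0_1", "0_1_2"):
#     model = RKSentenceTransformer(
#         "rk-transformers/all-MiniLM-L6-v2",
#         model_kwargs={"platform": "rk3588", "core_mask": mask},
#     )
#     print(mask, mean_latency(model.encode, sentences))
```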