Modeling

Model classes for running RKNN inference.

Base Classes

class rktransformers.modeling.RKNNRuntime[source]

Bases: object

Runtime wrapper for RKNN models.

This class encapsulates loading an RKNN model, verifying its target device/platform compatibility, and initializing the runtime with the desired core mask.

model_path

Filesystem path to the RKNN model file.

Type: Path

platform

Target platform string such as 'rk3588'. When None, the platform is detected from the host environment.

Type: PlatformType | None

core_mask

Core mask selection for devices with multiple NPU cores. Examples include 'auto', '0', '1', and 'all'.

Type: CoreMaskType

rknn_config

Optional configuration object for RKNN runtime behavior.

Type: RKNNConfig | None

rknn

Loaded RKNN runtime instance or None when not initialized.

Type: RKNNLite | None

Example

>>> runtime = RKNNRuntime("/tmp/model.rknn", platform="rk3588", core_mask="auto")
>>> runtime.rknn  # The underlying RKNN runtime instance

__init__(model_path, platform=None, core_mask='auto', rknn_config=None)[source]

Create a new RKNNRuntime and load the model specified by model_path.

Parameters:

model_path (str | Path) – Path to the RKNN model file on disk. This file will be loaded during initialization.
platform (PlatformType | None, optional) – Optional platform string specifying the target device. When None, the platform will be detected from the host environment via get_edge_host_platform().
core_mask (CoreMaskType, optional) – Core mask used for devices with several NPU cores (e.g., 'auto', '0', '1', 'all'). Defaults to 'auto'.
rknn_config (RKNNConfig | None, optional) – Optional RKNN configuration object. Not all runtime options are currently implemented; this field is kept for future extension.

Raises:

FileNotFoundError – If the given model_path does not exist.
RuntimeError – If the model fails to load or the runtime fails to initialize.

Return type:

None
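A minimal sketch of validating a core_mask value before constructing a runtime. The literal values are copied from the core_mask annotation shown for from_pretrained further down this page; the CoreMask alias and the validate_core_mask helper are hypothetical names used for illustration and are not part of the library.

```python
from typing import Literal, get_args

# Assumption: mirrors the literal values listed for `core_mask` on this page;
# the library's real `CoreMaskType` may be defined differently.
CoreMask = Literal["auto", "0", "1", "2", "0_1", "0_1_2", "all"]


def validate_core_mask(value: str) -> str:
    """Return `value` unchanged if it is a known core mask, else raise."""
    allowed = get_args(CoreMask)
    if value not in allowed:
        raise ValueError(f"Unknown core_mask {value!r}; expected one of {allowed}")
    return value


checked = validate_core_mask("0_1")
```

Checking the string up front gives a clear error message on the host before any model file is loaded.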

list_model_compatible_platform()[source]

Return the platforms supported by the current RKNN model.

Returns:

The value returned by RKNN's list_support_target_platform helper, or None if the runtime is not initialized or the API is not available. Example:

{
    'support_target_platform': ['rk3588'],
    'filled_target_platform': ['rk3588']
}

Return type:

dict[str, Any] | None
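The returned dictionary can be checked against a target platform string. The sketch below assumes the dict shape shown in the example above; supports_platform is a hypothetical helper, not a library function.

```python
from typing import Any, Optional


def supports_platform(info: Optional[dict[str, Any]], platform: str) -> bool:
    """Check a platform string against the dict shape shown above."""
    if info is None:
        # Runtime not initialized or API unavailable: treat as unsupported.
        return False
    return platform in info.get("support_target_platform", [])


info = {
    "support_target_platform": ["rk3588"],
    "filled_target_platform": ["rk3588"],
}
print(supports_platform(info, "rk3588"))  # True
print(supports_platform(info, "rk3566"))  # False
print(supports_platform(None, "rk3588"))  # False
```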

class rktransformers.modeling.RKModel[source]

Bases: RKNNRuntime, PreTrainedModel, ModelHubMixin, Generic[MODEL_OUTPUT_T, Unpack[TENSOR_Ts]]

Base class for RKNN-backed text models integrated with the Hugging Face Hub.

model_type: str = 'rknn_model'

auto_model_class: alias of AutoModel

__init__(*, model_id=None, config=None, model_path, platform=None, core_mask='auto', rknn_config=None, max_seq_length=512, batch_size=1)[source]

Create a new RKNNRuntime and load the model specified by model_path.

Parameters:

model_path (str | Path) – Path to the RKNN model file on disk. This file will be loaded during initialization.
platform (PlatformType | None, optional) – Optional platform string specifying the target device. When None, the platform will be detected from the host environment via get_edge_host_platform().
core_mask (CoreMaskType, optional) – Core mask used for devices with several NPU cores (e.g., 'auto', '0', '1', 'all'). Defaults to 'auto'.
rknn_config (RKNNConfig | None, optional) – Optional RKNN configuration object. Not all runtime options are currently implemented; this field is kept for future extension.
model_id (str | None)
config (PretrainedConfig | None)
max_seq_length (int)
batch_size (int)

Raises:

FileNotFoundError – If the given model_path does not exist.
RuntimeError – If the model fails to load or the runtime fails to initialize.

Return type:

None

__call__(*args: Any, return_dict: Literal[False], **kwargs: Any) → tuple[Unpack[TENSOR_Ts]][source]
__call__(*args: Any, return_dict: Literal[True], **kwargs: Any) → MODEL_OUTPUT_T
__call__(*args: Any, **kwargs: Any) → MODEL_OUTPUT_T: Call self as a function.
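The three signatures above are typing overloads keyed on return_dict: a literal False yields a tuple, anything else yields the output object. The sketch below reproduces that pattern on a toy class. ToyModel and Output are hypothetical stand-ins and say nothing about how RKModel is implemented internally.

```python
from typing import Literal, Union, overload


class Output(dict):
    """Hypothetical stand-in for a ModelOutput-style container."""


class ToyModel:
    @overload
    def __call__(self, x: float, *, return_dict: Literal[False]) -> tuple[float]: ...
    @overload
    def __call__(self, x: float, *, return_dict: Literal[True] = ...) -> Output: ...

    def __call__(self, x: float, *, return_dict: bool = True) -> Union[Output, tuple[float]]:
        result = x * 2.0  # placeholder computation
        if return_dict:
            return Output(value=result)
        return (result,)


model = ToyModel()
as_dict = model(1.5)                      # type checkers infer Output
as_tuple = model(1.5, return_dict=False)  # type checkers infer tuple[float]
```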

forward(*args, **kwargs)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Parameters:

args (Any)
kwargs (Any)

Return type:

MODEL_OUTPUT_T | tuple[Unpack[TENSOR_Ts]]

property device: device: Return the device on which the model is stored.

to(device)[source]

No-op for RKModel. For compatibility with Hugging Face Transformers Pipelines.

Parameters: device (device | str)
Return type: RKModel

classmethod from_pretrained(pretrained_model_name_or_path, *, config=None, platform=None, core_mask='auto', subfolder='', revision=None, force_download=False, resume_download=False, proxies=None, token=None, local_files_only=False, trust_remote_code=False, cache_dir=None, file_name=None, **model_kwargs)[source]

Instantiate a pretrained model from a local directory or from a model repo on huggingface.co.

Parameters:

model_id (Union[str, Path]) –
Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like rk-transformers/bert-base-uncased.
- A path to a directory containing a model previously exported using export_rknn(),
  e.g., ./my_model_directory/.
force_download (bool, defaults to False) – Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
token (Optional[Union[bool,str]], defaults to None) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in huggingface_hub.constants.HF_TOKEN_PATH).
cache_dir (Optional[str], defaults to None) – Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
subfolder (str, defaults to “”) – In case the relevant files are located inside a subfolder of the model repo either locally or on huggingface.co, you can specify the folder name here.
config (Optional[transformers.PretrainedConfig], defaults to None) – The model configuration.
local_files_only (Optional[bool], defaults to False) – Whether or not to only look at local files (i.e., do not try to download the model).
trust_remote_code (bool, defaults to False) – Whether or not to allow for custom code defined on the Hub in their own modeling. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
revision (Optional[str], defaults to None) – The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
pretrained_model_name_or_path (str | Path)
platform (Literal['simulator', 'rk3588', 'rk3576', 'rk3568', 'rk3566', 'rk3562'] | None)
core_mask (Literal['auto', '0', '1', '2', '0_1', '0_1_2', 'all'])
resume_download (bool | None)
proxies (dict | None)
file_name (str | None)
model_kwargs (Any)
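The first argument accepts either a Hub model id or a local directory. How the library distinguishes the two is not described here; the sketch below shows one plausible check, treating any existing directory as local. classify_model_source is a hypothetical helper.

```python
import os
import tempfile


def classify_model_source(name_or_path: str) -> str:
    """Return "local" for an existing directory, otherwise "hub"."""
    return "local" if os.path.isdir(name_or_path) else "hub"


# A real directory is treated as a local export.
with tempfile.TemporaryDirectory() as tmp:
    local_kind = classify_model_source(tmp)

# A namespaced id that does not exist on disk is treated as a Hub repo.
hub_kind = classify_model_source("rk-transformers/bert-base-uncased")
print(local_kind, hub_kind)
```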

Task-Specific Models

Feature Extraction

class rktransformers.modeling.RKModelForFeatureExtraction[source]

Bases: RKModel[BaseModelOutput, Tensor | ndarray]

RKNN model for feature extraction tasks. This model inherits from RKModel; see its documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)[source]

The RKModelForFeatureExtraction forward method, overrides the __call__() special method.

Parameters:

input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarray or torch.Tensor objects, matching the type of the original input_ids.
kwargs (Any)
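The relationship between input_ids and attention_mask can be sketched with plain lists. This assumes the exported model expects fixed-length inputs of max_seq_length tokens; whether the library pads for you is not stated here, and pad_to_length with its pad_id default is a hypothetical helper.

```python
def pad_to_length(token_ids: list[int], max_seq_length: int, pad_id: int = 0):
    """Pad or truncate one sequence and build its attention mask."""
    kept = token_ids[:max_seq_length]
    padding = max_seq_length - len(kept)
    input_ids = kept + [pad_id] * padding
    # 1 marks real tokens, 0 marks padding, as described above.
    attention_mask = [1] * len(kept) + [0] * padding
    return input_ids, attention_mask


input_ids, attention_mask = pad_to_length([101, 2026, 2171, 102], max_seq_length=8)
print(input_ids)       # [101, 2026, 2171, 102, 0, 0, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```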

Example of feature extraction:

from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForFeatureExtraction
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/all-MiniLM-L6-v2")
model = RKModelForFeatureExtraction.from_pretrained("rk-transformers/all-MiniLM-L6-v2")

inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")

outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
list(last_hidden_state.shape)
# [1, 12, 384]
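A common next step for sentence embeddings is mean pooling over the token vectors, ignoring padded positions. The sketch below uses plain lists and made-up values for a single sequence; mean_pool is a hypothetical helper, and the source does not say which pooling any particular checkpoint expects.

```python
def mean_pool(hidden: list[list[float]], mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1 (one sequence)."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # Guard against an all-zero mask to avoid dividing by zero.
    return [t / max(count, 1) for t in total]


hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is padding
embedding = mean_pool(hidden, [1, 1, 0])
print(embedding)  # [2.0, 3.0]
```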

Sequence Classification

class rktransformers.modeling.RKModelForSequenceClassification[source]

Bases: RKModel[SequenceClassifierOutput, Tensor | ndarray]

RKNN model for sequence classification/regression tasks. This model inherits from RKModel; see its documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)[source]

The RKModelForSequenceClassification forward method, overrides the __call__() special method.

Parameters:

input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarray or torch.Tensor objects, matching the type of the original input_ids.
kwargs (Any)

Example of single-label classification:

from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/distilbert-base-uncased-finetuned-sst-2-english")
model = RKModelForSequenceClassification.from_pretrained("rk-transformers/distilbert-base-uncased-finetuned-sst-2-english")

inputs = tokenizer("Hello, my dog is cute", return_tensors="np")

outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)
# [1, 2]
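The logits are unnormalized scores. A minimal sketch of turning one row of logits into probabilities and a predicted class index, using made-up values and only the standard library:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


row = [-1.2, 2.3]  # made-up logits for one input, two classes
probs = softmax(row)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # 1
```

Subtracting the maximum before exponentiating leaves the result unchanged and avoids overflow for large logits.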

Token Classification

class rktransformers.modeling.RKModelForTokenClassification[source]

Bases: RKModel[TokenClassifierOutput, Tensor | ndarray]

RKNN Model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks. This model inherits from RKModel; see its documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)[source]

The RKModelForTokenClassification forward method, overrides the __call__() special method.

Parameters:

input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarray or torch.Tensor objects, matching the type of the original input_ids.
kwargs (Any)

Example of token classification:

from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/bert-base-NER")
model = RKModelForTokenClassification.from_pretrained("rk-transformers/bert-base-NER")

inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")

outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)
# [1, 512, 9]
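Because the logits cover all 512 positions, padded positions usually need to be dropped before reading off labels. The sketch below takes the argmax per token for one sequence and skips positions where the attention mask is 0. The id2label map and the values are made up, and predict_tags is a hypothetical helper.

```python
def predict_tags(logits, attention_mask, id2label):
    """Argmax per token, skipping padded positions."""
    tags = []
    for row, keep in zip(logits, attention_mask):
        if not keep:
            continue
        best = max(range(len(row)), key=row.__getitem__)
        tags.append(id2label[best])
    return tags


id2label = {0: "O", 1: "B-PER", 2: "B-LOC"}  # hypothetical label map
logits = [[2.0, 0.1, 0.3], [0.2, 3.1, 0.0], [0.0, 0.0, 5.0]]  # last row is padding
tags = predict_tags(logits, [1, 1, 0], id2label)
print(tags)  # ['O', 'B-PER']
```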

Question Answering

class rktransformers.modeling.RKModelForQuestionAnswering[source]

Bases: RKModel[QuestionAnsweringModelOutput, Tensor | ndarray, Tensor | ndarray]

RKNN Model with a QuestionAnsweringModelOutput for extractive question-answering tasks like SQuAD. This model inherits from RKModel; see its documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)[source]

The RKModelForQuestionAnswering forward method, overrides the __call__() special method.

Parameters:

input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarray or torch.Tensor objects, matching the type of the original input_ids.
kwargs (Any)

Example of question answering:

from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/distilbert-base-cased-distilled-squad")
model = RKModelForQuestionAnswering.from_pretrained("rk-transformers/distilbert-base-cased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="np")

outputs = model(**inputs)
start_logits = outputs.start_logits
end_logits = outputs.end_logits
list(start_logits.shape)
# [1, 512]
list(end_logits.shape)
# [1, 512]
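An answer span can be recovered from the two logit vectors by picking the (start, end) pair with the highest combined score, subject to start <= end. The sketch below does this for one sequence with made-up values. best_span and its max_answer_len limit are hypothetical; real pipelines typically also exclude question and padding positions.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start + end score with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


start_logits = [0.1, 0.2, 4.0, 0.3, 0.1]
end_logits = [0.0, 0.1, 0.5, 3.5, 0.2]
span = best_span(start_logits, end_logits)
print(span)  # (2, 3)
```

The token ids between the two indices can then be decoded with the tokenizer to obtain the answer text.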

Masked Language Modeling

class rktransformers.modeling.RKModelForMaskedLM[source]

Bases: RKModel[MaskedLMOutput, Tensor | ndarray]

RKNN model for masked language modeling tasks. This model inherits from RKModel; see its documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)[source]

The RKModelForMaskedLM forward method, overrides the __call__() special method.

Parameters:

input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarray or torch.Tensor objects, matching the type of the original input_ids.
kwargs (Any)

Example of masked language modeling:

from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/bert-base-uncased")
model = RKModelForMaskedLM.from_pretrained("rk-transformers/bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="np")

outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)
# [1, 512, 30522]
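To read a prediction out of these logits, locate the [MASK] position in input_ids and take the highest-scoring vocabulary indices at that position. The sketch below uses a toy six-entry vocabulary and made-up logits for one sequence; top_k is a hypothetical helper, and 103 is assumed to be the [MASK] id of the BERT uncased vocabulary.

```python
def top_k(row, k=3):
    """Return the indices of the k largest logits, best first."""
    return sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]


mask_token_id = 103  # assumption: [MASK] id in the BERT uncased vocabulary
input_ids = [101, 1996, 3007, 103, 102]
logits = [[0.0] * 6 for _ in input_ids]  # toy vocabulary of 6 entries
logits[3] = [0.1, 2.5, 0.3, 4.0, 0.2, 1.0]

mask_pos = input_ids.index(mask_token_id)
candidates = top_k(logits[mask_pos], k=2)
print(mask_pos, candidates)  # 3 [3, 1]
```

With a real model, the candidate indices would be passed to the tokenizer to convert them back to tokens.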

Multiple Choice

class rktransformers.modeling.RKModelForMultipleChoice[source]

Bases: RKModel[MultipleChoiceModelOutput, Tensor | ndarray]

RKNN Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RocStories/SWAG tasks. This model inherits from RKModel; see its documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward(input_ids=None, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)[source]

The RKModelForMultipleChoice forward method, overrides the __call__() special method.

Parameters:

input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, num_choices, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, num_choices, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, num_choices, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarray or torch.Tensor objects, matching the type of the original input_ids.
kwargs (Any)

Example of multiple choice:

import numpy as np  # needed for np.expand_dims below
import torch
from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/bert-base-uncased_SWAG")
model = RKModelForMultipleChoice.from_pretrained("rk-transformers/bert-base-uncased_SWAG")

prompt = "In Italy, pizza is served in slices."
choice0 = "It is eaten with a fork and knife."
choice1 = "It is eaten while held in the hand."
choice2 = "It is blended into a smoothie."
choice3 = "It is folded into a taco."
labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;))

encoding = tokenizer([prompt, prompt, prompt, prompt], [choice0, choice1, choice2, choice3], return_tensors="np", padding=True)
inputs = {k: np.expand_dims(v, 0) for k, v in encoding.items()}

outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)
# [1, 4]
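Each column of the logits corresponds to one candidate ending, so the predicted choice is the column with the highest score. A minimal sketch with made-up logit values for the four choices used above:

```python
choices = [
    "It is eaten with a fork and knife.",
    "It is eaten while held in the hand.",
    "It is blended into a smoothie.",
    "It is folded into a taco.",
]
logits = [[3.2, 1.1, -2.0, -1.5]]  # made-up values, shape (1, 4)

# Index of the highest-scoring choice for the first (only) example.
best = max(range(len(choices)), key=logits[0].__getitem__)
print(best, choices[best])  # 0 It is eaten with a fork and knife.
```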