Modeling
Model classes for running RKNN inference.
Base Classes
- class rktransformers.modeling.RKNNRuntime
Bases:
object
Runtime wrapper for RKNN models.
This class encapsulates loading an RKNN model, verifying its target device/platform compatibility, and initializing the runtime with the desired core mask.
- model_path
Filesystem path to the RKNN model file.
- Type:
Path
- platform
Target platform string such as 'rk3588'. When None, the platform is detected from the host environment.
- Type:
PlatformType | None
- core_mask
Core mask selection for devices with multiple NPU cores. Examples include 'auto', '0', '1', and 'all'.
- Type:
CoreMaskType
- rknn_config
Optional configuration object for RKNN runtime behavior.
- Type:
RKNNConfig | None
- rknn
Loaded RKNN runtime instance, or None when not initialized.
- Type:
RKNNLite | None
Example
>>> runtime = RKNNRuntime("/tmp/model.rknn", platform="rk3588", core_mask="auto")
>>> runtime.rknn  # The underlying RKNN runtime instance
- __init__(model_path, platform=None, core_mask='auto', rknn_config=None)
Create a new RKNNRuntime and load the model specified by model_path.
- Parameters:
model_path (str | Path) – Path to the RKNN model file on disk. This file is loaded during initialization.
platform (PlatformType | None, optional) – Platform string specifying the target device. When None, the platform is detected from the host environment via get_edge_host_platform().
core_mask (CoreMaskType, optional) – Core mask used for devices with several NPU cores (e.g., 'auto', '0', '1', 'all'). Defaults to 'auto'.
rknn_config (RKNNConfig | None, optional) – Optional RKNN configuration object. Not all runtime options are currently implemented; this field is kept for future extension.
- Raises:
FileNotFoundError – If the given model_path does not exist.
RuntimeError – If the model fails to load or the runtime fails to initialize.
- Return type:
None
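The two documented failure modes can be handled separately at startup. The sketch below is a hypothetical helper, not part of rktransformers: try_load and StubRuntime are names made up for illustration, and the stub only imitates the missing-file check so the snippet runs without an NPU. On a device, RKNNRuntime would be passed as runtime_cls instead.

```python
from pathlib import Path


def try_load(runtime_cls, model_path, **kwargs):
    """Build runtime_cls(model_path, **kwargs) and map the documented errors.

    Hypothetical helper: returns (runtime, None) on success,
    or (None, message) when loading fails.
    """
    try:
        return runtime_cls(model_path, **kwargs), None
    except FileNotFoundError:
        return None, f"model file not found: {model_path}"
    except RuntimeError as exc:
        return None, f"runtime failed to initialize: {exc}"


class StubRuntime:
    """Stand-in for RKNNRuntime (assumption: only the path check is imitated)."""

    def __init__(self, model_path, platform=None, core_mask="auto"):
        if not Path(model_path).exists():
            raise FileNotFoundError(model_path)
        self.model_path = Path(model_path)


runtime, error = try_load(StubRuntime, "/no/such/model.rknn", core_mask="auto")
print(error)  # model file not found: /no/such/model.rknn
```

Returning a message instead of re-raising keeps a service's startup log readable; callers that prefer exceptions can simply construct the runtime directly.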
- class rktransformers.modeling.RKModel
Bases:
RKNNRuntime, PreTrainedModel, ModelHubMixin, Generic[MODEL_OUTPUT_T, Unpack[TENSOR_Ts]]
Base class for RKNN-backed text models integrated with the Hugging Face Hub.
- auto_model_class
alias of AutoModel
- __init__(*, model_id=None, config=None, model_path, platform=None, core_mask='auto', rknn_config=None, max_seq_length=512, batch_size=1)
Create a new RKNNRuntime and load the model specified by model_path.
- Parameters:
model_path (str | Path) – Path to the RKNN model file on disk. This file is loaded during initialization.
platform (PlatformType | None, optional) – Platform string specifying the target device. When None, the platform is detected from the host environment via get_edge_host_platform().
core_mask (CoreMaskType, optional) – Core mask used for devices with several NPU cores (e.g., 'auto', '0', '1', 'all'). Defaults to 'auto'.
rknn_config (RKNNConfig | None, optional) – Optional RKNN configuration object. Not all runtime options are currently implemented; this field is kept for future extension.
model_id (str | None)
config (PretrainedConfig | None)
max_seq_length (int)
batch_size (int)
- Raises:
FileNotFoundError – If the given model_path does not exist.
RuntimeError – If the model fails to load or the runtime fails to initialize.
- Return type:
None
- __call__(*args: Any, return_dict: Literal[False], **kwargs: Any) -> tuple[Unpack[TENSOR_Ts]]
- __call__(*args: Any, return_dict: Literal[True], **kwargs: Any) -> MODEL_OUTPUT_T
- __call__(*args: Any, **kwargs: Any) -> MODEL_OUTPUT_T
Call self as a function.
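The three signatures above follow the usual typing overload pattern: the static return type depends on the literal value of return_dict. The sketch below reproduces that pattern on a toy class. ToyModel and ToyOutput are hypothetical names, and the assumption that __call__ simply forwards to forward() is made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Literal, Union, overload


@dataclass
class ToyOutput:
    """Stand-in for a ModelOutput subclass with a single field."""
    logits: list


class ToyModel:
    @overload
    def __call__(self, *args: Any, return_dict: Literal[False], **kwargs: Any) -> tuple: ...
    @overload
    def __call__(self, *args: Any, return_dict: Literal[True] = ..., **kwargs: Any) -> ToyOutput: ...

    def __call__(self, *args: Any, return_dict: bool = True, **kwargs: Any) -> Union[tuple, ToyOutput]:
        # Assumption: calling the model delegates to forward().
        return self.forward(*args, return_dict=return_dict, **kwargs)

    def forward(self, values, *, return_dict=True):
        logits = [v * 2 for v in values]  # placeholder computation
        return ToyOutput(logits) if return_dict else (logits,)


model = ToyModel()
print(model([1, 2]).logits)                 # [2, 4]
print(model([1, 2], return_dict=False)[0])  # [2, 4]
```

With this pattern a type checker infers a tuple for return_dict=False and the output object otherwise, so attribute access such as .logits is only offered where it is valid.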
- forward(*args, **kwargs)
Define the computation performed at every call.
Should be overridden by all subclasses.
- classmethod from_pretrained(pretrained_model_name_or_path, *, config=None, platform=None, core_mask='auto', subfolder='', revision=None, force_download=False, resume_download=False, proxies=None, token=None, local_files_only=False, trust_remote_code=False, cache_dir=None, file_name=None, **model_kwargs)
Instantiate a pretrained RKNN model from a Hub repository or a local directory.
- Parameters:
pretrained_model_name_or_path (Union[str, Path]) –
- Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids can be located at the root level, like bert-base-uncased, or namespaced under a user or organization name, like rk-transformers/bert-base-uncased.
- A path to a directory containing a model previously exported using export_rknn(), e.g., ./my_model_directory/.
force_download (bool, defaults to False) – Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
token (Optional[Union[bool,str]], defaults to None) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in huggingface_hub.constants.HF_TOKEN_PATH).
cache_dir (Optional[str], defaults to None) – Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
subfolder (str, defaults to '') – In case the relevant files are located inside a subfolder of the model repo, either locally or on huggingface.co, you can specify the folder name here.
config (Optional[transformers.PretrainedConfig], defaults to None) – The model configuration.
local_files_only (Optional[bool], defaults to False) – Whether or not to only look at local files (i.e., do not try to download the model).
trust_remote_code (bool, defaults to False) – Whether or not to allow for custom code defined on the Hub in their own modeling. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
revision (Optional[str], defaults to None) – The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
platform (Literal['simulator', 'rk3588', 'rk3576', 'rk3568', 'rk3566', 'rk3562'] | None)
core_mask (Literal['auto', '0', '1', '2', '0_1', '0_1_2', 'all'])
resume_download (bool | None)
proxies (dict | None)
file_name (str | None)
model_kwargs (Any)
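Because platform and core_mask accept only the literal values listed above, a caller may want to reject typos before any download starts. The following is a minimal sketch of such a check. PlatformName, CoreMaskName, and check_target are hypothetical names introduced here; the value lists are copied from the signature in this reference, and the library's own validation may differ.

```python
from typing import Literal, get_args

# Value lists copied from the from_pretrained signature documented above.
PlatformName = Literal["simulator", "rk3588", "rk3576", "rk3568", "rk3566", "rk3562"]
CoreMaskName = Literal["auto", "0", "1", "2", "0_1", "0_1_2", "all"]


def check_target(platform, core_mask="auto"):
    """Validate platform/core_mask strings; None means 'detect the platform'."""
    if platform is not None and platform not in get_args(PlatformName):
        raise ValueError(f"unknown platform: {platform!r}")
    if core_mask not in get_args(CoreMaskName):
        raise ValueError(f"unknown core_mask: {core_mask!r}")
    return platform, core_mask


print(check_target("rk3588", "0_1"))  # ('rk3588', '0_1')
print(check_target(None))             # (None, 'auto')
```

The validated pair could then be passed on as the platform and core_mask keyword arguments of from_pretrained.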
Task-Specific Models
Feature Extraction
- class rktransformers.modeling.RKModelForFeatureExtraction
Bases:
RKModel[BaseModelOutput, Tensor | ndarray]
RKNN model for feature extraction tasks. This model inherits from RKModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
- forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)
The RKModelForFeatureExtraction forward method overrides the __call__() special method.
- Parameters:
input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are 1 for tokens that are not masked and 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarrays or torch.Tensors depending on the original input_ids type.
kwargs (Any)
Example of feature extraction:
from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForFeatureExtraction
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/all-MiniLM-L6-v2")
model = RKModelForFeatureExtraction.from_pretrained("rk-transformers/all-MiniLM-L6-v2")
inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")
outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
list(last_hidden_state.shape)  # [1, 12, 384]
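last_hidden_state holds one vector per token, so a sentence embedding still needs a pooling step. A common choice for sentence-embedding models is masked mean pooling: average the token vectors whose attention mask is 1. The sketch below uses plain Python lists so it runs without extra dependencies; mean_pool is a hypothetical helper, the numbers are made up, and whether a given exported model already pools internally is not stated in this reference.

```python
def mean_pool(hidden, mask):
    """Average the rows of `hidden` whose mask entry is 1.

    hidden: list of seq_len token vectors (each a list of floats)
    mask:   list of seq_len ints, 1 = real token, 0 = padding
    """
    total = [0.0] * len(hidden[0])
    count = 0
    for vector, keep in zip(hidden, mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in total]


hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is padding
mask = [1, 1, 0]
print(mean_pool(hidden, mask))  # [2.0, 3.0]
```

With real outputs, hidden would be one batch row of last_hidden_state and mask the matching row of the tokenizer's attention_mask.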
Sequence Classification
- class rktransformers.modeling.RKModelForSequenceClassification
Bases:
RKModel[SequenceClassifierOutput, Tensor | ndarray]
RKNN model for sequence classification/regression tasks. This model inherits from RKModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
- forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)
The RKModelForSequenceClassification forward method overrides the __call__() special method.
- Parameters:
input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are 1 for tokens that are not masked and 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarrays or torch.Tensors depending on the original input_ids type.
kwargs (Any)
Example of single-label classification:
from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/distilbert-base-uncased-finetuned-sst-2-english")
model = RKModelForSequenceClassification.from_pretrained("rk-transformers/distilbert-base-uncased-finetuned-sst-2-english")
inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)  # [1, 2]
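The logits of shape [1, 2] are unnormalized scores. For single-label classification they are usually turned into probabilities with a softmax, and the predicted class is the index of the largest value. The sketch below shows that step on made-up numbers; softmax here is a local helper, and mapping the class id to a label name is left out because it depends on the model configuration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of scores."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


row = [-2.0, 2.0]  # made-up logits for one input, two classes
probs = softmax(row)
label_id = max(range(len(probs)), key=probs.__getitem__)
print(label_id)  # 1
```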
Token Classification
- class rktransformers.modeling.RKModelForTokenClassification
Bases:
RKModel[TokenClassifierOutput, Tensor | ndarray]
RKNN model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks. This model inherits from RKModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
- forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)
The RKModelForTokenClassification forward method overrides the __call__() special method.
- Parameters:
input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are 1 for tokens that are not masked and 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarrays or torch.Tensors depending on the original input_ids type.
kwargs (Any)
Example of token classification:
from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/bert-base-NER")
model = RKModelForTokenClassification.from_pretrained("rk-transformers/bert-base-NER")
inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")
outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)  # [1, 512, 9]
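The logits shape [1, 512, 9] in this example suggests that the output covers the full padded sequence length rather than only the tokens of the sentence, so padded positions should be dropped before reading off labels. The sketch below takes the per-token argmax and keeps only positions where the attention mask is 1. predict_tags is a hypothetical helper, and the values are made up with three labels instead of nine.

```python
def predict_tags(logits, mask):
    """Return the argmax label id for every position whose mask entry is 1.

    logits: list of seq_len rows, each with num_labels scores
    mask:   list of seq_len ints, 1 = real token, 0 = padding
    """
    tags = []
    for row, keep in zip(logits, mask):
        if keep:
            tags.append(max(range(len(row)), key=row.__getitem__))
    return tags


token_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.0, 9.0]]
token_mask = [1, 1, 0]  # third position is padding and is ignored
print(predict_tags(token_logits, token_mask))  # [1, 0]
```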
Question Answering
- class rktransformers.modeling.RKModelForQuestionAnswering
Bases:
RKModel[QuestionAnsweringModelOutput, Tensor | ndarray, Tensor | ndarray]
RKNN model that returns a QuestionAnsweringModelOutput for extractive question-answering tasks such as SQuAD. This model inherits from RKModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
- forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)
The RKModelForQuestionAnswering forward method overrides the __call__() special method.
- Parameters:
input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are 1 for tokens that are not masked and 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarrays or torch.Tensors depending on the original input_ids type.
kwargs (Any)
Example of question answering:
from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/distilbert-base-cased-distilled-squad")
model = RKModelForQuestionAnswering.from_pretrained("rk-transformers/distilbert-base-cased-distilled-squad")
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="np")
outputs = model(**inputs)
start_logits = outputs.start_logits
end_logits = outputs.end_logits
list(start_logits.shape)  # [1, 512]
list(end_logits.shape)  # [1, 512]
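To turn start_logits and end_logits into an answer, one common approach is to pick the pair (start, end) with the highest summed score, subject to start <= end and a maximum answer length. The sketch below implements that search on made-up scores; best_span and max_answer_len are hypothetical names, and a real pipeline would additionally exclude question and padding positions before searching.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end) maximizing start_scores[start] + end_scores[end].

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_scores):
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


start_scores = [0.1, 3.0, 0.2, 0.1]
end_scores = [0.0, 0.5, 2.5, 0.3]
print(best_span(start_scores, end_scores))  # (1, 2)
```

The resulting token indices can then be used to slice input_ids and decode the answer text with the tokenizer.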
Masked Language Modeling
- class rktransformers.modeling.RKModelForMaskedLM
Bases:
RKModel[MaskedLMOutput, Tensor | ndarray]
RKNN model for masked language modeling tasks. This model inherits from RKModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
- forward(input_ids, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)
The RKModelForMaskedLM forward method overrides the __call__() special method.
- Parameters:
input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are 1 for tokens that are not masked and 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarrays or torch.Tensors depending on the original input_ids type.
kwargs (Any)
Example of masked language modeling:
from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/bert-base-uncased")
model = RKModelForMaskedLM.from_pretrained("rk-transformers/bert-base-uncased")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="np")
outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)  # [1, 512, 30522]
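The masked-LM logits contain one row of vocabulary scores per position, so filling the mask means locating the [MASK] position in input_ids and ranking that row. The sketch below does this on a toy vocabulary of five entries. top_k_at_mask is a hypothetical helper and 103 merely stands in for the mask token id; with a real tokenizer the id would come from tokenizer.mask_token_id and the ranked ids would be decoded back to text.

```python
def top_k_at_mask(input_ids, logits, mask_token_id, k=3):
    """Return (mask position, ids of the k highest-scoring vocabulary entries)."""
    position = input_ids.index(mask_token_id)  # first occurrence of the mask
    row = logits[position]
    ranked = sorted(range(len(row)), key=row.__getitem__, reverse=True)
    return position, ranked[:k]


ids = [101, 7, 103, 102]  # 103 stands in for the [MASK] id
vocab_logits = [
    [0.0] * 5,
    [0.0] * 5,
    [0.1, 0.9, 0.3, 2.0, 0.5],  # scores at the masked position
    [0.0] * 5,
]
print(top_k_at_mask(ids, vocab_logits, mask_token_id=103, k=2))  # (2, [3, 1])
```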
Multiple Choice
- class rktransformers.modeling.RKModelForMultipleChoice
Bases:
RKModel[MultipleChoiceModelOutput, Tensor | ndarray]
RKNN model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RocStories/SWAG tasks. This model inherits from RKModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
- forward(input_ids=None, attention_mask=None, token_type_ids=None, *, return_dict=True, **kwargs)
The RKModelForMultipleChoice forward method overrides the __call__() special method.
- Parameters:
input_ids (Union[torch.Tensor, np.ndarray] of shape (batch_size, num_choices, sequence_length)) – Indices of input sequence tokens in the vocabulary.
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, num_choices, sequence_length), defaults to None) – Mask to avoid performing attention on padding token indices. Mask values are 1 for tokens that are not masked and 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, num_choices, sequence_length), defaults to None) – Segment token indices to indicate first and second portions of the inputs.
return_dict (bool, optional, defaults to True) – Whether or not to return a subclass of ModelOutput instead of a tuple. Tensors will be np.ndarrays or torch.Tensors depending on the original input_ids type.
kwargs (Any)
Example of multiple choice:
from transformers import AutoTokenizer
from rktransformers.modeling import RKModelForMultipleChoice
import numpy as np
import torch

tokenizer = AutoTokenizer.from_pretrained("rk-transformers/bert-base-uncased_SWAG")
model = RKModelForMultipleChoice.from_pretrained("rk-transformers/bert-base-uncased_SWAG")
prompt = "In Italy, pizza is served in slices."
choice0 = "It is eaten with a fork and knife."
choice1 = "It is eaten while held in the hand."
choice2 = "It is blended into a smoothie."
choice3 = "It is folded into a taco."
labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;))
encoding = tokenizer([prompt, prompt, prompt, prompt], [choice0, choice1, choice2, choice3], return_tensors="np", padding=True)
# Add a leading batch axis: (num_choices, seq_len) -> (1, num_choices, seq_len)
inputs = {k: np.expand_dims(v, 0) for k, v in encoding.items()}
outputs = model(**inputs)
logits = outputs.logits
list(logits.shape)  # [1, 4]
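Two details of this example are easy to miss: the tokenizer encodes the prompt once per choice, giving arrays of shape (num_choices, sequence_length), and a batch axis is then added in front; the prediction is the index of the largest logit in a row. The sketch below mirrors both steps with nested lists. shape and pick_choice are hypothetical helpers, and the token ids and logits are made up.

```python
def shape(nested):
    """Return the dimensions of a regular nested list, e.g. [1, 4, 3]."""
    dims = []
    while isinstance(nested, list) and nested:
        dims.append(len(nested))
        nested = nested[0]
    return dims


def pick_choice(logits_row):
    """Index of the highest-scoring choice."""
    return max(range(len(logits_row)), key=logits_row.__getitem__)


encoded = [[101, 5, 102], [101, 6, 102], [101, 7, 102], [101, 8, 102]]  # 4 choices x 3 tokens
batched = [encoded]  # same effect as np.expand_dims(array, 0)
print(shape(batched))  # [1, 4, 3]

choice_logits = [[0.2, 1.7, -0.3, 0.1]]  # made-up scores, shape [1, 4]
print(pick_choice(choice_logits[0]))  # 1
```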