NPU Operators

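This section summarizes the neural network operators supported by the NPU of the SyNAP VS6x0/SL16x0 family and by the accompanying software stack. For each operator type, the supported tensor types and execution engines are also documented. Networks designed to maximize the use of operators executed in the NN core will deliver the best performance.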

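Execution Engines

| Acronym | Description |
|---|---|
| NN | Neural Network Engine |
| PPU | Parallel Processing Unit |
| TP | Tensor Processor |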

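Tensor Types

| Acronym | Description |
|---|---|
| asym-u8 | asymmetric affine uint8 |
| asym-i8 | asymmetric affine int8 |
| pc-sym-i8 | per-channel symmetric int8 |
| fp32 | 32-bit floating point |
| fp16 | 16-bit floating point |
| h | half precision |
| int16 | 16-bit integer |
| int32 | 32-bit integer |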
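
Note

The NN engine supports int16 dynamic fixed-point convolution in its multiplications. Other layers follow the tables: if asym-u8 is not available in the NN column, int16 is not available either.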
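
As a rough illustration of the quantized tensor types listed above, the following sketch shows the conventional meaning of "asymmetric affine" (a scale plus a zero point) and "per-channel symmetric" (one scale per channel, zero point fixed at 0). It is a minimal sketch of the standard quantization formulas, not SyNAP code: the function names, the rounding mode, and the [-127, 127] clamp for symmetric int8 are assumptions made for illustration.

```python
# Illustrative only: standard affine quantization formulas.
# Function names, rounding mode and clamp ranges are assumptions,
# not taken from the SyNAP toolkit.

def quantize_asym_u8(values, scale, zero_point):
    """asym-u8: q = round(x / scale) + zero_point, clamped to [0, 255]."""
    return [max(0, min(255, round(v / scale) + zero_point)) for v in values]


def dequantize_asym(q_values, scale, zero_point):
    """Inverse mapping for asymmetric affine types: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def quantize_pc_sym_i8(channels, scales):
    """pc-sym-i8: each channel has its own scale and an implicit zero point of 0.

    Values are clamped to [-127, 127] (assumed symmetric range).
    """
    return [
        [max(-127, min(127, round(v / s))) for v in channel]
        for channel, s in zip(channels, scales)
    ]


if __name__ == "__main__":
    q = quantize_asym_u8([0.0, 1.0, -1.0], scale=0.5, zero_point=128)
    print(q)
    print(dequantize_asym(q, scale=0.5, zero_point=128))
    print(quantize_pc_sym_i8([[1.0, -1.0], [0.5, 100.0]], scales=[0.5, 0.25]))
```

The asymmetric form spends one extra parameter (the zero point) to cover value ranges that are not centered on zero, while the per-channel symmetric form is typically used for weights, where each output channel can have a very different magnitude.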

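Basic Operations

| Operator | Input | Kernel | Output | NN | TP | PPU |
|---|---|---|---|---|---|---|
| CONV2D | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| CONV1D | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| DECONVOLUTION | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| DECONVOLUTION1D | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| GROUPED_CONV2D | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| FULLY_CONNECTED | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |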
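
Note

Convolutions are executed in the NN engine only if they satisfy all of the following conditions: **stride == 1**, **kernel_size <= 15x15**, and **dilation size + kernel size <= 15x15**. If any of these conditions is not met, the convolution requires the support of the TP core and runs considerably slower.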
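
The three conditions above can be turned into a quick pre-flight check when auditing a network's convolution layers. The sketch below is hypothetical helper code, not part of the SyNAP toolkit. It assumes the limits apply independently per spatial axis, and, because the note does not say how "dilation size" is measured, it treats that value as an opaque per-axis number that defaults to 0 for an undilated convolution. Check that convention against the vendor documentation before relying on it.

```python
from dataclasses import dataclass
from typing import Tuple

# Limit taken from the note above (15x15); everything else is illustrative.
NN_MAX_EXTENT = 15


@dataclass(frozen=True)
class ConvParams:
    """Hypothetical container for the convolution attributes named in the note."""
    kernel_size: Tuple[int, int]             # (height, width)
    stride: Tuple[int, int] = (1, 1)
    dilation_size: Tuple[int, int] = (0, 0)  # assumed: 0 means "no dilation"


def runs_on_nn_engine(p: ConvParams) -> bool:
    """Return True if all three stated conditions hold on every axis."""
    stride_ok = all(s == 1 for s in p.stride)
    kernel_ok = all(k <= NN_MAX_EXTENT for k in p.kernel_size)
    dilation_ok = all(
        d + k <= NN_MAX_EXTENT for d, k in zip(p.dilation_size, p.kernel_size)
    )
    return stride_ok and kernel_ok and dilation_ok


if __name__ == "__main__":
    layers = {
        "conv3x3": ConvParams(kernel_size=(3, 3)),
        "conv3x3_s2": ConvParams(kernel_size=(3, 3), stride=(2, 2)),
        "conv17x17": ConvParams(kernel_size=(17, 17)),
        "conv13x13_d3": ConvParams(kernel_size=(13, 13), dilation_size=(3, 3)),
    }
    for name, params in layers.items():
        engine = "NN" if runs_on_nn_engine(params) else "needs TP support"
        print(f"{name}: {engine}")
```

A check like this is useful mainly as a design-time lint: a layer it flags (for example a strided convolution) is a candidate for restructuring if the network must stay on the fast path.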

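Activation Operations

| Operator | Input | Output | NN | TP | PPU |
|---|---|---|---|---|---|
| ELU | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| HARD_SIGMOID | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SWISH | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| LEAKY_RELU | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| PRELU | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| RELU | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| RELUN | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| RSQRT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SIGMOID | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SOFTRELU | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SQRT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| TANH | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| ABS | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| CLIP | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| EXP | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| LOG | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| NEG | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| MISH | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SOFTMAX | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| LOG_SOFTMAX | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SQUARE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SIN | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| LINEAR | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| ERF | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| GELU | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |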

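Elementwise Operations

| Operator | Input | Output | NN | TP | PPU |
|---|---|---|---|---|---|
| ADD | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SUBTRACT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| MULTIPLY | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| DIVIDE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| MAXIMUM | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| MINIMUM | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| POW | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| FLOORDIV | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| MATRIXMUL | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| RELATIONAL_OPS | asym-u8 | bool8 | | | |
| | asym-i8 | bool8 | | | |
| | fp32 | bool8 | | | |
| | fp16 | bool8 | | | |
| | bool8 | bool8 | | | |
| LOGICAL_OPS | bool8 | bool8 | | | |
| LOGICAL_NOT | bool8 | bool8 | | | |
| SELECT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| | bool8 | bool8 | | | |
| ADDN | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |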

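Normalization Operations

| Operator | Input | Output | NN | TP | PPU |
|---|---|---|---|---|---|
| BATCH_NORM | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| LRN2 | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| L2_NORMALIZE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| LAYER_NORM | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| INSTANCE_NORM | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| BATCHNORM_SINGLE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| MOMENTS | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| GROUP_NORM | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |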

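Reshape Operations

| Operator | Input | Output | NN | TP | PPU |
|---|---|---|---|---|---|
| EXPAND_BROADCAST | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SLICE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SPLIT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| CONCAT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| STACK | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| UNSTACK | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| RESHAPE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SQUEEZE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| PERMUTE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| REORG | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SPACE2DEPTH | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| DEPTH2SPACE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| | bool8 | bool8 | | | |
| BATCH2SPACE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SPACE2BATCH | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| PAD | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| REVERSE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| STRIDED_SLICE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| REDUCE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| ARGMAX | asym-u8 | asym-u8 / int16 / int32 | | | |
| | asym-i8 | asym-u8 / int16 / int32 | | | |
| | fp32 | int32 | | | |
| | fp16 | asym-u8 / int16 / int32 | | | |
| ARGMIN | asym-u8 | asym-u8 / int16 / int32 | | | |
| | asym-i8 | asym-u8 / int16 / int32 | | | |
| | fp32 | int32 | | | |
| | fp16 | asym-u8 / int16 / int32 | | | |
| SHUFFLECHANNEL | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |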

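RNN Operations

| Operator | Input | Kernel | Output | NN | TP | PPU |
|---|---|---|---|---|---|---|
| LSTMUNIT_OVXLIB | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| CONV2D_LSTM | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| CONV2D_LSTM_CELL | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| LSTM_OVXLIB | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| GRUCELL_OVXLIB | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| GRU_OVXLIB | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |
| SVDF | asym-u8 | asym-u8 | asym-u8 | | | |
| | asym-i8 | pc-sym-i8 | asym-i8 | | | |
| | fp32 | fp32 | fp32 | | | |
| | fp16 | fp16 | fp16 | | | |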

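Pooling Operations

| Operator | Input | Output | NN | TP | PPU |
|---|---|---|---|---|---|
| POOL | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| ROI_POOL | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| POOLWITHARGMAX | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| UPSAMPLE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |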

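Miscellaneous Operations

| Operator | Input | Output | NN | TP | PPU |
|---|---|---|---|---|---|
| PROPOSAL | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| VARIABLE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| DROPOUT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| RESIZE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| DATACONVERT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| FLOOR | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| EMBEDDING_LOOKUP | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| GATHER | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| GATHER_ND | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SCATTER_ND | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| GATHER_ND_UPDATE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| TILE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| ELTWISEMAX | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SIGNAL_FRAME | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| CONCATSHIFT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| UPSAMPLESCALE | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp16 | fp16 | | | |
| ROUND | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| CEIL | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| SEQUENCE_MASK | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| REPEAT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| ONE_HOT | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |
| CAST | asym-u8 | asym-u8 | | | |
| | asym-i8 | asym-i8 | | | |
| | fp32 | fp32 | | | |
| | fp16 | fp16 | | | |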