Kindle Modules
Supported Modules Summary
Module | Components | Arguments |
---|---|---|
Conv | Conv -> BatchNorm -> Activation | [out_channels, kernel_size, stride, padding, groups, activation] |
DWConv | DWConv -> BatchNorm -> Activation | [out_channels, kernel_size, stride, padding, activation] |
Focus | Reshape x -> Concat -> Conv | [out_channels, kernel_size, stride, padding, activation] |
Bottleneck | Expansion ConvBNAct -> ConvBNAct | [out_channels, shortcut, groups, expansion, activation] |
BottleneckCSP | CSP Bottleneck | [out_channels, shortcut, groups, expansion, activation] |
C3 | CSP Bottleneck with 3 Conv | [out_channels, shortcut, groups, expansion, activation] |
MV2Block | MobileNet v2 block | [out_channels, stride, expand_ratio, activation] |
AvgPool | Average pooling | [kernel_size, stride, padding] |
MaxPool | Max pooling | [kernel_size, stride, padding] |
GlobalAvgPool | Global Average Pooling | [] |
SPP | Spatial Pyramid Pooling | [out_channels, [kernel_size1, kernel_size2, ...], activation] |
SPPF | Spatial Pyramid Pooling - Fast | [out_channels, kernel_size, activation] |
Flatten | Flatten | [] |
Concat | Concatenation | [dimension] |
Linear | Linear | [out_channels, activation] |
Add | Add | [] |
UpSample | UpSample | [size, scale_factor, mode, align_corners] |
Identity | Identity | [] |
YamlModule | Custom module from yaml file | ['yaml/file/path', arg0, arg1, ...] |
nn.{module_name} | PyTorch torch.nn.* module | Please refer to https://pytorch.org/docs/stable/nn.html |
Pretrained | timm.create_model | [model_name, use_feature_maps, features_only, pretrained] |
PreTrainedFeatureMap | Bypass feature layer map from Pretrained | [feature_idx] |
YOLOHead | YOLOv5 head module | [n_classes, anchors, out_xyxy] |
MobileViTBlock | MobileViT block (experimental) | [conv_channels, mlp_channels, depth, kernel_size, patch_size, dropout, activation] |
Note
nn.{module_name} is currently experimental. This might change in a future release. Use with caution.
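For reference, each entry in a Kindle model YAML follows the `[from, repeat, module, args]` convention used throughout the tables below. A minimal sketch of a LeNet-like configuration, loosely adapted from the Kindle README (the layer values are illustrative):

```yaml
input_size: [32, 32]
input_channel: 3

depth_multiple: 1.0
width_multiple: 1.0

backbone:
    # [from, repeat, module, args]
    [
        [-1, 1, Conv, [6, 5, 1, 0]],      # 5x5 Conv -> BatchNorm -> ReLU
        [-1, 1, MaxPool, [2]],
        [-1, 1, Conv, [16, 5, 1, 0]],
        [-1, 1, MaxPool, [2]],
        [-1, 1, Flatten, []],
        [-1, 1, Linear, [120, ReLU]],
        [-1, 1, Linear, [84, ReLU]],
        [-1, 1, Linear, [10]]             # logits; no activation
    ]
```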
Conv
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
kernel_size | int | | (n, n) kernel size |
stride | int | 1 | Conv stride |
padding | int | None | Conv padding. If None, auto-padding is applied, which keeps the output width and height the same as the input |
groups | int | 1 | Group convolution size. If 1, no group convolution is applied |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
- Please refer to https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html for further detail.
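A sketch of Conv entries in a model YAML, assuming the `[from, repeat, module, args]` entry format shown above (values are illustrative):

```yaml
# Conv args: [out_channels, kernel_size, stride, padding, groups, activation]
[-1, 1, Conv, [64, 3, 2]]               # trailing args omitted: auto padding, groups=1, ReLU
[-1, 1, Conv, [64, 3, 1, 1, 1, ReLU]]   # all args spelled out
```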
DWConv
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
kernel_size | int | | (n, n) kernel size |
stride | int | 1 | Conv stride |
padding | int | None | Conv padding. If None, auto-padding is applied, which keeps the output width and height the same as the input |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
- DWConv is identical to Conv except that grouped (depthwise) convolution is forced.
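A sketch of a DWConv entry (illustrative values):

```yaml
# DWConv args: [out_channels, kernel_size, stride, padding, activation]
[-1, 1, DWConv, [128, 3, 2]]   # depthwise 3x3 conv, stride 2, auto padding, ReLU
```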
Focus
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
kernel_size | int | | (n, n) kernel size |
stride | int | 1 | Conv stride |
padding | int | None | Conv padding. If None, auto-padding is applied, which keeps the output width and height the same as the input |
groups | int | 1 | Group convolution size. If 1, no group convolution is applied |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
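Focus is typically used as a stem layer, as in YOLOv5. A sketch (illustrative values):

```yaml
# Focus args: [out_channels, kernel_size, stride, padding, activation]
[-1, 1, Focus, [64, 3]]   # space-to-depth reshape of the input, then a 3x3 Conv
```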
Bottleneck
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
shortcut | bool | True | Use shortcut. Only applied when in_channels and out_channels are the same. |
groups | int | 1 | Group convolution size. If 1, no group convolution is applied |
expansion | float | 0.5 | Expansion (squeeze) ratio. |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
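A sketch of Bottleneck entries (illustrative values):

```yaml
# Bottleneck args: [out_channels, shortcut, groups, expansion, activation]
[-1, 1, Bottleneck, [64]]          # defaults: shortcut=True, groups=1, expansion=0.5, ReLU
[-1, 1, Bottleneck, [64, False]]   # without the residual shortcut
```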
BottleneckCSP
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
shortcut | bool | True | Use shortcut. Only applied when in_channels and out_channels are the same. |
groups | int | 1 | Group convolution size. If 1, no group convolution is applied |
expansion | float | 0.5 | Expansion (squeeze) ratio. |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
C3
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
shortcut | bool | True | Use shortcut. Only applied when in_channels and out_channels are the same. |
groups | int | 1 | Group convolution size. If 1, no group convolution is applied |
expansion | float | 0.5 | Expansion (squeeze) ratio. |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
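BottleneckCSP and C3 take the same arguments and are commonly stacked through the repeat column. A sketch (illustrative values):

```yaml
# C3 / BottleneckCSP args: [out_channels, shortcut, groups, expansion, activation]
[-1, 9, C3, [128]]              # repeated 9 times (second column)
[-1, 3, BottleneckCSP, [256]]   # repeated 3 times
```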
MV2Block
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
stride | int | 1 | Stride value (1 or 2 only). |
expand_ratio | int | 4 | Expansion ratio. |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
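A sketch of MV2Block entries (illustrative values):

```yaml
# MV2Block args: [out_channels, stride, expand_ratio, activation]
[-1, 1, MV2Block, [32, 2, 4]]   # strided (downsampling) inverted-residual block
[-1, 2, MV2Block, [32, 1, 4]]   # two stride-1 blocks
```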
AvgPool
Argument name | Type | Default value | Description |
---|---|---|---|
kernel_size | int | | |
stride | int or None | None | |
padding | int | 0 | |
ceil_mode | bool | False | |
count_include_pad | bool | True | |
divisor_override | int or None | None | |
- Please refer to https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html for further detail.
MaxPool
Argument name | Type | Default value | Description |
---|---|---|---|
kernel_size | int | | |
stride | int or None | None | |
padding | int | 0 | |
dilation | int | 1 | |
return_indices | bool | False | |
ceil_mode | bool | False | |
- Please refer to https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html for further detail.
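A sketch of pooling entries (illustrative values):

```yaml
# AvgPool / MaxPool args: [kernel_size, stride, padding, ...]
[-1, 1, MaxPool, [2]]        # 2x2 max pooling; stride defaults to kernel_size
[-1, 1, AvgPool, [3, 2, 1]]  # 3x3 average pooling, stride 2, padding 1
```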
SPP
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
kernel_sizes | List[int] | [5, 9, 13] | List of (n, n) kernel sizes |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
SPPF
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | Conv channels |
kernel_size | int | 5 | Kernel size. The default value (5) is equivalent to the (5, 9, 13) kernel sizes in SPP |
activation | str or None | "ReLU" | If None, no activation (Identity) is applied. |
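SPP and SPPF are typically placed at the end of a backbone, as in YOLOv5. A sketch (illustrative values):

```yaml
# SPP args:  [out_channels, [kernel_size1, kernel_size2, ...], activation]
# SPPF args: [out_channels, kernel_size, activation]
[-1, 1, SPP, [512, [5, 9, 13]]]
[-1, 1, SPPF, [512, 5]]   # faster equivalent of SPP with (5, 9, 13)
```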
Flatten
Argument name | Type | Default value | Description |
---|---|---|---|
start_dim | int | 1 | |
end_dim | int | -1 |
- Please refer to https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html for further detail.
Concat
Argument name | Type | Default value | Description |
---|---|---|---|
dimension | int | 1 | Dimension along which the inputs are concatenated |
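Concat takes its inputs through the from field. A sketch (the layer index is illustrative):

```yaml
[[-1, 6], 1, Concat, [1]]   # concatenate the previous layer and layer 6 along dim 1 (channels)
```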
Linear
Argument name | Type | Default value | Description |
---|---|---|---|
out_channels | int | | |
activation | str or None | None | |
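A common classifier tail combining the modules above; a sketch (illustrative values):

```yaml
[-1, 1, GlobalAvgPool, []]
[-1, 1, Flatten, []]
[-1, 1, Linear, [10]]   # activation defaults to None, so these are raw logits
```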
UpSample
Argument name | Type | Default value | Description |
---|---|---|---|
size | int or None | None | |
scale_factor | int or None | 2 | |
mode | str | nearest | |
align_corners | bool or None | None |
- Please refer to https://pytorch.org/docs/stable/generated/torch.nn.Upsample.html for further detail.
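A sketch of a typical FPN-style upsample-and-merge pair (the layer index is illustrative):

```yaml
# UpSample args: [size, scale_factor, mode, align_corners]
[-1, 1, UpSample, []]       # defaults: scale_factor=2, mode=nearest
[[-1, 4], 1, Concat, [1]]   # merge with an earlier feature map
```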
YamlModule
Argument name | Type | Default value | Description |
---|---|---|---|
verbose | bool | False |
- The YAML file path and the arguments configured in the YAML module cannot be passed as keyword arguments.
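A sketch of a YamlModule entry; the file path and the forwarded argument are hypothetical:

```yaml
# YamlModule args: ['yaml/file/path', arg0, arg1, ...]
[-1, 1, YamlModule, ["yamls/conv_block.yaml", 64]]   # 64 is forwarded to the sub-module as arg0
```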
Pretrained
Argument name | Type | Default value | Description |
---|---|---|---|
model_name | str | | Model name in timm. Please refer to https://rwightman.github.io/pytorch-image-models/results for supported models. |
use_feature_maps | bool | False | If True, the module returns a list of feature maps (List[torch.Tensor]); features_only must be True in this case. Otherwise, only the last feature map is returned. |
features_only | bool | True | If True, skip the classification layer and use the feature layers only. |
pretrained | bool | True | Use pretrained weights. |
In case you are just as lazy as I am, here is the list of pretrained model names (timm==0.4.5):
'adv_inception_v3', 'cspdarknet53', 'cspresnet50', 'cspresnext50', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'densenetblur121d', 'dla34', 'dla46_c', 'dla46x_c', 'dla60', 'dla60_res2net', 'dla60_res2next', 'dla60x', 'dla60x_c', 'dla102', 'dla102x', 'dla102x2', 'dla169', 'dm_nfnet_f0', 'dm_nfnet_f1', 'dm_nfnet_f2', 'dm_nfnet_f3', 'dm_nfnet_f4', 'dm_nfnet_f5', 'dm_nfnet_f6', 'dpn68', 'dpn68b', 'dpn92', 'dpn98', 'dpn107', 'dpn131', 'ecaresnet26t', 'ecaresnet50d', 'ecaresnet50d_pruned', 'ecaresnet50t', 'ecaresnet101d', 'ecaresnet101d_pruned', 'ecaresnet269d', 'ecaresnetlight', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b1_pruned', 'efficientnet_b2', 'efficientnet_b2_pruned', 'efficientnet_b2a', 'efficientnet_b3', 'efficientnet_b3_pruned', 'efficientnet_b3a', 'efficientnet_em', 'efficientnet_es', 'efficientnet_lite0', 'ens_adv_inception_resnet_v2', 'ese_vovnet19b_dw', 'ese_vovnet39b', 'fbnetc_100', 'gernet_l', 'gernet_m', 'gernet_s', 'gluon_inception_v3', 'gluon_resnet18_v1b', 'gluon_resnet34_v1b', 'gluon_resnet50_v1b', 'gluon_resnet50_v1c', 'gluon_resnet50_v1d', 'gluon_resnet50_v1s', 'gluon_resnet101_v1b', 'gluon_resnet101_v1c', 'gluon_resnet101_v1d', 'gluon_resnet101_v1s', 'gluon_resnet152_v1b', 'gluon_resnet152_v1c', 'gluon_resnet152_v1d', 'gluon_resnet152_v1s', 'gluon_resnext50_32x4d', 'gluon_resnext101_32x4d', 'gluon_resnext101_64x4d', 'gluon_senet154', 'gluon_seresnext50_32x4d', 'gluon_seresnext101_32x4d', 'gluon_seresnext101_64x4d', 'gluon_xception65', 'hrnet_w18', 'hrnet_w18_small', 'hrnet_w18_small_v2', 'hrnet_w30', 'hrnet_w32', 'hrnet_w40', 'hrnet_w44', 'hrnet_w48', 'hrnet_w64', 'ig_resnext101_32x8d', 'ig_resnext101_32x16d', 'ig_resnext101_32x32d', 'ig_resnext101_32x48d', 'inception_resnet_v2', 'inception_v3', 'inception_v4', 'legacy_senet154', 'legacy_seresnet18', 'legacy_seresnet34', 'legacy_seresnet50', 'legacy_seresnet101', 'legacy_seresnet152', 'legacy_seresnext26_32x4d', 'legacy_seresnext50_32x4d', 'legacy_seresnext101_32x4d', 'mixnet_l', 'mixnet_m', 'mixnet_s', 'mixnet_xl', 'mnasnet_100', 'mobilenetv2_100', 'mobilenetv2_110d', 'mobilenetv2_120d', 'mobilenetv2_140', 'mobilenetv3_large_100', 'mobilenetv3_rw', 'nasnetalarge', 'nf_regnet_b1', 'nf_resnet50', 'nfnet_l0c', 'pnasnet5large', 'regnetx_002', 'regnetx_004', 'regnetx_006', 'regnetx_008', 'regnetx_016', 'regnetx_032', 'regnetx_040', 'regnetx_064', 'regnetx_080', 'regnetx_120', 'regnetx_160', 'regnetx_320', 'regnety_002', 'regnety_004', 'regnety_006', 'regnety_008', 'regnety_016', 'regnety_032', 'regnety_040', 'regnety_064', 'regnety_080', 'regnety_120', 'regnety_160', 'regnety_320', 'repvgg_a2', 'repvgg_b0', 'repvgg_b1', 'repvgg_b1g4', 'repvgg_b2', 'repvgg_b2g4', 'repvgg_b3', 'repvgg_b3g4', 'res2net50_14w_8s', 'res2net50_26w_4s', 'res2net50_26w_6s', 'res2net50_26w_8s', 'res2net50_48w_2s', 'res2net101_26w_4s', 'res2next50', 'resnest14d', 'resnest26d', 'resnest50d', 'resnest50d_1s4x24d', 'resnest50d_4s2x40d', 'resnest101e', 'resnest200e', 'resnest269e', 'resnet18', 'resnet18d', 'resnet26', 'resnet26d', 'resnet34', 'resnet34d', 'resnet50', 'resnet50d', 'resnet101d', 'resnet152d', 'resnet200d', 'resnetblur50', 'resnetv2_50x1_bitm', 'resnetv2_50x1_bitm_in21k', 'resnetv2_50x3_bitm', 'resnetv2_50x3_bitm_in21k', 'resnetv2_101x1_bitm', 'resnetv2_101x1_bitm_in21k', 'resnetv2_101x3_bitm', 'resnetv2_101x3_bitm_in21k', 'resnetv2_152x2_bitm', 'resnetv2_152x2_bitm_in21k', 'resnetv2_152x4_bitm', 'resnetv2_152x4_bitm_in21k', 'resnext50_32x4d', 'resnext50d_32x4d', 'resnext101_32x8d', 'rexnet_100', 'rexnet_130', 
'rexnet_150', 'rexnet_200', 'selecsls42b', 'selecsls60', 'selecsls60b', 'semnasnet_100', 'seresnet50', 'seresnet152d', 'seresnext26d_32x4d', 'seresnext26t_32x4d', 'seresnext50_32x4d', 'skresnet18', 'skresnet34', 'skresnext50_32x4d', 'spnasnet_100', 'ssl_resnet18', 'ssl_resnet50', 'ssl_resnext50_32x4d', 'ssl_resnext101_32x4d', 'ssl_resnext101_32x8d', 'ssl_resnext101_32x16d', 'swsl_resnet18', 'swsl_resnet50', 'swsl_resnext50_32x4d', 'swsl_resnext101_32x4d', 'swsl_resnext101_32x8d', 'swsl_resnext101_32x16d', 'tf_efficientnet_b0', 'tf_efficientnet_b0_ap', 'tf_efficientnet_b0_ns', 'tf_efficientnet_b1', 'tf_efficientnet_b1_ap', 'tf_efficientnet_b1_ns', 'tf_efficientnet_b2', 'tf_efficientnet_b2_ap', 'tf_efficientnet_b2_ns', 'tf_efficientnet_b3', 'tf_efficientnet_b3_ap', 'tf_efficientnet_b3_ns', 'tf_efficientnet_b4', 'tf_efficientnet_b4_ap', 'tf_efficientnet_b4_ns', 'tf_efficientnet_b5', 'tf_efficientnet_b5_ap', 'tf_efficientnet_b5_ns', 'tf_efficientnet_b6', 'tf_efficientnet_b6_ap', 'tf_efficientnet_b6_ns', 'tf_efficientnet_b7', 'tf_efficientnet_b7_ap', 'tf_efficientnet_b7_ns', 'tf_efficientnet_b8', 'tf_efficientnet_b8_ap', 'tf_efficientnet_cc_b0_4e', 'tf_efficientnet_cc_b0_8e', 'tf_efficientnet_cc_b1_8e', 'tf_efficientnet_el', 'tf_efficientnet_em', 'tf_efficientnet_es', 'tf_efficientnet_l2_ns', 'tf_efficientnet_l2_ns_475', 'tf_efficientnet_lite0', 'tf_efficientnet_lite1', 'tf_efficientnet_lite2', 'tf_efficientnet_lite3', 'tf_efficientnet_lite4', 'tf_inception_v3', 'tf_mixnet_l', 'tf_mixnet_m', 'tf_mixnet_s', 'tf_mobilenetv3_large_075', 'tf_mobilenetv3_large_100', 'tf_mobilenetv3_large_minimal_100', 'tf_mobilenetv3_small_075', 'tf_mobilenetv3_small_100', 'tf_mobilenetv3_small_minimal_100', 'tresnet_l', 'tresnet_l_448', 'tresnet_m', 'tresnet_m_448', 'tresnet_xl', 'tresnet_xl_448', 'tv_densenet121', 'tv_resnet34', 'tv_resnet50', 'tv_resnet101', 'tv_resnet152', 'tv_resnext50_32x4d', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'vit_base_patch16_224', 'vit_base_patch16_224_in21k', 'vit_base_patch16_384', 'vit_base_patch32_224_in21k', 'vit_base_patch32_384', 'vit_base_resnet50_224_in21k', 'vit_base_resnet50_384', 'vit_deit_base_distilled_patch16_224', 'vit_deit_base_distilled_patch16_384', 'vit_deit_base_patch16_224', 'vit_deit_base_patch16_384', 'vit_deit_small_distilled_patch16_224', 'vit_deit_small_patch16_224', 'vit_deit_tiny_distilled_patch16_224', 'vit_deit_tiny_patch16_224', 'vit_large_patch16_224', 'vit_large_patch16_224_in21k', 'vit_large_patch16_384', 'vit_large_patch32_224_in21k', 'vit_large_patch32_384', 'vit_small_patch16_224', 'wide_resnet50_2', 'wide_resnet101_2', 'xception', 'xception41', 'xception65', 'xception71'
PreTrainedFeatureMap
Argument name | Type | Default value | Description |
---|---|---|---|
feature_idx | int | -1 | Index of the feature map to bypass from Pretrained |
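A sketch of a backbone that reuses a timm model through Pretrained and picks one feature map with PreTrainedFeatureMap (the model name and the classifier tail are illustrative):

```yaml
backbone:
    [
        [-1, 1, Pretrained, [resnet18, True, True, True]],   # returns List[torch.Tensor] feature maps
        [-1, 1, PreTrainedFeatureMap, [-1]],                 # select the last feature map
        [-1, 1, GlobalAvgPool, []],
        [-1, 1, Flatten, []],
        [-1, 1, Linear, [10]]
    ]
```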
YOLOHead
Argument name | Type | Default value | Description |
---|---|---|---|
n_classes | int | | Number of classes to detect |
anchors | List[List[float]] | | Anchor list. Each inner list holds one layer's anchors, and its components are anchor sizes in the form [w1, h1, w2, h2, ...] |
out_xyxy | bool | False | Return coordinates in xyxy format. (For compatibility with older versions of yolov5) |
Note
```yaml
[[-3, -2, -1], 1, YOLOHead, [80, [[100, 200, 200, 100, 200, 200], [50, 100, 100, 50, 100, 100], [10, 20, 20, 10, 20, 20]]]]
```
This means that YOLOHead takes its inputs from the previous three layers and detects 80 classes using three detection layers with three anchors each.
MobileViTBlock
Argument name | Type | Default value | Description |
---|---|---|---|
conv_channels | int | | Number of channels in the convolution |
mlp_channels | int | | Number of channels in the MLP |
depth | int | | Depth of the transformer |
kernel_size | int | 3 | (n, n) kernel size |
patch_size | int or tuple | 2 | Patch size to use in the transformer. |
dropout | float | 0.0 | Dropout probability used in Attention and MLP |
activation | str or None | "SiLU" | If None, no activation (Identity) is applied. |
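A sketch of a MobileViTBlock entry (illustrative values):

```yaml
# MobileViTBlock args: [conv_channels, mlp_channels, depth, kernel_size, patch_size, dropout, activation]
[-1, 1, MobileViTBlock, [96, 192, 2]]   # kernel_size, patch_size, dropout, activation use defaults
```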