Tutorial
1. Building a PyTorch model with yaml
Kindle builds a PyTorch model from a YAML file.
Components
- input_size: (Tuple[int, int]) (Optional) Model input image size (height, width).
- input_channel: (int) Model input channel size.

  Note
  ex) If input_size: [32, 32] and input_channel: 3 are given, the input size of the model will be (batch_size, 3, 32, 32). When input_size is not provided, Kindle assumes that the model can take any input size.
- depth_multiple: (float) Depth multiplication factor.
- width_multiple: (float) Width multiplication factor.
- channel_divisor: (int) (Optional) (Default: 8) Channel divisor. When width_multiple is adjusted, the number of channels is changed to a multiple of channel_divisor.

  Note
  ex) If width_multiple is 0.5 and the output channel of the module is set to 24, the actual output channel is 16 instead of 12.
- custom_module_paths: (List[str]) (Optional) List of paths to custom module Python scripts.
- backbone: (List[module]) Model layers.
- head: (List[module]) (Optional) Model head. This section is the same as backbone, but width_multiple is not applied, so head has fixed channel sizes.

  Note
  backbone and head each consist of a list of module entries.
- module: (List[(int or List[int]), int, str, List]) [from index, repeat, module name, module arguments]
  - from index: Index number of the input for the module. -1 represents the previous module. Index numbers of head continue from backbone. The first module in backbone must have a from index of -1, which represents the input image.
  - repeat: Number of times the module is repeated. Ex) When a Conv module has repeat: 2, the module performs the Conv operation twice (Input -> Conv -> Conv).
  - module_name: Name of the module. Pre-built modules are described here.
  - module_arguments: Arguments of the module. Each module takes pre-defined arguments. Pre-built module arguments are described here.
  - module_keyword_arguments: (Optional) Keyword arguments of the module. Pre-built module keyword arguments are described here.
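The channel_divisor note above can be checked with a few lines of Python. This is a minimal sketch that assumes the scaled channel count is rounded up to the next multiple of the divisor; the helper name `make_divisible` is hypothetical and is not taken from Kindle's source.

```python
import math


def make_divisible(channels: float, divisor: int = 8) -> int:
    """Round a scaled channel count up to the next multiple of divisor.

    Hypothetical helper: rounding up is an assumption chosen because it
    reproduces the 24 -> 16 example in the text.
    """
    return int(math.ceil(channels / divisor) * divisor)


width_multiple = 0.5
requested_channels = 24

scaled = requested_channels * width_multiple  # 12.0, not a multiple of 8
actual = make_divisible(scaled, divisor=8)
print(actual)  # 16
```

Under the same assumption, a requested channel count of 6 with width_multiple 1.0 becomes 8, which is consistent with the out_channel column printed for the first Conv layer in the example build log.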
Example
input_size: [32, 32]
input_channel: 3

depth_multiple: 1.0
width_multiple: 1.0

backbone:
    [
        [-1, 1, Conv, [6, 5, 1, 0], {activation: LeakyReLU}],
        [-1, 1, MaxPool, [2]],
        [-1, 1, nn.Conv2d, [16, 5, 1, 2], {bias: False}],
        [-1, 1, nn.BatchNorm2d, []],
        [-1, 1, nn.ReLU, []],
        [-1, 1, MaxPool, [2]],
        [-1, 1, Flatten, []],
        [-1, 1, Linear, [120, ReLU]],
        [-1, 1, Linear, [84, ReLU]],
    ]

head:
    [
        [-1, 1, Linear, [10]]
    ]
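To make the from index rule concrete, the sketch below rewrites the example config as plain Python lists (keyword arguments omitted) and resolves every from index to an absolute layer index. It is an illustration only, not Kindle's parser; `resolve_from` is a hypothetical helper, and reading the first module's -1 as the input image follows the description of from index given earlier.

```python
from typing import List, Union

FromIndex = Union[int, List[int]]

# The example config as Python lists: [from index, repeat, module name, arguments].
backbone = [
    [-1, 1, "Conv", [6, 5, 1, 0]],
    [-1, 1, "MaxPool", [2]],
    [-1, 1, "nn.Conv2d", [16, 5, 1, 2]],
    [-1, 1, "nn.BatchNorm2d", []],
    [-1, 1, "nn.ReLU", []],
    [-1, 1, "MaxPool", [2]],
    [-1, 1, "Flatten", []],
    [-1, 1, "Linear", [120, "ReLU"]],
    [-1, 1, "Linear", [84, "ReLU"]],
]
head = [[-1, 1, "Linear", [10]]]


def resolve_from(from_idx: FromIndex, current: int) -> FromIndex:
    """Turn a relative (negative) from index into an absolute layer index."""
    if isinstance(from_idx, list):
        return [resolve_from(i, current) for i in from_idx]
    return current + from_idx if from_idx < 0 else from_idx


# Head indices continue from the backbone, so the two lists are simply chained.
layers = backbone + head
resolved_sources = []
for idx, (from_idx, repeat, name, args) in enumerate(layers):
    source = resolve_from(from_idx, idx)
    resolved_sources.append(source)
    label = "input image" if source == -1 else f"layer {source}"
    print(f"{idx}: {name} x{repeat} <- {label}")
```

The single Linear layer in head gets index 9 and reads from layer 8, the last backbone layer.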
Build a model
from kindle import Model
model = Model("example.yaml", verbose=True)
idx | from | n | params | module | arguments | in_channel | out_channel | in shape | out shape |
----------------------------------------------------------------------------------------------------------------------------------------------------------
0 | -1 | 1 | 616 | Conv | [6, 5, 1, 0], activation: LeakyReLU | 3 | 8 | [3, 32, 32] | [8, 32, 32] |
1 | -1 | 1 | 0 | MaxPool | [2] | 8 | 8 | [8 32 32] | [8, 16, 16] |
2 | -1 | 1 | 3,200 | nn.Conv2d | [16, 5, 1, 2], bias: False | 8 | 16 | [8 16 16] | [16, 16, 16] |
3 | -1 | 1 | 32 | nn.BatchNorm2d | [] | 16 | 16 | [16 16 16] | [16, 16, 16] |
4 | -1 | 1 | 0 | nn.ReLU | [] | 16 | 16 | [16 16 16] | [16, 16, 16] |
5 | -1 | 1 | 0 | MaxPool | [2] | 16 | 16 | [16 16 16] | [16, 8, 8] |
6 | -1 | 1 | 0 | Flatten | [] | -1 | 1024 | [16 8 8] | [1024] |
7 | -1 | 1 | 123,000 | Linear | [120, 'ReLU'] | 1024 | 120 | [1024] | [120] |
8 | -1 | 1 | 10,164 | Linear | [84, 'ReLU'] | 120 | 84 | [120] | [84] |
9 | -1 | 1 | 850 | Linear | [10] | 84 | 10 | [84] | [10] |
Model Summary: 20 layers, 137,862 parameters, 137,862 gradients
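The params column above can be reproduced by hand, which is a useful sanity check when writing your own config. The sketch below uses the standard parameter-count formulas for convolution, batch normalization, and linear layers. It assumes that Kindle's Conv is a bias-free convolution followed by batch normalization; that assumption is inferred from the printed count of 616 and is not confirmed by this tutorial.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int, bias: bool) -> int:
    """Weights (and optional bias) of a square 2D convolution."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def batch_norm_params(channels: int) -> int:
    """Learnable scale and shift of a batch normalization layer."""
    return 2 * channels


def linear_params(in_features: int, out_features: int) -> int:
    """Weights and bias of a fully connected layer."""
    return in_features * out_features + out_features


layer_params = [
    # Assumed: Conv = bias-free convolution + batch normalization.
    conv_params(3, 8, 5, bias=False) + batch_norm_params(8),  # 616
    conv_params(8, 16, 5, bias=False),  # nn.Conv2d with bias: False -> 3,200
    batch_norm_params(16),  # nn.BatchNorm2d -> 32
    linear_params(1024, 120),  # 123,000
    linear_params(120, 84),  # 10,164
    linear_params(84, 10),  # 850
]

total = sum(layer_params)
print(total)  # 137862
```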
2. Design Custom Module with YAML
You can make your own custom module with a YAML file.
1. custom_module.yaml
args: [96, 32]

module:
    # [from, repeat, module, args]
    [
        [-1, 1, Conv, [arg0, 1, 1]],
        [0, 1, Conv, [arg1, 3, 1]],
        [0, 1, Conv, [arg1, 5, 1]],
        [0, 1, Conv, [arg1, 7, 1]],
        [[1, 2, 3], 1, Concat, [1]],
        [[0, 4], 1, Add, []],
    ]
- Arguments of yaml module can be defined as arg0, arg1 ...
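As a rough illustration of how arg0, arg1, ... placeholders might be filled in, the sketch below walks the module list and replaces each placeholder with the value at the matching position. It is not Kindle's implementation; `substitute_args` is a hypothetical helper, and treating the args list in custom_module.yaml as defaults that a model config can override is an assumption.

```python
from typing import Any, List


def substitute_args(node: Any, values: List[Any]) -> Any:
    """Recursively replace "argN" strings with values[N] (hypothetical helper)."""
    if isinstance(node, list):
        return [substitute_args(item, values) for item in node]
    if isinstance(node, str) and node.startswith("arg") and node[3:].isdigit():
        return values[int(node[3:])]
    return node


# custom_module.yaml written as Python lists.
default_args = [96, 32]
module = [
    [-1, 1, "Conv", ["arg0", 1, 1]],
    [0, 1, "Conv", ["arg1", 3, 1]],
    [0, 1, "Conv", ["arg1", 5, 1]],
    [0, 1, "Conv", ["arg1", 7, 1]],
    [[1, 2, 3], 1, "Concat", [1]],
    [[0, 4], 1, "Add", []],
]

with_defaults = substitute_args(module, default_args)
with_overrides = substitute_args(module, [48, 16])

print(with_defaults[0])   # [-1, 1, 'Conv', [96, 1, 1]]
print(with_overrides[1])  # [0, 1, 'Conv', [16, 3, 1]]
```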
2. model_with_custom_module.yaml
input_size: [32, 32]
input_channel: 3

depth_multiple: 1.0
width_multiple: 1.0

backbone:
    [
        [-1, 1, Conv, [6, 5, 1, 0]],
        [-1, 1, MaxPool, [2]],
        [-1, 1, YamlModule, ["custom_module.yaml", 48, 16]],
        [-1, 1, MaxPool, [2]],
        [-1, 1, Flatten, []],
        [-1, 1, Linear, [120, ReLU]],
        [-1, 1, Linear, [84, ReLU]],
        [-1, 1, Linear, [10]]
    ]
3. Build model
from kindle import Model
model = Model("model_with_custom_module.yaml", verbose=True)
idx | from | n | params | module | arguments | in shape | out shape |
---------------------------------------------------------------------------------------------------------------------------------
0 | -1 | 1 | 616 | Conv | [6, 5, 1, 0] | [3, 32, 32] | [8, 32, 32] |
1 | -1 | 1 | 0 | MaxPool | [2] | [8 32 32] | [8, 16, 16] |
2 | -1 | 1 | 10,832 | YamlModule | ['custom_module'] | [8 16 16] | [24, 16, 16] |
3 | -1 | 1 | 0 | MaxPool | [2] | [24 16 16] | [24, 8, 8] |
4 | -1 | 1 | 0 | Flatten | [] | [24 8 8] | [1536] |
5 | -1 | 1 | 184,440 | Linear | [120, 'ReLU'] | [1536] | [120] |
6 | -1 | 1 | 10,164 | Linear | [84, 'ReLU'] | [120] | [84] |
7 | -1 | 1 | 850 | Linear | [10] | [84] | [10] |
Model Summary: 36 layers, 206,902 parameters, 206,902 gradients
3. Design Custom Module from Source
You can make your own custom module from the source.
1. custom_module_model.yaml
input_size: [32, 32]
input_channel: 3

depth_multiple: 1.0
width_multiple: 1.0

custom_module_paths: ["tests.test_custom_module"]  # Paths to the custom modules of the source

backbone:
    # [from, repeat, module, args]
    [
        [-1, 1, MyConv, [6, 5, 3]],
        [-1, 1, MaxPool, [2]],
        [-1, 1, MyConv, [16, 3, 5, SiLU]],
        [-1, 1, MaxPool, [2]],
        [-1, 1, Flatten, []],
        [-1, 1, Linear, [120, ReLU]],
        [-1, 1, Linear, [84, ReLU]],
        [-1, 1, Linear, [10]]
    ]
2. Write PyTorch module and ModuleGenerator
tests/test_custom_module.py
from typing import List, Union, Dict, Any

import numpy as np
import torch
from torch import nn

from kindle.generator import GeneratorAbstract
from kindle.utils.torch_utils import autopad
from kindle.modules.activation import Activation


class MyConv(nn.Module):
    """n stacked convolutions followed by batch normalization and an activation."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        n: int,
        activation: Union[str, None] = "ReLU",
    ) -> None:
        super().__init__()
        convs = []
        for i in range(n):
            # Only the last convolution changes the channel count.
            convs.append(
                nn.Conv2d(
                    in_channels,
                    in_channels if (i + 1) != n else out_channels,
                    kernel_size,
                    padding=autopad(kernel_size),
                    bias=False,
                )
            )

        self.convs = nn.Sequential(*convs)
        self.batch_norm = nn.BatchNorm2d(out_channels)
        self.activation = Activation(activation)()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.activation(self.batch_norm(self.convs(x)))


class MyConvGenerator(GeneratorAbstract):
    """Module generator that builds MyConv from a YAML module entry."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    @property
    def out_channel(self) -> int:
        # Scale the requested channel count by the width multiplier.
        return self._get_divisible_channel(self.args[0] * self.width_multiply)

    @property
    def in_channel(self) -> int:
        if isinstance(self.from_idx, list):
            raise Exception("from_idx can not be a list.")
        return self.in_channels[self.from_idx]

    @property
    def kwargs(self) -> Dict[str, Any]:
        # Prepend the resolved channel counts to the remaining YAML arguments.
        args = [self.in_channel, self.out_channel, *self.args[1:]]
        return self._get_kwargs(MyConv, args)

    @torch.no_grad()
    def compute_out_shape(self, size: np.ndarray, repeat: int = 1) -> List[int]:
        # Run a dummy forward pass to infer the output shape.
        module = self(repeat=repeat)
        module.eval()
        module_out = module(torch.zeros([1, *list(size)]))
        return list(module_out.shape[-3:])

    def __call__(self, repeat: int = 1) -> nn.Module:
        if repeat > 1:
            module = [MyConv(**self.kwargs) for _ in range(repeat)]
        else:
            module = MyConv(**self.kwargs)

        return self._get_module(module)
3. Build a model
from kindle import Model
model = Model("custom_module_model.yaml", verbose=True)
idx | from | n | params | module | arguments | in_channel | out_channel | in shape | out shape |
----------------------------------------------------------------------------------------------------------------------------------------------------------
0 | -1 | 1 | 1,066 | MyConv | [6, 5, 3] | 3 | 8 | [3, 32, 32] | [8, 32, 32] |
1 | -1 | 1 | 0 | MaxPool | [2] | 8 | 8 | [8 32 32] | [8, 16, 16] |
2 | -1 | 1 | 3,488 | MyConv | [16, 3, 5, 'SiLU'] | 8 | 16 | [8 16 16] | [16, 16, 16] |
3 | -1 | 1 | 0 | MaxPool | [2] | 16 | 16 | [16 16 16] | [16, 8, 8] |
4 | -1 | 1 | 0 | Flatten | [] | -1 | 1024 | [16 8 8] | [1024] |
5 | -1 | 1 | 123,000 | Linear | [120, 'ReLU'] | 1024 | 120 | [1024] | [120] |
6 | -1 | 1 | 10,164 | Linear | [84, 'ReLU'] | 120 | 84 | [120] | [84] |
7 | -1 | 1 | 850 | Linear | [10] | 84 | 10 | [84] | [10] |
Model Summary: 29 layers, 138,568 parameters, 138,568 gradients
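The parameter counts printed for the two MyConv layers follow directly from the class definition in tests/test_custom_module.py: n bias-free convolutions, of which only the last changes the channel count, plus one batch normalization layer. The sketch below recomputes them in plain Python. The helper names are hypothetical, and the channel counts (3 -> 8 and 8 -> 16) are read from the in_channel and out_channel columns of the build log.

```python
from typing import List, Tuple


def my_conv_channel_plan(in_ch: int, out_ch: int, n: int) -> List[Tuple[int, int]]:
    """(in, out) channels of each convolution, mirroring the loop in MyConv."""
    return [(in_ch, in_ch if (i + 1) != n else out_ch) for i in range(n)]


def my_conv_params(in_ch: int, out_ch: int, kernel: int, n: int) -> int:
    """Bias-free conv weights plus the batch norm scale and shift."""
    conv = sum(
        c_in * c_out * kernel * kernel
        for c_in, c_out in my_conv_channel_plan(in_ch, out_ch, n)
    )
    return conv + 2 * out_ch


print(my_conv_channel_plan(3, 8, 3))  # [(3, 3), (3, 3), (3, 8)]
print(my_conv_params(3, 8, 5, 3))     # 1066, layer 0: MyConv [6, 5, 3]
print(my_conv_params(8, 16, 3, 5))    # 3488, layer 2: MyConv [16, 3, 5, 'SiLU']
```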
4. Utilize pretrained model
Pre-trained models from timm can be loaded in a Kindle YAML config file. Please refer to https://rwightman.github.io/pytorch-image-models/results/ for supported models.
Example
- In this example, we load a pretrained efficientnet_b0 model, then extract its feature maps and apply a convolution layer to each extracted map.
1. pretrained_model.yaml
input_size: [32, 32]
input_channel: 3

depth_multiple: 1.0
width_multiple: 1.0

pretrained: mobilenetv3_small_100

backbone:
    # [from, repeat, module, args]
    [
        [-1, 1, UpSample, []],
        [-1, 1, PreTrained, [efficientnet_b0, True]],
        [1, 1, PreTrainedFeatureMap, [-3]],
        [-1, 1, Conv, [8, 1], {activation: LeakyReLU}],
        [-1, 1, MaxPool, [2]],

        [1, 1, PreTrainedFeatureMap, [-2]],
        [-1, 1, Conv, [8, 1], {activation: LeakyReLU}],
        [[-1, -3], 1, Concat, []],
        [-1, 1, MaxPool, [2]],

        [1, 1, PreTrainedFeatureMap, [-1]],
        [-1, 1, Conv, [8, 1], {activation: LeakyReLU}],
        [[-1, -3], 1, Concat, []],

        [-1, 1, Flatten, []],
        [-1, 1, Linear, [120, ReLU]],
        [-1, 1, Linear, [84, ReLU]],
    ]

head:
    [
        [-1, 1, Linear, [10]]
    ]
- When the PreTrained module has the features_only = True argument, the output of the module is a list of feature maps.
- The PreTrainedFeatureMap module simply passes through the feature_idx output of PreTrained.
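The selection done by PreTrainedFeatureMap amounts to ordinary list indexing, so negative values count from the deepest feature map. The sketch below shows this with shapes only; the shapes are the ones reported for efficientnet_b0 on a 64x64 input in the build log of this example, and `select_feature_map` is a hypothetical stand-in rather than Kindle's module.

```python
from typing import List

# [channels, height, width] of each feature map returned by PreTrained.
feature_shapes = [
    [16, 32, 32],
    [24, 16, 16],
    [40, 8, 8],
    [112, 4, 4],
    [320, 2, 2],
]


def select_feature_map(features: List[List[int]], feature_idx: int) -> List[int]:
    """Pass through one entry of the feature list (hypothetical helper)."""
    return features[feature_idx]


for feature_idx in (-3, -2, -1):
    print(feature_idx, select_feature_map(feature_shapes, feature_idx))
# -3 [40, 8, 8]
# -2 [112, 4, 4]
# -1 [320, 2, 2]
```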
2. Build a model
from kindle import Model
model = Model("pretrained_model.yaml", verbose=True)
idx | from | n | params | module | arguments | in_channel | out_channel | in_shape | out_shape
-------+----------+-----+-----------+----------------------+-------------------------------+--------------+------------------------+--------------------------------------------------------------------+--------------------------------------------------------------------
0 | -1 | 1 | 0 | UpSample | [] | 3 | 3 | [3, 32, 32] | [3, 64, 64]
1 | -1 | 1 | 3,595,388 | PreTrained | ['efficientnet_b0', True] | 3 | [16, 24, 40, 112, 320] | [3 64 64] | [[16, 32, 32], [24, 16, 16], [40, 8, 8], [112, 4, 4], [320, 2, 2]]
2 | 1 | 1 | 0 | PreTrainedFeatureMap | [-3] | 40 | 40 | [[16, 32, 32], [24, 16, 16], [40, 8, 8], [112, 4, 4], [320, 2, 2]] | [40, 8, 8]
3 | -1 | 1 | 336 | Conv | [8, 1], activation: LeakyReLU | 40 | 8 | [40, 8, 8] | [8, 8, 8]
4 | -1 | 1 | 0 | MaxPool | [2] | 8 | 8 | [8, 8, 8] | [8, 4, 4]
5 | 1 | 1 | 0 | PreTrainedFeatureMap | [-2] | 112 | 112 | [[16, 32, 32], [24, 16, 16], [40, 8, 8], [112, 4, 4], [320, 2, 2]] | [112, 4, 4]
6 | -1 | 1 | 912 | Conv | [8, 1], activation: LeakyReLU | 112 | 8 | [112, 4, 4] | [8, 4, 4]
7 | [-1, -3] | 1 | 0 | Concat | [] | -1 | 16 | [list([8, 4, 4]) list([8, 4, 4])] | [16, 4, 4]
8 | -1 | 1 | 0 | MaxPool | [2] | 16 | 16 | [16, 4, 4] | [16, 2, 2]
9 | 1 | 1 | 0 | PreTrainedFeatureMap | [-1] | 320 | 320 | [[16, 32, 32], [24, 16, 16], [40, 8, 8], [112, 4, 4], [320, 2, 2]] | [320, 2, 2]
10 | -1 | 1 | 2,576 | Conv | [8, 1], activation: LeakyReLU | 320 | 8 | [320, 2, 2] | [8, 2, 2]
11 | [-1, -3] | 1 | 0 | Concat | [] | -1 | 24 | [list([8, 2, 2]) list([16, 2, 2])] | [24, 2, 2]
12 | -1 | 1 | 0 | Flatten | [] | -1 | 96 | [24, 2, 2] | [96]
13 | -1 | 1 | 11,640 | Linear | [120, 'ReLU'] | 96 | 120 | [96] | [120]
14 | -1 | 1 | 10,164 | Linear | [84, 'ReLU'] | 120 | 84 | [120] | [84]
15 | -1 | 1 | 850 | Linear | [10] | 84 | 10 | [84] | [10]
Model Summary: 250 layers, 3,621,866 parameters, 3,621,866 gradients
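The Concat, Flatten, Conv, and Linear rows of this log can be verified with simple shape arithmetic. The sketch below is an illustration with hypothetical helper names. It assumes that Concat joins its inputs along the channel axis and that Conv is a bias-free convolution followed by batch normalization; both assumptions are inferred from the printed numbers.

```python
from math import prod
from typing import List


def concat_shape(shapes: List[List[int]]) -> List[int]:
    """Channel-wise concatenation: channels add up, height and width stay."""
    return [sum(shape[0] for shape in shapes), *shapes[0][1:]]


def conv_bn_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Assumed Conv layout: bias-free convolution plus batch normalization."""
    return in_ch * out_ch * kernel * kernel + 2 * out_ch


layer7 = concat_shape([[8, 4, 4], [8, 4, 4]])    # inputs: layers 6 and 4
layer11 = concat_shape([[8, 2, 2], [16, 2, 2]])  # inputs: layers 10 and 8
flattened = prod(layer11)                        # Flatten, layer 12
linear13 = flattened * 120 + 120                 # Linear [120, 'ReLU']

print(layer7, layer11, flattened, linear13)  # [16, 4, 4] [24, 2, 2] 96 11640
print([conv_bn_params(c, 8, 1) for c in (40, 112, 320)])  # [336, 912, 2576]
```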
5. Make an object detection model using YOLOHead
- You can build a YOLO model with a simple configuration.
- In this example, we made a neck that passes the feature maps to the YOLOHead, but you can build your own detection neck layers.
Note
In order to compute the stride size automatically, you need to provide an arbitrary input_size.
However, the model can still take any input size that it supports.
1. yolo_sample.yaml
input_size: [256, 256]
input_channel: 3

depth_multiple: 0.33
width_multiple: 0.5

anchors: &anchors
    - [10,13, 16,30, 33,23]  # P3/8
    - [30,61, 62,45, 59,119]  # P4/16
    - [116,90, 156,198, 373,326]  # P5/32

n_classes: &n_classes
    10

activation: &activation
    SiLU

backbone:
    # [from, repeat, module, args]
    [
        [-1, 1, Focus, [64, 3], {activation: *activation}],
        [-1, 1, Conv, [128, 3, 2], {activation: *activation}],
        [-1, 3, C3, [128], {activation: *activation}],  # 2
        [-1, 1, Conv, [256, 3, 2], {activation: *activation}],
        [-1, 9, C3, [256], {activation: *activation}],  # 4
        [-1, 1, Conv, [512, 3, 2], {activation: *activation}],
        [-1, 9, C3, [512], {activation: *activation}],  # 6
        [-1, 1, Conv, [1024, 3, 2], {activation: *activation}],
        [-1, 1, SPP, [1024, [5, 9, 13]], {activation: *activation}],
        [-1, 3, C3, [1024, False], {activation: *activation}],  # 9

        # Neck
        [-1, 1, Conv, [512, 1, 1], {activation: *activation}],
        [-1, 1, UpSample, [null, 2]],
        [[-1, 6], 1, Concat, [1]],
        [-1, 3, C3, [512, False], {activation: *activation}],  # 13

        [-1, 1, Conv, [256, 1, 1], {activation: *activation}],
        [-1, 1, UpSample, [null, 2]],
        [[-1, 4], 1, Concat, [1]],
        [-1, 1, C3, [256, False], {activation: *activation}],  # 17

        [-1, 1, Conv, [256, 3, 2], {activation: *activation}],
        [[-1, 14], 1, Concat, [1]],
        [-1, 3, C3, [512, False], {activation: *activation}],  # 20

        [-1, 1, Conv, [512, 3, 2], {activation: *activation}],
        [[-1, 10], 1, Concat, [1]],
        [-1, 3, C3, [1024, False], {activation: *activation}]  # 23
    ]

head:
    [
        [[17, 20, 23], 1, YOLOHead, [*n_classes, *anchors]]
    ]
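With depth_multiple: 0.33 and width_multiple: 0.5, the repeat counts and channel sizes written in this config are scaled before the model is built. The sketch below shows one plausible scaling rule, borrowed from the YOLOv5 convention: round the scaled repeat and keep it at least 1, and round scaled channels up to a multiple of 8. Whether Kindle uses exactly these rules is an assumption, and both helper names are hypothetical.

```python
import math


def scale_repeat(repeat: int, depth_multiple: float) -> int:
    """Scaled repeat count, never below 1 (assumed rule)."""
    return max(round(repeat * depth_multiple), 1)


def scale_channels(channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Scaled channel count, rounded up to a multiple of divisor (assumed rule)."""
    return int(math.ceil(channels * width_multiple / divisor) * divisor)


depth_multiple, width_multiple = 0.33, 0.5

print([scale_repeat(n, depth_multiple) for n in (1, 3, 9)])
# [1, 1, 3]
print([scale_channels(c, width_multiple) for c in (64, 128, 256, 512, 1024)])
# [32, 64, 128, 256, 512]
```

For instance, a C3 entry written with repeat 9 and 256 channels would be built with 3 repeats and 128 output channels under these assumed rules.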
2. Build a model
from kindle import YOLOModel
model = YOLOModel("yolo_sample.yaml", verbose=True)
idx | from | n | params | module | arguments | in_channel | out_channel | in_shape | out_shape
-------+--------------+-----+-----------+----------+--------------------------------------------------------------------------------------------+-----------------+---------------+---------------------------------------+--------------------------------
0 | -1 | 1 | 3,520 | Focus | [64, 3], activation: SiLU | 12 | 32 | [3, 256, 256] | [32, 128, 128]
1 | -1 | 1 | 18,560 | Conv | [128, 3, 2], activation: SiLU | 32 | 64 | [32 128 128] | [64, 64, 64]
2 | -1 | 1 | 18,816 | C3 | [128], activation: SiLU | 64 | 64 | [64 64 64] | [64, 64, 64]
3 | -1 | 1 | 73,984 | Conv | [256, 3, 2], activation: SiLU | 64 | 128 | [64 64 64] | [128, 32, 32]
4 | -1 | 3 | 156,928 | C3 | [256], activation: SiLU | 128 | 128 | [128 32 32] | [128, 32, 32]
5 | -1 | 1 | 295,424 | Conv | [512, 3, 2], activation: SiLU | 128 | 256 | [128 32 32] | [256, 16, 16]
6 | -1 | 3 | 625,152 | C3 | [512], activation: SiLU | 256 | 256 | [256 16 16] | [256, 16, 16]
7 | -1 | 1 | 1,180,672 | Conv | [1024, 3, 2], activation: SiLU | 256 | 512 | [256 16 16] | [512, 8, 8]
8 | -1 | 1 | 656,896 | SPP | [1024, [5, 9, 13]], activation: SiLU | 512 | 512 | [512 8 8] | [512, 8, 8]
9 | -1 | 1 | 1,182,720 | C3 | [1024, False], activation: SiLU | 512 | 512 | [512 8 8] | [512, 8, 8]
10 | -1 | 1 | 131,584 | Conv | [512, 1, 1], activation: SiLU | 512 | 256 | [512 8 8] | [256, 8, 8]
11 | -1 | 1 | 0 | UpSample | [None, 2] | 256 | 256 | [256 8 8] | [256, 16, 16]
12 | [-1, 6] | 1 | 0 | Concat | [1] | -1 | 512 | [[256 16 16], [256 16 16]] | [512, 16, 16]
13 | -1 | 1 | 361,984 | C3 | [512, False], activation: SiLU | 512 | 256 | [512 16 16] | [256, 16, 16]
14 | -1 | 1 | 33,024 | Conv | [256, 1, 1], activation: SiLU | 256 | 128 | [256 16 16] | [128, 16, 16]
15 | -1 | 1 | 0 | UpSample | [None, 2] | 128 | 128 | [128 16 16] | [128, 32, 32]
16 | [-1, 4] | 1 | 0 | Concat | [1] | -1 | 256 | [[128 32 32], [128 32 32]] | [256, 32, 32]
17 | -1 | 1 | 90,880 | C3 | [256, False], activation: SiLU | 256 | 128 | [256 32 32] | [128, 32, 32]
18 | -1 | 1 | 147,712 | Conv | [256, 3, 2], activation: SiLU | 128 | 128 | [128 32 32] | [128, 16, 16]
19 | [-1, 14] | 1 | 0 | Concat | [1] | -1 | 256 | [[128 16 16], [128 16 16]] | [256, 16, 16]
20 | -1 | 1 | 296,448 | C3 | [512, False], activation: SiLU | 256 | 256 | [256 16 16] | [256, 16, 16]
21 | -1 | 1 | 590,336 | Conv | [512, 3, 2], activation: SiLU | 256 | 256 | [256 16 16] | [256, 8, 8]
22 | [-1, 10] | 1 | 0 | Concat | [1] | -1 | 512 | [[256 8 8], [256 8 8]] | [512, 8, 8]
23 | -1 | 1 | 1,182,720 | C3 | [1024, False], activation: SiLU | 512 | 512 | [512 8 8] | [512, 8, 8]
24 | [17, 20, 23] | 1 | 40,455 | YOLOHead | [10, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]] | [128, 256, 512] | [15, 15, 15] | [[128 32 32], [256 16 16], [512 8 8]] | [[-1, 15], [-1, 15], [-1, 15]]
Model Summary: 281 layers, 7,087,815 parameters, 7,087,815 gradients
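The YOLOHead row ties together the input_size note and the anchors in the config. The sketch below derives the stride of each detection scale from the 256x256 input and the three feature map shapes, and recomputes the head's output width and parameter count. It assumes that the head applies one 1x1 convolution with bias per feature map, as in YOLOv5's detection layer; this is an assumption about Kindle's YOLOHead, not something the tutorial states.

```python
input_size = 256
feature_maps = [[128, 32, 32], [256, 16, 16], [512, 8, 8]]  # inputs of layer 24
n_classes = 10
n_anchors = 3  # each anchor row holds three (width, height) pairs

# Stride = how many input pixels one feature map cell covers.
strides = [input_size // shape[1] for shape in feature_maps]

# Each anchor predicts box (4) + objectness (1) + class scores.
outputs_per_anchor = n_classes + 5

# Assumed layout: one 1x1 convolution with bias per feature map.
out_channels = n_anchors * outputs_per_anchor
head_params = sum(shape[0] * out_channels + out_channels for shape in feature_maps)

print(strides)             # [8, 16, 32]
print(outputs_per_anchor)  # 15
print(head_params)         # 40455
```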
3. Initialize biases
- Generally, an object detection model trains better when its biases are initialized from the sample or class distribution.
from kindle import YOLOModel
# Initialize biases with the default values.
model = YOLOModel("yolo_sample.yaml", verbose=True, init_bias=True)
# Initialize biases when a class histogram exists, assuming that generally 3 objects show up in each bounding box in 100 images.
model = YOLOModel("yolo_sample.yaml", verbose=True)
model.initialize_biases(class_probability=YOUR_CLASS_HISTOGRAM, n_object_per_image=(3, 100))
# Initialize biases when a class histogram does not exist, assuming that each class has a 60% chance of appearing.
model = YOLOModel("yolo_sample.yaml", verbose=True)
model.initialize_biases(class_frequency=0.6, n_object_per_image=(3, 100))
Note
The bias initialization method is currently experimental and may change in the near future.
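For intuition about what such an initialization does, the sketch below computes the bias offsets used by the YOLOv5-style scheme, where the objectness bias is shifted by the log of the expected object density per grid cell and the class bias by the log of an assumed class frequency. This is not taken from Kindle's source. Reading n_object_per_image=(3, 100) as "about 3 objects in an image of size 100" is an assumption made by analogy with YOLOv5, and both function names are hypothetical.

```python
import math
from typing import Tuple


def objectness_bias_shift(n_object_per_image: Tuple[int, int], stride: int) -> float:
    """log(objects per grid cell) for one detection scale (assumed formula)."""
    n_objects, image_size = n_object_per_image
    n_cells = (image_size / stride) ** 2
    return math.log(n_objects / n_cells)


def class_bias_shift(class_frequency: float, n_classes: int) -> float:
    """log of the assumed per-class frequency (assumed formula)."""
    return math.log(class_frequency / (n_classes - 0.99))


for stride in (8, 16, 32):
    shift = objectness_bias_shift((3, 100), stride)
    print(f"stride {stride}: objectness bias shift {shift:.3f}")

print(f"class bias shift {class_bias_shift(0.6, 10):.3f}")
```

Coarser scales have fewer grid cells, so each cell is more likely to contain an object and the shift is less negative.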