device: cuda
batch_size: 64
optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: True
    lr: 0.0001
    maximize: False
    weight_decay: 0
)
loss_function: CrossEntropyLoss()
augment_data: True
model: Sequential(
  (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=same, bias=False)
  (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
  (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same, bias=False)
  (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): MaxPool2d(kernel_size=3, stride=3, padding=1, dilation=1, ceil_mode=False)
  (7): Dropout2d(p=0.1, inplace=False)
  (8): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same, bias=False)
  (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (10): ReLU()
  (11): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same, bias=False)
  (12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (13): ReLU()
  (14): MaxPool2d(kernel_size=3, stride=3, padding=1, dilation=1, ceil_mode=False)
  (15): Dropout2d(p=0.1, inplace=False)
  (16): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same, bias=False)
  (17): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (18): ReLU()
  (19): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same, bias=False)
  (20): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (21): ReLU()
  (22): MaxPool2d(kernel_size=3, stride=3, padding=1, dilation=1, ceil_mode=False)
  (23): Flatten(start_dim=1, end_dim=-1)
  (24): Dropout(p=0.25, inplace=False)
  (25): Linear(in_features=2048, out_features=1024, bias=True)
  (26): ReLU()
  (27): Linear(in_features=1024, out_features=512, bias=True)
  (28): ReLU()
  (29): Linear(in_features=512, out_features=20, bias=False)
)
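
The Linear(in_features=2048, ...) entry at index 25 constrains the input resolution, which the dump itself does not state. As a rough sketch in plain Python (no PyTorch needed), the arithmetic below traces the spatial size through the three MaxPool2d layers. The helper names `pool_out` and `flatten_features` and the 96x96 input are illustrative assumptions, not values taken from the configuration.

```python
def pool_out(size, kernel=3, stride=3, padding=1):
    """Output length along one axis for MaxPool2d with dilation=1, ceil_mode=False."""
    return (size + 2 * padding - kernel) // stride + 1


def flatten_features(height, width, channels=128, num_pools=3):
    """Number of features reaching the first Linear layer for a given input size.

    The 3x3 convolutions use padding=same with stride 1, so only the three
    MaxPool2d layers change the spatial dimensions.
    """
    for _ in range(num_pools):
        height, width = pool_out(height), pool_out(width)
    return channels * height * width


# Assumed (hypothetical) input side length; the dump does not give the resolution.
assumed_side = 96

# 96 -> 32 -> 11 -> 4 per axis, so 128 * 4 * 4 = 2048 features.
features = flatten_features(assumed_side, assumed_side)

# Square input sizes that are consistent with Linear(in_features=2048, ...).
compatible_sides = [s for s in range(1, 257) if flatten_features(s, s) == 2048]
```

Under these assumptions, any square single-channel input with a side between 82 and 108 pixels reaches the Flatten layer as 128 x 4 x 4 = 2048 features. Non-square inputs are not enumerated here.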