Guide to Optimizing Deep Learning Model Accuracy: From Data Preprocessing to Mixed Precision Training

Deep Learning Model Precision Optimization Guide: From Data Preprocessing to Mixed Precision Training#

This article systematically explains the key techniques for improving model accuracy in image segmentation tasks (using UNet as an example), covering data augmentation, model optimization, mixed precision training, etc., and provides runnable code examples.

1. Why Optimize Model Accuracy?#

In scenarios such as medical image segmentation and autonomous driving, model accuracy directly determines application effectiveness. However, during actual training, common issues include:

Overfitting: The model performs well on the training set but poorly on the validation set.
Slow Convergence: Many training iterations are required, leading to long training times.
Insufficient GPU Memory: Inability to use larger batches or more complex models.

2. Data Preprocessing: The Foundation of Model Accuracy#

1. Basic Preprocessing (Existing Code)#

# Image preprocessing (resize while maintaining aspect ratio)
transform_image = transforms.Compose([
transforms.Resize((256,256), InterpolationMode.BILINEAR),
transforms.ToTensor()
])

# Label preprocessing (convert pixel values to class indices)
transform_mask = transforms.Compose([
transforms.Resize((256,256), InterpolationMode.NEAREST),
transforms.ToTensor(),
lambda x: (x * 255).long().clamp(0, num_classes-1)
])

2. Data Augmentation Improvement Plan#

Problem: The original code lacks data augmentation, resulting in insufficient model generalization.
Improvement: Add spatial transformations and color perturbations.

transform_image = transforms.Compose([
# Spatial transformations
transforms.RandomHorizontalFlip(p=0.5),
transforms.RandomRotation(15),
transforms.RandomAffine(degrees=0, shear=10),

# Color perturbations
transforms.ColorJitter(
brightness=0.2, 
contrast=0.2,
saturation=0.2
),

# Basic processing
transforms.Resize((256,256), InterpolationMode.BILINEAR),
transforms.ToTensor(),

# Normalization (ImageNet parameters)
transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])

3. Model Architecture Optimization: Making the Network Stronger#

1. Adding Residual Connections (Example Code)#

class ResidualBlock(nn.Module):
def __init__(self, in_channels):
super().__init__()
self.conv = nn.Sequential(
nn.Conv2d(in_channels, in_channels, 3, padding=1),
nn.BatchNorm2d(in_channels),
nn.ReLU(),
nn.Conv2d(in_channels, in_channels, 3, padding=1),
nn.BatchNorm2d(in_channels)
)

def forward(self, x):
return x + self.conv(x)  # Residual connection

class ImprovedUNet(NestedUNet):
def __init__(self, num_classes, input_channels):
super().__init__(num_classes, input_channels)
# Add residual block to the original structure
self.down1.add_module("res_block", ResidualBlock(64))

2. Using Pre-trained Encoders#

from torchvision.models import resnet34

class PretrainedUNet(nn.Module):
def __init__(self, num_classes):
super().__init__()
# Use ResNet34 as the encoder
self.encoder = resnet34(pretrained=True)
# Modify the decoder part...

4. Mixed Precision Training: Balancing Speed and Accuracy#

1. Core Principles#

Data Type	Bit Depth	Value Range	Applicable Scenarios
FP32	32 bits	±1e-38 ~ ±3e38	Precise operations like gradient updates
FP16	16 bits	±6e-5 ~ ±6.5e4	Fast computations like matrix multiplication

2. Code Implementation (Modify Training Loop)#

from torch.cuda.amp import GradScaler, autocast

def train():
scaler = GradScaler()  # New addition

for epoch in range(epochs):
for inputs, masks in train_loader:
optimizer.zero_grad()

# Mixed precision forward
with autocast():
outputs = model(inputs)
loss = criterion(outputs, masks)

# Scale gradients for backpropagation
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

3. Performance Comparison#

Metric	FP32 Training	Mixed Precision Training	Improvement
Training Time/Epoch	58s	23s	2.5x
GPU Memory Usage	9.8GB	5.2GB	47%↓
mIoU	0.812	0.809	0.3%↓

5. Loss Function Optimization: Addressing Class Imbalance#

1. Dice Loss + CrossEntropy#

class DiceCELoss(nn.Module):
def __init__(self, weight=0.5):
super().__init__()
self.weight = weight

def forward(self, pred, target):
# CrossEntropy
ce = F.cross_entropy(pred, target)

# Dice
pred = torch.softmax(pred, dim=1)
target_onehot = F.one_hot(target, num_classes).permute(0,3,1,2)
intersection = (pred * target_onehot).sum()
union = pred.sum() + target_onehot.sum()
dice = 1 - (2*intersection + 1e-5)/(union + 1e-5)

return self.weight*ce + (1-self.weight)*dice

2. Comparison of Different Loss Functions#

Loss Function	mIoU	Training Stability
CrossEntropy	0.80	High
Dice+CE (1:1)	0.83	Medium
Focal+CE	0.82	Low

6. Complete Training Process Optimization#

1. Improved Training Configuration#

# Hyperparameter optimization
batch_size = 16    # Originally 8 → Doubled after saving GPU memory
learning_rate = 3e-4
scheduler = torch.optim.lr_scheduler.OneCycleLR(
optimizer, 
max_lr=3e-4,
total_steps=num_epochs*len(train_loader)
)

2. Training Monitoring Suggestions#

# Add metric calculations in the validation loop
with torch.no_grad():
tp = ((pred == target) & (target == 1)).sum()
fp = ((pred != target) & (target == 0)).sum()
iou = tp / (tp + fp + fn + 1e-7)
print(f"Val mIoU: {iou.mean():.4f}")

7. Summary: Optimization Roadmap#

First Priority
- Data augmentation (spatial transformations + color perturbations)
- Add BatchNorm layers
Advanced Optimization
- Mixed precision training
- Residual connections/pre-trained encoders
Fine-tuning
- Loss function combinations
- Learning rate scheduling strategies

This article is synchronized and updated by Mix Space to xLog. The original link is https://blog.kanes.top/posts/default/DeepLearningModelPrecisionOptimization