Osako, Seima
MAB3

Repository



SSD (Single Shot MultiBox Detector)
The amdegroot/ssd.pytorch repository can be adapted for this fruit detection task.

Flow of Object Detection in SSD

Training Process


Resize the image to 300 × 300 and apply preprocessing (e.g. normalization, mean subtraction)


Feed the image into the SSD network: The SSD model is composed of four subnetworks: VGG, extras, loc and conf.


Create default boxes and assemble the model outputs


Compute losses via the loss function: Match the default boxes to the ground-truth boxes, then compute the localization loss and the confidence loss.


Backpropagate and update weights
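The loss step above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation. It assumes the usual SSD recipe: Smooth L1 for localization, softmax cross-entropy for confidence, and hard negative mining at a 3:1 negative-to-positive ratio. The function names and the neg_pos_ratio parameter are hypothetical.

```python
from math import exp, log

def smooth_l1(pred, target):
    """Sum of Smooth L1 terms over the four offset components."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def cross_entropy(logits, label):
    """Softmax cross-entropy for one default box (numerically stable)."""
    m = max(logits)
    log_sum = m + log(sum(exp(x - m) for x in logits))
    return log_sum - logits[label]

def multibox_loss(loc_pred, conf_pred, loc_t, conf_t, neg_pos_ratio=3):
    """Return (localization loss, confidence loss), each divided by #positives.

    conf_t holds one class id per default box; 0 means background.
    neg_pos_ratio=3 is an assumed default, not taken from the notes.
    """
    pos = [i for i, c in enumerate(conf_t) if c > 0]
    neg = [i for i, c in enumerate(conf_t) if c == 0]
    n_pos = len(pos)
    if n_pos == 0:
        return 0.0, 0.0
    # Localization loss is computed on positive (matched) boxes only.
    loss_l = sum(smooth_l1(loc_pred[i], loc_t[i]) for i in pos)
    # Hard negative mining: keep only the background boxes with the largest loss.
    ce = [cross_entropy(conf_pred[i], conf_t[i]) for i in range(len(conf_t))]
    hard_neg = sorted(neg, key=lambda i: ce[i], reverse=True)[: neg_pos_ratio * n_pos]
    loss_c = sum(ce[i] for i in pos) + sum(ce[i] for i in hard_neg)
    return loss_l / n_pos, loss_c / n_pos
```

The two returned values would be summed and backpropagated in the final training step. Restricting the negatives keeps the many background boxes from dominating the confidence loss.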


Inference Process


Resize the image to 300 × 300


Feed the image into the trained SSD network


Filter and suppress redundant detections: Group together all bounding boxes that detect the same object and retain only the one with the highest confidence.
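The suppression step can be illustrated with a small, dependency-free sketch of greedy Non-Maximum Suppression. Boxes are assumed to be in corner format (xmin, ymin, xmax, ymax). The overlap threshold of 0.45 and the top_k of 200 are assumed defaults, and the function names are chosen for illustration only.

```python
def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, overlap=0.45, top_k=200):
    """Return the indices of the boxes kept, highest score first."""
    # Consider only the top_k highest-scoring candidates.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= overlap]
    return keep
```

For example, two heavily overlapping boxes and one distant box reduce to two detections:

```python
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```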


SSD model
Define functions to construct each SSD subnetwork (VGG, extras, loc, and conf), along with the DBox class that generates the default boxes.
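To make the role of the default boxes concrete, here is a rough sketch of how they might be generated. The configuration values are assumptions: they are the settings commonly used for SSD300 on Pascal VOC and may differ for the fruit dataset. The function name make_default_boxes is hypothetical.

```python
from itertools import product
from math import sqrt

# Assumed SSD300 settings (commonly used for Pascal VOC).
CFG = {
    "image_size": 300,
    "feature_maps": [38, 19, 10, 5, 3, 1],
    "steps": [8, 16, 32, 64, 100, 300],
    "min_sizes": [30, 60, 111, 162, 213, 264],
    "max_sizes": [60, 111, 162, 213, 264, 315],
    "aspect_ratios": [[2], [2, 3], [2, 3], [2, 3], [2], [2]],
}

def make_default_boxes(cfg=CFG):
    """Return default boxes as (cx, cy, w, h) tuples, normalized to [0, 1]."""
    size = cfg["image_size"]
    boxes = []
    for k, f in enumerate(cfg["feature_maps"]):
        f_k = size / cfg["steps"][k]                       # effective grid size
        s_k = cfg["min_sizes"][k] / size                   # small square
        s_k_prime = sqrt(s_k * cfg["max_sizes"][k] / size) # large square
        for i, j in product(range(f), repeat=2):
            cx = (j + 0.5) / f_k
            cy = (i + 0.5) / f_k
            boxes.append((cx, cy, s_k, s_k))
            boxes.append((cx, cy, s_k_prime, s_k_prime))
            # Two rectangles per aspect ratio: wide and tall.
            for ar in cfg["aspect_ratios"][k]:
                r = sqrt(ar)
                boxes.append((cx, cy, s_k * r, s_k / r))
                boxes.append((cx, cy, s_k / r, s_k * r))
    # Clamp every coordinate to the image.
    return [tuple(min(max(v, 0.0), 1.0) for v in b) for b in boxes]
```

With these settings the six feature maps contribute 5776 + 2166 + 600 + 150 + 36 + 4 = 8732 default boxes in total.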

Forward Propagation
Combine model construction and forward propagation into the SSD class, and define any additional helper functions needed during the pass:


decode(): Converts the predicted offsets, which are expressed relative to each default box, into actual bounding-box coordinates.


nonmaximum_suppress(): Applies Non‑Maximum Suppression so that only one bounding box is kept per object.


Detect class: A class specialized for inference.
From the SSD outputs, it:

Selects the top 200 default boxes by confidence score.
Converts these into real bounding‑box coordinates (applying the offsets).
Runs Non-Maximum Suppression so that overlapping boxes for the same object are reduced to a single box.


SSD class: Builds the VGG, extras, loc, and conf subnetworks, and runs the end‑to‑end forward pass.

During training, it outputs the raw localization offsets (loc), the confidence scores (conf), and the default boxes (priors).
During inference, it outputs only the detected objects’ bounding‑box information.
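The decoding step used by the Detect class can be sketched for a single box as follows. This is an illustration under assumptions rather than the repository's code: default boxes are taken to be in center format (cx, cy, w, h), and the variances (0.1, 0.2) are the values conventionally used with SSD.

```python
from math import exp

def decode(loc, dbox, variances=(0.1, 0.2)):
    """Turn predicted offsets into a corner-format box.

    loc:  (dx, dy, dw, dh) offsets predicted by the network
    dbox: (cx, cy, w, h) default box, center format
    variances: assumed scaling factors for center and size offsets
    """
    dx, dy, dw, dh = loc
    d_cx, d_cy, d_w, d_h = dbox
    # Shift the center by a fraction of the default box size.
    cx = d_cx + dx * variances[0] * d_w
    cy = d_cy + dy * variances[0] * d_h
    # Scale width and height exponentially.
    w = d_w * exp(dw * variances[1])
    h = d_h * exp(dh * variances[1])
    # Convert to (xmin, ymin, xmax, ymax).
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Zero offsets return the default box itself in corner format, so decode((0, 0, 0, 0), (0.5, 0.5, 0.2, 0.4)) gives approximately (0.4, 0.3, 0.6, 0.7).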


Loss Computation


point_form(): Converts default boxes from center format (cx, cy, w, h) into corner format (xmin, ymin, xmax, ymax).


intersect(): Computes the area of overlap between two boxes.


jaccard(): Calculates the Jaccard index (IoU) between two boxes.


match(): Assigns the ground-truth boxes (BBoxes) of one image to the default boxes (DBoxes):

Computes IoU with every default box (DBox).
Matches each BBox to its best‑overlapping DBox (and vice versa).
Records these best matches in best_truth_idx (the best BBox for each DBox) and best_prior_idx (the best DBox for each BBox).
Extracts the matched DBoxes’ coordinates.
Builds the true class labels for those DBoxes.
Labels any DBox with IoU < 0.5 as background (class 0).
Computes the offset targets for all matched DBoxes.
Registers the ground‑truth data for loss calculation.


encode(): Generates the target offsets for default boxes, given matched ground‑truth boxes.


MultiBoxLoss class: Implements the overall SSD loss calculation, combining localization and confidence losses.
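The helper functions listed above fit together roughly as in the following sketch, written for plain Python lists rather than tensors. It is a simplified illustration under assumptions: ground-truth boxes are in corner format, default boxes are in center format, labels start at 0 without a background class, at least one ground-truth box is present, and the variances (0.1, 0.2) and threshold 0.5 are assumed defaults.

```python
from math import log

def point_form(box):
    """(cx, cy, w, h) -> (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def intersect(a, b):
    """Overlap area of two corner-format boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h

def jaccard(a, b):
    """IoU of two corner-format boxes."""
    inter = intersect(a, b)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def encode(truth, dbox, variances=(0.1, 0.2)):
    """Offset targets that map a default box onto its matched ground truth."""
    g_cx, g_cy = (truth[0] + truth[2]) / 2, (truth[1] + truth[3]) / 2
    g_w, g_h = truth[2] - truth[0], truth[3] - truth[1]
    d_cx, d_cy, d_w, d_h = dbox
    return ((g_cx - d_cx) / (variances[0] * d_w),
            (g_cy - d_cy) / (variances[0] * d_h),
            log(g_w / d_w) / variances[1],
            log(g_h / d_h) / variances[1])

def match(truths, labels, dboxes, threshold=0.5):
    """Return (loc_targets, conf_targets), one entry per default box."""
    corners = [point_form(d) for d in dboxes]
    overlaps = [[jaccard(t, c) for c in corners] for t in truths]
    n_t, n_d = len(truths), len(dboxes)
    # Best ground-truth box for each default box.
    best_truth_idx = [max(range(n_t), key=lambda i: overlaps[i][j]) for j in range(n_d)]
    best_truth_overlap = [overlaps[best_truth_idx[j]][j] for j in range(n_d)]
    # Best default box for each ground-truth box; force these matches to be kept.
    best_prior_idx = [max(range(n_d), key=lambda j: overlaps[i][j]) for i in range(n_t)]
    for i, j in enumerate(best_prior_idx):
        best_truth_idx[j] = i
        best_truth_overlap[j] = 2.0  # any value above the threshold
    # Class 0 is background, so object labels are shifted by one.
    conf_t = [labels[best_truth_idx[j]] + 1 if best_truth_overlap[j] >= threshold else 0
              for j in range(n_d)]
    loc_t = [encode(truths[best_truth_idx[j]], dboxes[j]) for j in range(n_d)]
    return loc_t, conf_t
```

MultiBoxLoss would then consume loc_t and conf_t as the regression and classification targets. Forcing the best default box for each ground-truth box guarantees that every object has at least one positive match, even when no IoU reaches the threshold.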