nnspike.models
Neural network models for vision-based robot navigation.
This module contains neural network architectures designed for the LEGO SPIKE robot’s line following and navigation tasks. It includes both custom and adapted models for different learning approaches.
- Modules:
customized: Custom neural network architectures for specific tasks
loss: Custom loss functions for multi-task learning
nvidia: Adapted NVIDIA models for regression and multi-task learning
Example
Creating and using a regression model:
from nnspike.models import NvidiaModelRegression
# Initialize model
model = NvidiaModelRegression()

# Use model for inference
prediction = model(input_tensor)
- class nnspike.models.SimpleNetClassification25(num_classes)[source]
A simple convolutional neural network for processing image data with additional sensor inputs.
This network consists of two convolutional layers followed by max pooling, and two fully connected layers. It accepts both image data and relative position information as inputs. The architecture is designed to handle input images and concatenate them with additional sensor data before final classification.
- Architecture:
Conv2d (3->8 channels, 5x5 kernel, stride=2) + ReLU + MaxPool2d (2x2)
Conv2d (8->16 channels, 5x5 kernel, stride=1, padding=2) + ReLU + MaxPool2d (2x2)
Flatten + Concatenate with relative position
Linear (2689->64) + ReLU
Linear (64->1) output
Expected input image size: (61, 197), which the convolution and pooling stages reduce to a (16, 7, 24) feature map, as illustrated in the sketch below.
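A minimal sketch of this architecture, to make the 2689-unit fc1 input concrete. This is an illustrative reconstruction from the layer list above, not the package source; the class name SimpleNetSketch is hypothetical:

import torch
import torch.nn as nn

class SimpleNetSketch(nn.Module):
    # Hypothetical reconstruction of SimpleNetClassification25, for shape illustration only
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=5, stride=1, padding=2)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 7 * 24 + 1, 64)  # 2688 conv features + 1 sensor value = 2689
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x, relative_position):
        x = self.pool(torch.relu(self.conv1(x)))  # (3, 61, 197) -> (8, 29, 97) -> (8, 14, 48)
        x = self.pool(torch.relu(self.conv2(x)))  # (8, 14, 48) -> (16, 14, 48) -> (16, 7, 24)
        x = torch.flatten(x, start_dim=1)         # (batch_size, 2688)
        x = torch.cat([x, relative_position.view(-1, 1)], dim=1)  # (batch_size, 2689)
        return self.fc2(torch.relu(self.fc1(x)))  # (batch_size, 1)

logits = SimpleNetSketch()(torch.randn(2, 3, 61, 197), torch.randn(2, 1))  # shape (2, 1)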
- __init__(num_classes)[source]
Initialize the SimpleNetClassification25 model.
Sets up all layers including convolutional layers, pooling, and fully connected layers. The input size calculation assumes input images of size (61, 197).
- forward(x, relative_position)[source]
Forward pass through the network.
- Parameters:
x (torch.Tensor) – Input image tensor of shape (batch_size, 3, height, width). Expected input size is (batch_size, 3, 61, 197).
relative_position (torch.Tensor) – Relative position sensor data of shape (batch_size,) or (batch_size, 1). This additional sensor input is concatenated with the flattened convolutional features.
- Returns:
Output logits of shape (batch_size, 1). These are raw output values that can be used for regression or passed through a sigmoid for binary classification.
- Return type:
torch.Tensor
Note
The network expects input images of size (61, 197). After the first conv+pool stage the spatial dimensions become (14, 48), and after the second conv+pool stage they become (7, 24), so the flattened feature vector has 16 * 7 * 24 = 2688 elements (2689 once the relative position is concatenated). Other input sizes change these dimensions and would break the fc1 input size.
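A brief usage sketch for this class (tensor shapes follow the forward documentation above; the batch size and num_classes value are arbitrary choices for illustration):

import torch
from nnspike.models import SimpleNetClassification25

model = SimpleNetClassification25(num_classes=2)  # num_classes per the signature above
images = torch.randn(4, 3, 61, 197)               # expected input size (61, 197)
relative_position = torch.randn(4, 1)
logits = model(images, relative_position)         # (4, 1) per the Returns documentation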
- class nnspike.models.NvidiaModelMultiTask(num_modes)[source]
A neural network model based on the NVIDIA architecture for end-to-end learning of self-driving cars.
This model consists of five convolutional layers followed by three shared fully connected layers and two task-specific output heads. The ELU activation function is used after each layer except the output heads. Additionally, the sensor inputs (left_x, right_x, and relative_position) are concatenated with the flattened output from the convolutional layers before being passed through the fully connected layers.
- conv1
First convolutional layer with 3 input channels and 24 output channels.
- Type:
nn.Conv2d
- conv2
Second convolutional layer with 24 input channels and 36 output channels.
- Type:
nn.Conv2d
- conv3
Third convolutional layer with 36 input channels and 48 output channels.
- Type:
nn.Conv2d
- conv4
Fourth convolutional layer with 48 input channels and 64 output channels.
- Type:
nn.Conv2d
- conv5
Fifth convolutional layer with 64 input channels and 64 output channels.
- Type:
nn.Conv2d
- flatten
Layer to flatten the output from the convolutional layers.
- Type:
nn.Flatten
- fc1
First fully connected layer with input size adjusted to include sensor inputs.
- Type:
nn.Linear
- fc2
Second fully connected layer.
- Type:
nn.Linear
- fc3
Third fully connected layer.
- Type:
nn.Linear
- mode_classifier
Output layer for behavior mode classification (num_modes modes).
- Type:
nn.Linear
- self_driving_head
Output layer for self-driving control.
- Type:
nn.Linear
- elu
Exponential Linear Unit activation function applied after each layer except the final output layer.
- Type:
nn.ELU
- softmax
Softmax activation for mode classification.
- Type:
nn.Softmax
- forward(x, left_x, right_x, relative_position)[source]
Defines the forward pass of the model. Takes an image tensor x and additional sensor inputs, processes them through the network, and returns two output tensors: mode classification and control.
- Parameters:
x (torch.Tensor) – Input image tensor of shape (batch_size, 3, height, width).
left_x (torch.Tensor) – Left sensor input tensor of shape (batch_size, 1).
right_x (torch.Tensor) – Right sensor input tensor of shape (batch_size, 1).
relative_position (torch.Tensor) – Relative position tensor of shape (batch_size, 1).
- Returns:
mode_output: Softmax probabilities for robot behavior modes, shape (batch_size, num_modes). With four modes these correspond to [left_x following, right_x following, obstacle avoidance, self-driving].
control_output: Control tensor for self-driving mode (batch_size, 1)
- Return type:
tuple[torch.Tensor, torch.Tensor]
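A hedged usage sketch for this model. Tensor shapes follow the parameter documentation above; the 66x200 image size is an assumption borrowed from the original NVIDIA network, not stated in these docs:

import torch
from nnspike.models import NvidiaModelMultiTask

model = NvidiaModelMultiTask(num_modes=4)
model.eval()

x = torch.randn(8, 3, 66, 200)         # image batch; height/width are an assumption
left_x = torch.randn(8, 1)             # left sensor input
right_x = torch.randn(8, 1)            # right sensor input
relative_position = torch.randn(8, 1)  # relative position input

with torch.no_grad():
    mode_output, control_output = model(x, left_x, right_x, relative_position)

predicted_mode = mode_output.argmax(dim=1)  # (8,) index into the four behavior modes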
- class nnspike.models.NvidiaModelRegression[source]
A neural network model based on the NVIDIA architecture for end-to-end learning of self-driving cars.
This model consists of five convolutional layers followed by three shared fully connected layers and two task-specific output heads. The ELU activation function is used after each layer except the output heads. Additionally, the sensor inputs (left_x, right_x, and relative_position) are concatenated with the flattened output from the convolutional layers before being passed through the fully connected layers.
- conv1
First convolutional layer with 3 input channels and 24 output channels.
- Type:
nn.Conv2d
- conv2
Second convolutional layer with 24 input channels and 36 output channels.
- Type:
nn.Conv2d
- conv3
Third convolutional layer with 36 input channels and 48 output channels.
- Type:
nn.Conv2d
- conv4
Fourth convolutional layer with 48 input channels and 64 output channels.
- Type:
nn.Conv2d
- conv5
Fifth convolutional layer with 64 input channels and 64 output channels.
- Type:
nn.Conv2d
- flatten
Layer to flatten the output from the convolutional layers.
- Type:
nn.Flatten
- fc1
First fully connected layer with input size adjusted to include sensor inputs.
- Type:
nn.Linear
- fc2
Second fully connected layer.
- Type:
nn.Linear
- fc3
Third fully connected layer.
- Type:
nn.Linear
- mode_classifier
Output layer for behavior mode classification (4 modes).
- Type:
nn.Linear
- self_driving_head
Output layer for self-driving control.
- Type:
nn.Linear
- elu
Exponential Linear Unit activation function applied after each layer except the final output layer.
- Type:
nn.ELU
- softmax
Softmax activation for mode classification.
- Type:
nn.Softmax
- forward(x, left_x, right_x, relative_position)[source]
Defines the forward pass of the model. Takes an image tensor x and additional sensor inputs, processes them through the network, and returns two output tensors: mode classification and control.
- Parameters:
x (torch.Tensor) – Input image tensor of shape (batch_size, 3, height, width).
left_x (torch.Tensor) – Left sensor input tensor of shape (batch_size, 1).
right_x (torch.Tensor) – Right sensor input tensor of shape (batch_size, 1).
relative_position (torch.Tensor) – Relative position tensor of shape (batch_size, 1).
- Returns:
mode_output: Softmax probabilities for robot behavior modes (batch_size, 4) [left_x following, right_x following, obstacle avoidance, self driving]
control_output: Control tensor for self-driving mode (batch_size, 1)
- Return type:
tuple[torch.Tensor, torch.Tensor]
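A hedged training-step sketch for the regression model. The optimizer, learning rate, and image size are assumptions; the documented forward returns both outputs, so only the control head is supervised here:

import torch
from nnspike.models import NvidiaModelRegression

model = NvidiaModelRegression()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

x = torch.randn(8, 3, 66, 200)  # image batch; size is an assumption, not from these docs
left_x, right_x, rel = torch.randn(8, 1), torch.randn(8, 1), torch.randn(8, 1)
control_target = torch.randn(8, 1)

_, control_output = model(x, left_x, right_x, rel)  # discard the mode output
loss = criterion(control_output, control_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()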
- class nnspike.models.MultiTaskLoss(mode_weight=1.0, control_weight=30.0, control_scale=10.0)[source]
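The loss is documented only by its signature. A plausible reading, shown for illustration only, combines a classification term on the mode probabilities with a scaled, weighted control term; the actual formula inside MultiTaskLoss may differ:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskLossSketch(nn.Module):
    # Hypothetical stand-in for nnspike.models.MultiTaskLoss, not the package source
    def __init__(self, mode_weight=1.0, control_weight=30.0, control_scale=10.0):
        super().__init__()
        self.mode_weight = mode_weight
        self.control_weight = control_weight
        self.control_scale = control_scale

    def forward(self, mode_output, control_output, mode_target, control_target):
        # NLL on log-probabilities, since the models above already apply softmax;
        # mode_target is a (batch_size,) tensor of class indices
        mode_loss = F.nll_loss(torch.log(mode_output + 1e-8), mode_target)
        # Scale control values before MSE so small steering errors still contribute
        control_loss = F.mse_loss(control_output * self.control_scale,
                                  control_target * self.control_scale)
        return self.mode_weight * mode_loss + self.control_weight * control_loss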