nnspike.models
Neural network models for vision-based robot navigation.
This module contains neural network architectures designed for the LEGO SPIKE robot’s line following and navigation tasks. It includes both custom and adapted models for different learning approaches.
- Modules:
customized: Custom neural network architectures for specific tasks
loss: Custom loss functions for multi-task learning
nvidia: Adapted NVIDIA models for regression and multi-task learning
Example
Creating and using a regression model:
from nnspike.models import NvidiaModelRegression
# Initialize model
model = NvidiaModelRegression()

# Use model for inference
prediction = model(input_tensor)
- class nnspike.models.SimpleNetClassification25(num_classes)[source]
A simple convolutional neural network for processing image data with additional sensor inputs.
This network consists of two convolutional layers followed by max pooling, and two fully connected layers. It accepts both image data and relative position information as inputs. The architecture is designed to handle input images and concatenate them with additional sensor data before final classification.
- Architecture:
Conv2d (3->8 channels, 5x5 kernel, stride=2) + ReLU + MaxPool2d (2x2)
Conv2d (8->16 channels, 5x5 kernel, stride=1, padding=2) + ReLU + MaxPool2d (2x2)
Flatten + Concatenate with relative position
Linear (2689->64) + ReLU
Linear (64->1) output
Expected input image size: (61, 197), which the convolution and pooling stages reduce to a (16, 7, 24) feature map, as illustrated in the sketch below.
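A minimal sketch of this architecture, to make the 2689-unit fc1 input concrete. This is an illustrative reconstruction from the layer list above, not the package source; the class name SimpleNetSketch is hypothetical:

import torch
import torch.nn as nn

class SimpleNetSketch(nn.Module):
    # Hypothetical reconstruction of SimpleNetClassification25, for shape illustration only
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=5, stride=1, padding=2)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 7 * 24 + 1, 64)  # 2688 conv features + 1 sensor value = 2689
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x, relative_position):
        x = self.pool(torch.relu(self.conv1(x)))  # (3, 61, 197) -> (8, 29, 97) -> (8, 14, 48)
        x = self.pool(torch.relu(self.conv2(x)))  # (8, 14, 48) -> (16, 14, 48) -> (16, 7, 24)
        x = torch.flatten(x, start_dim=1)         # (batch_size, 2688)
        x = torch.cat([x, relative_position.view(-1, 1)], dim=1)  # (batch_size, 2689)
        return self.fc2(torch.relu(self.fc1(x)))  # (batch_size, 1)

logits = SimpleNetSketch()(torch.randn(2, 3, 61, 197), torch.randn(2, 1))  # shape (2, 1)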
- __init__(num_classes)[source]
Initialize the SimpleNetClassification25 model.
Sets up all layers including convolutional layers, pooling, and fully connected layers. The input size calculation assumes input images of size (61, 197).
- forward(x, relative_position)[source]
Forward pass through the network.
- Parameters:
x (torch.Tensor) – Input image tensor of shape (batch_size, 3, height, width). Expected input size is (batch_size, 3, 61, 197).
relative_position (torch.Tensor) – Relative position sensor data of shape (batch_size,) or (batch_size, 1). This additional sensor input is concatenated with the flattened convolutional features.
- Returns:
Output logits of shape (batch_size, 1). These are raw output values that can be used for regression or passed through a sigmoid for binary classification.
- Return type:
torch.Tensor
Note
The network expects input images of size (61, 197). After the first conv+pool stage the spatial dimensions become (14, 48), and after the second conv+pool stage they become (7, 24), so the flattened feature vector has 16 * 7 * 24 = 2688 elements (2689 once the relative position is concatenated). Other input sizes change these dimensions and would break the fc1 input size.
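A brief usage sketch for this class (tensor shapes follow the forward documentation above; the batch size and num_classes value are arbitrary choices for illustration):

import torch
from nnspike.models import SimpleNetClassification25

model = SimpleNetClassification25(num_classes=2)  # num_classes per the signature above
images = torch.randn(4, 3, 61, 197)               # expected input size (61, 197)
relative_position = torch.randn(4, 1)
logits = model(images, relative_position)         # (4, 1) per the Returns documentation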
- class nnspike.models.NvidiaModelMultiTask(num_modes)[source]
A neural network model based on the NVIDIA architecture for end-to-end learning of self-driving cars.
This model consists of five convolutional layers followed by three shared fully connected layers and two task-specific output heads. The ELU activation function is used after each layer except the output heads. Additionally, the sensor inputs (left_x, right_x, and relative_position) are concatenated with the flattened output from the convolutional layers before being passed through the fully connected layers.
- conv1
First convolutional layer with 3 input channels and 24 output channels.
- Type:
nn.Conv2d
- conv2
Second convolutional layer with 24 input channels and 36 output channels.
- Type:
nn.Conv2d
- conv3
Third convolutional layer with 36 input channels and 48 output channels.
- Type:
nn.Conv2d
- conv4
Fourth convolutional layer with 48 input channels and 64 output channels.
- Type:
nn.Conv2d
- conv5
Fifth convolutional layer with 64 input channels and 64 output channels.
- Type:
nn.Conv2d
- flatten
Layer to flatten the output from the convolutional layers.
- Type:
nn.Flatten
- fc1
First fully connected layer with input size adjusted to include sensor inputs.
- Type:
nn.Linear
- fc2
Second fully connected layer.
- Type:
nn.Linear
- fc3
Third fully connected layer.
- Type:
nn.Linear
- mode_classifier
Output layer for behavior mode classification (num_modes modes).
- Type:
nn.Linear
- self_driving_head
Output layer for self-driving control.
- Type:
nn.Linear
- elu
Exponential Linear Unit activation function applied after each layer except the final output layer.
- Type:
nn.ELU
- softmax
Softmax activation for mode classification.
- Type:
nn.Softmax
- forward(x, left_x, right_x, relative_position)[source]
Defines the forward pass of the model. Takes an image tensor x and additional sensor inputs, processes them through the network, and returns two output tensors: mode classification and control.
- Parameters:
x (torch.Tensor) – Input image tensor of shape (batch_size, 3, height, width).
left_x (torch.Tensor) – Left sensor input tensor of shape (batch_size, 1).
right_x (torch.Tensor) – Right sensor input tensor of shape (batch_size, 1).
relative_position (torch.Tensor) – Relative position tensor of shape (batch_size, 1).
- Returns:
mode_output: Softmax probabilities for robot behavior modes, shape (batch_size, num_modes). With four modes these correspond to [left_x following, right_x following, obstacle avoidance, self-driving].
control_output: Control tensor for self-driving mode (batch_size, 1)
- Return type:
tuple[torch.Tensor, torch.Tensor]
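A hedged usage sketch for this model. Tensor shapes follow the parameter documentation above; the 66x200 image size is an assumption borrowed from the original NVIDIA network, not stated in these docs:

import torch
from nnspike.models import NvidiaModelMultiTask

model = NvidiaModelMultiTask(num_modes=4)
model.eval()

x = torch.randn(8, 3, 66, 200)         # image batch; height/width are an assumption
left_x = torch.randn(8, 1)             # left sensor input
right_x = torch.randn(8, 1)            # right sensor input
relative_position = torch.randn(8, 1)  # relative position input

with torch.no_grad():
    mode_output, control_output = model(x, left_x, right_x, relative_position)

predicted_mode = mode_output.argmax(dim=1)  # (8,) index into the four behavior modes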
- class nnspike.models.NvidiaModelRegression[source]
A neural network model based on the NVIDIA architecture for end-to-end learning of self-driving cars.
This model consists of five convolutional layers followed by three shared fully connected layers and two task-specific output heads. The ELU activation function is used after each layer except the output heads. Additionally, the sensor inputs (left_x, right_x, and relative_position) are concatenated with the flattened output from the convolutional layers before being passed through the fully connected layers.
- conv1
First convolutional layer with 3 input channels and 24 output channels.
- Type:
nn.Conv2d
- conv2
Second convolutional layer with 24 input channels and 36 output channels.
- Type:
nn.Conv2d
- conv3
Third convolutional layer with 36 input channels and 48 output channels.
- Type:
nn.Conv2d
- conv4
Fourth convolutional layer with 48 input channels and 64 output channels.
- Type:
nn.Conv2d
- conv5
Fifth convolutional layer with 64 input channels and 64 output channels.
- Type:
nn.Conv2d
- flatten
Layer to flatten the output from the convolutional layers.
- Type:
nn.Flatten
- fc1
First fully connected layer with input size adjusted to include sensor inputs.
- Type:
nn.Linear
- fc2
Second fully connected layer.
- Type:
nn.Linear
- fc3
Third fully connected layer.
- Type:
nn.Linear
- mode_classifier
Output layer for behavior mode classification (4 modes).
- Type:
nn.Linear
- self_driving_head
Output layer for self-driving control.
- Type:
nn.Linear
- elu
Exponential Linear Unit activation function applied after each layer except the final output layer.
- Type:
nn.ELU
- softmax
Softmax activation for mode classification.
- Type:
nn.Softmax
- forward(x, left_x, right_x, relative_position)[source]
Defines the forward pass of the model. Takes an image tensor x and additional sensor inputs, processes them through the network, and returns two output tensors: mode classification and control.
- Parameters:
x (torch.Tensor) – Input image tensor of shape (batch_size, 3, height, width).
left_x (torch.Tensor) – Left sensor input tensor of shape (batch_size, 1).
right_x (torch.Tensor) – Right sensor input tensor of shape (batch_size, 1).
relative_position (torch.Tensor) – Relative position tensor of shape (batch_size, 1).
- Returns:
mode_output: Softmax probabilities for robot behavior modes (batch_size, 4) [left_x following, right_x following, obstacle avoidance, self driving]
control_output: Control tensor for self-driving mode (batch_size, 1)
- Return type:
tuple[torch.Tensor, torch.Tensor]
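A hedged training-step sketch for the regression model. The optimizer, learning rate, and image size are assumptions; the documented forward returns both outputs, so only the control head is supervised here:

import torch
from nnspike.models import NvidiaModelRegression

model = NvidiaModelRegression()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

x = torch.randn(8, 3, 66, 200)  # image batch; size is an assumption, not from these docs
left_x, right_x, rel = torch.randn(8, 1), torch.randn(8, 1), torch.randn(8, 1)
control_target = torch.randn(8, 1)

_, control_output = model(x, left_x, right_x, rel)  # discard the mode output
loss = criterion(control_output, control_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()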
- class nnspike.models.MultiTaskLoss(mode_weight=1.0, control_weight=30.0, control_scale=10.0)[source]
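The loss is documented only by its signature. A plausible reading, shown for illustration only, combines a classification term on the mode probabilities with a scaled, weighted control term; the actual formula inside MultiTaskLoss may differ:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskLossSketch(nn.Module):
    # Hypothetical stand-in for nnspike.models.MultiTaskLoss, not the package source
    def __init__(self, mode_weight=1.0, control_weight=30.0, control_scale=10.0):
        super().__init__()
        self.mode_weight = mode_weight
        self.control_weight = control_weight
        self.control_scale = control_scale

    def forward(self, mode_output, control_output, mode_target, control_target):
        # NLL on log-probabilities, since the models above already apply softmax;
        # mode_target is a (batch_size,) tensor of class indices
        mode_loss = F.nll_loss(torch.log(mode_output + 1e-8), mode_target)
        # Scale control values before MSE so small steering errors still contribute
        control_loss = F.mse_loss(control_output * self.control_scale,
                                  control_target * self.control_scale)
        return self.mode_weight * mode_loss + self.control_weight * control_loss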