tensor([[1, 2],
        [3, 4]])
2026-04-20
Source: wikipedia






Nielsen
A neuron integrates over its input signals using weights: \[ x_{\mathrm{out}} = \phi\left(\sum_{i=1}^n w_ix_i + c\right) \]
\(\phi\) is called the activation function
\(w_i\) are the weights
\(c\) is the bias
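As a sketch of this formula, the weighted-sum-then-activation step can be computed directly in plain Python; the weights, inputs, and bias below are made-up illustration values, and the activation used is the logistic (sigmoid) function:

```python
import math

def sigmoid(z):
    # logistic activation: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, inputs, bias, activation=sigmoid):
    # x_out = phi( sum_i w_i * x_i + c )
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# hypothetical example values
w = [0.5, -0.25]
x = [1.0, 2.0]
c = 0.0
print(neuron_output(w, x, c))  # sigmoid(0.5*1 - 0.25*2 + 0) = sigmoid(0) = 0.5
```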
Sigmoid: \[ \phi(x) = \frac{1}{1+e^{-x}} \]
Hyperbolic tangent: \[ \phi(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]
ReLU: \[ \phi(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} \]
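A quick pure-Python check of the last two activations (in pytorch, torch.sigmoid, torch.tanh, and torch.relu compute the same functions elementwise on tensors):

```python
import math

def tanh(x):
    # hyperbolic tangent from its exponential definition
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    # rectified linear unit: identity for non-negative input, zero otherwise
    return x if x >= 0 else 0.0

print(tanh(0.0))   # 0.0
print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5
```

The exponential formula agrees with the library implementation: `abs(tanh(1.0) - math.tanh(1.0))` is at floating-point precision.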
In pytorch, the basic data structure is called a tensor
pytorch stores the computational graph; *.backward() computes gradients
loss.backward() stores in w.grad and b.grad the gradients of the loss with respect to the tensors w and b,
\[
\frac{\partial \mathrm{loss}}{\partial \mathbf{w}},\quad \frac{\partial \mathrm{loss}}{\partial \mathbf{b}}
\]
pytorch is written to separate model building/training and dataset management
torch.utils.data.Dataset enables built-in datasets
torch.utils.data.DataLoader is for managing and iterating over datasets
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# The network class contains the initializer and some methods for our neural network
# You create a network by calling Network([Nodes_Input, Nodes_2, Nodes_3, ..., Nodes_Output])
class Network(nn.Module):
    def __init__(self, sizes):
        super(Network, self).__init__()
        self.sizes = sizes
        self.num_layers = len(sizes)
        self.layers = nn.ModuleList()
        for i in range(self.num_layers - 1):
            layer = nn.Linear(sizes[i], sizes[i+1])
            nn.init.xavier_normal_(layer.weight)  # Good initialization for shallow/sigmoid nets
            #nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')  # initialization for relus
            #nn.init.kaiming_uniform_(layer.weight, mode='fan_out', nonlinearity='relu')  # initialization for relus and deep nets
            nn.init.zeros_(layer.bias)  # initialize the bias to 0
            self.layers.append(layer)

    # Forward is the method that calculates the value of the neural network.
    # We apply each hidden layer followed by its activation, with no activation
    # on the output layer
    def forward(self, x):
        for layer in self.layers[:-1]:
            x = torch.sigmoid(layer(x))  # sigmoid layers
            #x = F.relu(layer(x))  # You will try the relu layer in the last problem
        x = self.layers[-1](x)
        return x

def train(network, train_data, epochs, eta, test_data=None):
    optimizer = optim.SGD(network.parameters(), lr=eta, momentum=0.8, nesterov=True, weight_decay=1e-5)
    loss_fn = nn.CrossEntropyLoss()
    loss_history = []
    accuracy_history = []
    train_accuracy_history = []
    # We are going to loop over the epochs
    for epoch in range(epochs):
        # This puts the network into training mode
        network.train()
        running_loss = 0.0
        batch_count = 0
        # Now we loop through the batches to train
        for data, target in train_data:
            optimizer.zero_grad()  # This clears the internally stored gradients
            output = network(data)  # evaluate the neural network on the minibatch; we will compare this to the target
            # Here we calculate the loss function and then use backpropagation
            # to calculate the gradient
            loss = loss_fn(output, target)
            loss.backward()
            # Update the weights
            optimizer.step()
            running_loss += loss.item()
            batch_count += 1
        loss_history.append(running_loss / batch_count)  # record the average loss for this epoch

from torchvision import datasets, transforms
from torch.utils.data import DataLoader
transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])
train_dataset = datasets.FashionMNIST('data/', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST('data/', train=False, transform=transform)
img, label = training_data[100]
plt.imshow(img.squeeze(),cmap="gray")
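The train() function above follows the standard loop: zero the gradients, run the forward pass, compute the loss, backpropagate, and step the optimizer. As a dependency-free sketch of that same pattern, here is plain SGD (no momentum or weight decay) on a made-up one-parameter least-squares problem, with the gradient computed by hand instead of loss.backward():

```python
# Toy dataset: (x, y) pairs with y = 2x, so the best fit is w = 2
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def run_epochs(epochs=50, eta=0.05):
    w = 0.0  # the single parameter, analogous to the network weights
    loss_history = []
    for epoch in range(epochs):
        running_loss, batch_count = 0.0, 0
        for x, y in data:
            # forward pass and loss
            pred = w * x
            loss = (pred - y) ** 2
            # backward pass by hand: d(loss)/dw = 2 * (pred - y) * x
            grad = 2.0 * (pred - y) * x
            # optimizer step (plain SGD)
            w -= eta * grad
            running_loss += loss
            batch_count += 1
        loss_history.append(running_loss / batch_count)
    return w, loss_history

w, hist = run_epochs()
print(round(w, 3))  # w converges to 2.0, and hist is decreasing
```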




Adam is much more robust to the choice of learning rate
Adam is excellent when the gradient is sparse
Adam is often the best optimizer in the initial stages of training
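In pytorch this optimizer is available as torch.optim.Adam. As a sketch of the update rule itself (exponential moving averages of the gradient and its square, with bias correction), here is a pure-Python version on a toy quadratic objective; the hyperparameters are the common defaults except for the learning rate, which is chosen here just for illustration:

```python
def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # biased first and second moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias-corrected estimates (t is the 1-based step count)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # parameter update
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

# toy objective: f(w) = (w - 3)^2, with gradient 2*(w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
print(w)  # w approaches the minimum at 3.0
```

Note how the per-step movement is roughly lr regardless of the gradient's magnitude, which is why Adam is insensitive to gradient scale and handles sparse gradients well.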


DATA 622