2026-05-04
ISLP
Apply the network to the data sequentially
Apply it to the old activations and next state
Repeat
Use the final output for your prediction




\[ A_{lk} = g\left(\sum_{j=1}^p w_{jk} X_{lj} + \cdots \right) \]
\[ A_{lk} = g\left(\sum_{j=1}^p w_{jk} X_{lj} + \sum_{s=1}^K u_{sk} A_{(l-1),s} + \cdots \right) \]
\[ A_{lk} = g\left(w_{0k} +\sum_{j=1}^p w_{jk} X_{lj} + \sum_{s=1}^K u_{sk} A_{(l-1),s} \right) \]
\[ O_l = \beta_0 + \sum_{k=1}^K \beta_k A_{lk} \]
\[ O_l = g\left(\beta_0 + \sum_{k=1}^K \beta_k A_{lk}\right) \]
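A minimal sketch of the recurrence in plain Python, written to follow the formulas for A_{lk} and the linear O_l term by term. It assumes tanh for g, a zero starting activation A_0, and made-up weights; the name `rnn_forward` and all the numbers are hypothetical, chosen only for illustration.

```python
import math

def rnn_forward(X, W, U, w0, beta, beta0, g=math.tanh):
    """Run one simple recurrent layer over a single sequence.

    X:    list of L input vectors, each of length p
    W:    p x K input weights,     W[j][k] = w_{jk}
    U:    K x K recurrent weights, U[s][k] = u_{sk}
    w0:   length-K biases
    beta: length-K output weights, beta0: output bias
    Returns the list of outputs O_1, ..., O_L.
    """
    K = len(w0)
    A_prev = [0.0] * K              # assumption: A_0 is all zeros
    outputs = []
    for x in X:                     # step through the sequence in order
        A = []
        for k in range(K):
            z = w0[k]
            z += sum(W[j][k] * x[j] for j in range(len(x)))
            z += sum(U[s][k] * A_prev[s] for s in range(K))
            A.append(g(z))
        outputs.append(beta0 + sum(beta[k] * A[k] for k in range(K)))
        A_prev = A                  # new activations feed the next step
    return outputs

# Tiny example: p = 2 features, K = 2 hidden units, L = 3 steps
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.25, 0.75]]
U = [[0.1, 0.0], [0.0, 0.1]]
w0 = [0.0, 0.0]
beta = [1.0, -1.0]
outputs = rnn_forward(X, W, U, w0, beta, beta0=0.0)
prediction = outputs[-1]            # only the final output is used
```

The same W, U, and beta are reused at every step; only the activations change as the sequence is read.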
imdb.com
Jay Alammar
ResNet for images
freeze=False fine-tunes those weights
\[ C_l = f_l C_{l-1} + i_l \tilde{C}_l \]
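A small sketch of the cell-state update with hand-picked gate values, to show how the forget and input gates trade off old memory against the new candidate. In the standard LSTM the gates are sigmoid outputs in (0, 1) computed from the input and previous hidden state; here they are simply set by hand, and the helper name `cell_state_update` is hypothetical.

```python
def cell_state_update(C_prev, f, i, C_tilde):
    """Elementwise C_l = f_l * C_{l-1} + i_l * C~_l for one time step."""
    return [f_k * c_k + i_k * ct_k
            for f_k, c_k, i_k, ct_k in zip(f, C_prev, i, C_tilde)]

C_prev = [2.0, -1.0]     # memory carried over from the previous step
f = [1.0, 0.0]           # keep the first unit, forget the second
i = [0.0, 1.0]           # ignore the candidate for unit 1, accept it for unit 2
C_tilde = [0.5, 0.5]     # candidate values proposed at this step

C_new = cell_state_update(C_prev, f, i, C_tilde)
```

With these values the first unit keeps its old memory (2.0) and the second is overwritten by the candidate (0.5).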
Christopher Olah
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size):
        super(LSTMModel, self).__init__()
        # Map each token index to a 64-dimensional embedding
        self.embedding = nn.Embedding(input_size, 64)
        self.lstm = nn.LSTM(input_size=64,
                            hidden_size=64,
                            batch_first=True)
        self.dense = nn.Linear(64, 1)

    def forward(self, x):
        val, (h_n, c_n) = self.lstm(self.embedding(x))
        # Predict from the LSTM output at the last time step
        return torch.flatten(self.dense(val[:, -1]))

def lstm_objective(trial):
    # Hyperparameters to optimize over
    embedding_dim = trial.suggest_categorical("embedding_dim", [16, 32, 64, 128])
    hidden_size = trial.suggest_categorical("hidden_size", [32, 64, 128])
    num_layers = trial.suggest_int("num_layers", 1, 3)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [128, 256, 512])

    # Nested so the model can read this trial's hyperparameters
    class TrialLSTMModel(nn.Module):
        def __init__(self, input_size):
            super(TrialLSTMModel, self).__init__()
            self.embedding = nn.Embedding(input_size, embedding_dim)
            self.drop1 = nn.Dropout(dropout)
            self.lstm = nn.LSTM(input_size=embedding_dim,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                # nn.LSTM applies dropout only between stacked layers
                                dropout=dropout if num_layers > 1 else 0,
                                batch_first=True)
            self.drop2 = nn.Dropout(dropout)
            self.dense = nn.Linear(hidden_size, 1)

Before tuning 86%
After tuning 88% 
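To make the `log=True` ranges in the search space concrete, here is a rough stand-in that draws configurations with plain random sampling from the standard library. This is not how Optuna chooses trials; `sample_log_uniform` and `sample_config` are hypothetical helpers meant only to show what each range means.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_config(rng):
    # Hypothetical helper: mirrors the tuning ranges with plain random draws
    return {
        "embedding_dim": rng.choice([16, 32, 64, 128]),
        "hidden_size": rng.choice([32, 64, 128]),
        "num_layers": rng.randint(1, 3),      # inclusive on both ends
        "dropout": rng.uniform(0.0, 0.5),
        "learning_rate": sample_log_uniform(rng, 1e-4, 1e-2),
        "weight_decay": sample_log_uniform(rng, 1e-6, 1e-2),
        "batch_size": rng.choice([128, 256, 512]),
    }

rng = random.Random(0)                        # fixed seed for repeatability
configs = [sample_config(rng) for _ in range(5)]
```

Sampling on the log scale gives each order of magnitude (for example 1e-4 to 1e-3 and 1e-3 to 1e-2) roughly equal weight, which suits learning rates and weight decay.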

DATA 622