"# TP 2 : machine learning using neural network for text data\n",
"\n",
"In this practical session, we are going to build simple neural models able to classify reviews as positive or negative. The dataset used comes from AlloCine.\n",
"\n",
"The session is divided in 3 parts, note that the last one is an homework, that required that part 2 is done:\n",
"\n",
"## Part 1- Embeddings from scratch \n",
"Build a neural network that takes as input randomly initialized embedding vectors for each word. A review is represented by a vector that is the average of the word vectors.\n",
"\n",
"## Part 2- Pre-trained word embeddings \n",
"Define a neural network that takes as input pre-trained word embeddings (here FastText embeddings). A review is represented the same way.\n",
"\n",
"## Part 3: Tuning (**DM: to return on Moodle before 04/11/22**) \n",
"Tune the model build on pre-trained word embeddings (part 2) by testing several values for the different hyper-parameters, and by testing two additional architectures (i.e. one additionnal hidden layer, and an LSTM layer). Describe the performance obtained by reporting the scores for each setting on the development set, printing the loss function against the hyper-parameter values, and reporting the score of the best model on the test set. \n",
"\n",
"-------------------------------------"
],
"metadata": {
"id": "jShhTl5Mftkw"
}
},
{
"cell_type": "markdown",
"source": [
"## Part 0: Read and load the data (code given)\n",
"\n",
"First, we need to understand how to use text data. Here we will start with the bag of word representation. \n",
"\n",
"You can find different ways of dealing with the input data. The simplest solution is to use the DataLoader from PyTorch: \n",
"* the doc here https://pytorch.org/docs/stable/data.html and here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html\n",
"* an example of use, with numpy array: https://www.kaggle.com/arunmohan003/sentiment-analysis-using-lstm-pytorch\n",
"\n",
"You can also find many datasets for text ready to load in pytorch on: https://pytorch.org/text/stable/datasets.html"
],
"metadata": {
"id": "Wv6H41YoFycw"
}
},
{
"cell_type": "markdown",
"source": [
"## Useful imports\n",
"\n",
"Here we also:\n",
"* Look at the availability of a GPU. Reminder: in Collab, you have to go to Edit/Notebook settings to set the use of a GPU\n",
"* Setting a seed, for reproducibility: https://pytorch.org/docs/stable/notes/randomness.html\n"
],
"metadata": {
"id": "mT2uF3G6HXko"
}
},
{
"cell_type": "code",
"source": [
"import time\n",
"import pandas as pd\n",
"import numpy as np\n",
"# torch and torch modules to deal with text data\n",
"### 0.2 Generate data batches and iterator (code given)\n",
"\n",
"Then, we use *torch.utils.data.DataLoader* with a Dataset object as built by the code above. DataLoader has an argument to set the size of the batches, but since we have varibale-size input sequences, we need to specify how to build the batches. This is done by redefning the function *collate_fn* used by *DataLoader*.\n",
"* Use the code above and *DataLoader* to load the training and test data with a batch size of 2.\n",
"* Print some instances and their associated labels."
],
"metadata": {
"id": "U0ueXxdpZcqx"
}
},
{
"cell_type": "code",
"source": [
"# Load training and dev sets\n"
],
"metadata": {
"id": "sGAiiL2rY7hD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 0.4 Exercise: understand the Vocab object\n",
"\n",
"Here the **vocabulary** is a specific object in Pytorch: https://pytorch.org/text/stable/vocab.html\n",
"\n",
"For example, the vocabulary directly converts a list of tokens into integers, see below.\n",
"\n",
"Now try to:\n",
"* Retrieve the indices of a specific word, e.g. 'mauvais'\n",
"* Retrive a word from its index, e.g. 368\n",
"* You can also directly convert a sentence to a list of indices, using the *text_pipeline* defined in the *Dataset* class, try with:\n",
" * 'Avant cette série, je ne connaissais que Urgence'\n",
" * 'Avant cette gibberish, je ne connaissais que Urgence'\n",
" * what happened when you use a word that is unknown?"
],
"metadata": {
"id": "Tus9Kedas5dq"
}
},
{
"cell_type": "code",
"source": [
"train.vocab(['Avant', 'cette', 'série', ','])"
],
"metadata": {
"id": "tb6TYA9Is5v6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Retrieve the indices of a specific word, e.g. 'mauvais'\n",
"\n",
"# Retrive a word from its index, e.g. 368\n",
"\n",
"# Convert a sentence to a list of indices:\n",
"# 'Avant cette série, je ne connaissais que Urgence'\n",
"\n",
"# 'Avant cette gibberish, je ne connaissais que Urgence'\n"
],
"metadata": {
"id": "Tj-gb9O6sXYd"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"------------------------------------------\n",
"\n",
"## Part 1- Embeddings from scratch \n",
"\n",
"We are now going to define a neural network that takes as input randomly initialized word embedding (also called Continuous Bag of words), i.e.:\n",
"\n",
"* **Each word is associated to a randomly initialized real-valued and low-dimensional vector** (e.g. 50 dimensions). Crucially, the neural network will also learn the embeddings during training (if not freezed): the embeddings of the network are also parameters that are optimized according to the loss function, allowing the model to learn a better representation of the words.\n",
"\n",
"* And **each review is represented by a vector** that should represent all the words it contains. One way to do that is to use **the average of the word vectors** (another typical option is to sum them). Instead of a bag-of-words representation of thousands of dimensions (the size of the vocabulary), we will thus end with an input vector of size e.g. 50, that represents the ‘average’, combined meaning of all the words in the document taken together. "
],
"metadata": {
"id": "bf14asthFw9X"
}
},
{
"cell_type": "markdown",
"source": [
"### 1.1 Exercise: Define the model\n",
"\n",
"▶▶ **Bag of embeddings: Define the embedding layer in the __init__() function below.**\n",
"\n",
"In order to use the input embeddings, we need an embedding layer that transforms our input words to vectors of size 'embed_dim' and performs an operation on these vectors to build a representation for each document (default=mean). More specifically, we'll use the *nn.EmbeddingBag* layer: https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html\n",
" # Linear function (readout) # LINEAR ==> y = h1.W2\n",
"\n",
" #return out"
],
"metadata": {
"id": "CVWapsW2sQ2J"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**Note:** The code of the *forward* function has a 2nd argument: the 'offsets' are used to retrieve the individual documents (each document is concatenated to the others in a batch, the offsets are used to retrieve the separate documents). It has to be used also for embedding the input.\n"
],
"metadata": {
"id": "r3jU50v8dkEI"
}
},
{
"cell_type": "markdown",
"source": [
"### 1.2 Exercise: Train and evaluation functions\n",
"\n",
"* Complete the training function below. A good indicator that your model is doing what is supposed to, is the loss: it should decrease during training. At the same time, the accuracy on the training set should increase.\n",
" * compute the loss and accuracy at the end of each training step (i.e. for each sample)\n",
" * Print the loss after each epoch during training\n",
" * Print the accuracy after each epoch during training\n",
"* Complete the evaluation function below. It should print the final scores (e.g. a classification report)\n",
"\n",
"Note that to we need to take into account the offsets in the training and evaluation procedures."
"* Below we define the values for the hyper-parameters:\n",
" * embedding dimension = 300\n",
" * hidden dimension = 4\n",
" * learning rate = 0.1\n",
" * number of epochs = 5\n",
" * using the Cross Entropy loss function\n",
"* Additionally, we also have: batch size = 2 \n",
"\n",
"\n",
"* What is the input dimension?\n",
"* What is the output dimension? "
],
"metadata": {
"id": "NC2VtTmv-Q_c"
}
},
{
"cell_type": "code",
"source": [
"# Set the values of the hyperparameters\n",
"emb_dim = 300\n",
"hidden_dim = 4\n",
"learning_rate = 0.1\n",
"num_epochs = 5\n",
"criterion = nn.CrossEntropyLoss()\n",
"\n",
"output_dim = 2\n",
"vocab_size = len(train.vocab)"
],
"metadata": {
"id": "Jod8FnWPs_Vi"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 1.4 Exercise: Run experiments\n",
"\n",
"* Initialize the model and move the model to GPU\n",
"* Define an optimizer, i.e. define the method we'll use to optimize / find the best parameters of our model: check the doc https://pytorch.org/docs/stable/optim.html and use the **SGD** optimizer. \n",
"* Train the model\n",
"* Evaluate the model on the dev set\n",
"\n"
],
"metadata": {
"id": "HPbExtkOm-ki"
}
},
{
"cell_type": "code",
"source": [
"# Initialize the model\n",
"\n",
"# Define an optimizer\n",
"\n",
"# Train the model\n",
"\n",
"# Evaluate on dev\n"
],
"metadata": {
"id": "1Xug7ygbpAhS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Note that we don't use here a SoftMax over the output of the final layer to obtain class probability: this is because this SoftMax application is done in the loss function chosen (*nn.CrossEntropyLoss()*). Be careful, it's not the case of all the loss functions available in PyTorch."
],
"metadata": {
"id": "OBqQaAf6mxEI"
}
},
{
"cell_type": "markdown",
"source": [
"## Part 2- Using pretrained embeddings\n",
"\n",
"Using the previous continuous representations allows to reduce the number of dimensions. But these representations are initialized randomly, and we probably don't have enough data to build good representations for our problem during training. One solution is to use pre-trained word embeddings, built over very big corpora with the aim of building good representations of the meaning of words.\n",
"\n",
"Upload the file *cc.fr.300.10000.vec': first 10,000 lines of the FastText embeddings for French, https://fasttext.cc/docs/en/crawl-vectors.html."
],
"metadata": {
"id": "UDlM7OZq56HO"
}
},
{
"cell_type": "markdown",
"source": [
"### 2.1 Load the vectors (code given)\n",
"\n",
"The function below loads the pre-trained embeddings, returning a dictionary mapping a word to its vector, as defined in the fasttext file. \n",
"\n",
"Note that the first line of the file gives the number of unique tokens and the size of the embeddings.\n",
"\n",
"At the end, we print the vocabulary and the size of the embeddings."
],
"metadata": {
"id": "RX2DkAqws1gU"
}
},
{
"cell_type": "code",
"source": [
"import io\n",
"\n",
"def load_vectors(fname):\n",
" fin = io.open(fname, 'r', encoding='utf-8', newline='\\n', errors='ignore')\n",
" n, d = map(int, fin.readline().split())\n",
" print(\"Originally we have: \", n, 'tokens, and vectors of',d, 'dimensions') #here in fact only 10000 words\n",
" data = {}\n",
" for line in fin:\n",
" tokens = line.rstrip().split(' ')\n",
" data[tokens[0]] = [float(t) for t in tokens[1:]]\n",
"Now we need to build a matrix over the dataset associating each word present in the dataset to its vector. For each word in dataset’s vocabulary, we check if it is in FastText’s vocabulary:\n",
"* if yes: load its pre-trained word vector. \n",
"* else: we initialize a random vector.\n",
"\n",
"Print the number of tokens from FastText found in the training set."
"Make plots of the loss and accuracy during training (one point per epoch)."
],
"metadata": {
"id": "2LzSpj_OqK3D"
}
},
{
"cell_type": "markdown",
"source": [
"## Additional tricks\n",
"\n",
"### Adjusting the learning rate\n",
"\n",
"*scheduler*: *torch.optim.lr_scheduler* provides several methods to adjust the learning rate based on the number of epochs.\n",
"\n",
"* Learning rate scheduling should be applied after optimizer’s update.\n",
"* torch.optim.lr_scheduler.StepLR: Decays the learning rate of each parameter group by gamma every step_size epochs.\n",
"\n",
"https://pytorch.org/docs/stable/optim.html\n",
"\n",
"\n",
"### Weight initialization \n",
"Weight initialization is done by default uniformly. You can also specify the initialization and choose among varied options: https://pytorch.org/docs/stable/nn.init.html. Further info there https://stackoverflow.com/questions/49433936/how-to-initialize-weights-in-pytorch or there https://discuss.pytorch.org/t/clarity-on-default-initialization-in-pytorch/84696/3\n",
"* We can also explore the embeddings that are created by the architecture. Run the script in interactive mode, and issue the following commands at the python prompt :\n",
"```\n",
"m = model.layers[0].get_weights()[0]\n",
"tp3_utils.calcSim(’mauvais’, w2i, i2w, m)\n",
"```\n",
"The first line extract the embedding matrix from the model, and the second line computes the most similar embeddings for the word 'mauvais', using cosine similarity. Do the results make sense ? Try another word with a positive connotation."
],
"metadata": {
"id": "nC_RmFH3k3QT"
}
},
{
"cell_type": "markdown",
"source": [
"## Part 3: Tuning your model (homework)\n",
"\n",
"The model comes with a variety of hyper-parameters. To find the best model, we need to test different values for these free parameters.\n",
"\n",
"Be careful: you always optimize / fine-tune your model on the development set. Then you compare the results obtained with the differen settings on the dev set, and finally report the results of the best model on the test set. \n",
"\n",
"For this homework, you have to test different values for the following hyper-parameters:\n",
"1. Batch size \n",
"2. Max number of epochs (with best batch size)\n",
"3. Size of the hidden layer\n",
"4. Activation function\n",
"5. Optimizer\n",
"6. Learning rate\n",
"\n",
"Inspect your model to give some hypothesis on the influence of these parameters on the model by inspecting how they affect the loss during training and the performance of the model. \n",
"\n",
"Once done, evaluate two variations of the architecture of the model (here you don't need to test different hyper-parameter values, you can for example keep the best ones from the previous experiments):\n",
"\n",
"7. Try with 1 additional hidden layer\n",
"8. Try with an LSTM layer \n"
],
"metadata": {
"id": "1HmIthzRumir"
}
}
]
}
%% Cell type:markdown id: tags:
# TP 2: Machine learning using neural networks for text data
In this practical session, we are going to build simple neural models able to classify reviews as positive or negative. The dataset used comes from AlloCine.
The session is divided into 3 parts. Note that the last one is homework and requires Part 2 to be completed:
## Part 1- Embeddings from scratch
Build a neural network that takes as input randomly initialized embedding vectors for each word. A review is represented by a vector that is the average of the word vectors.
## Part 2- Pre-trained word embeddings
Define a neural network that takes as input pre-trained word embeddings (here FastText embeddings). A review is represented the same way.
## Part 3: Tuning (**Homework: to be submitted on Moodle before 04/11/22**)
Tune the model built on pre-trained word embeddings (Part 2) by testing several values for the different hyper-parameters, and by testing two additional architectures (i.e. one additional hidden layer, and an LSTM layer). Describe the performance obtained by reporting the scores for each setting on the development set, plotting the loss against the hyper-parameter values, and reporting the score of the best model on the test set.
-------------------------------------
%% Cell type:markdown id: tags:
## Part 0: Read and load the data (code given)
First, we need to understand how to use text data. Here we will start with the bag-of-words representation.
You can find different ways of dealing with the input data. The simplest solution is to use the DataLoader from PyTorch:
* the doc here https://pytorch.org/docs/stable/data.html and here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
* an example of use, with numpy array: https://www.kaggle.com/arunmohan003/sentiment-analysis-using-lstm-pytorch
You can also find many text datasets ready to load in PyTorch at: https://pytorch.org/text/stable/datasets.html
%% Cell type:markdown id: tags:
## Useful imports
Here we also:
* Check the availability of a GPU. Reminder: in Colab, you have to go to Edit/Notebook settings to enable the GPU
* Set a seed, for reproducibility: https://pytorch.org/docs/stable/notes/randomness.html
%% Cell type:code id: tags:
```
import time
import pandas as pd
import numpy as np
# torch and torch modules to deal with text data
import torch
import torch.nn as nn
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
from torch.utils.data import DataLoader
# you can use scikit to print scores
from sklearn.metrics import classification_report
# For reproducibility, set a seed
torch.manual_seed(0)
# Check for GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
# Data files
train_file = "allocine_train.tsv"
dev_file = "allocine_dev.tsv"
test_file = "allocine_test.tsv"
# embeddings
embed_file='cc.fr.300.10000.vec'
```
%% Cell type:markdown id: tags:
### 0.1 Load data (code given)
Read the code below, which loads the data. Note that:
- we tokenize the text (here a simple tokenization based on spaces)
- we build the vocabulary corresponding to the training data:
  - the vocabulary corresponds to the set of unique tokens
  - only tokens in the training data are known by the system
### 0.2 Generate data batches and iterator (code given)
Then, we use *torch.utils.data.DataLoader* with a Dataset object as built by the code above. DataLoader has an argument to set the size of the batches, but since we have variable-size input sequences, we need to specify how to build the batches. This is done by redefining the function *collate_fn* used by *DataLoader*.
* Use the code above and *DataLoader* to load the training and test data with a batch size of 2.
* Print some instances and their associated labels.
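
As an illustration of what a custom *collate_fn* can look like, here is a minimal sketch. It assumes that each dataset item is a `(label, token_indices)` pair; the toy data and the name `collate_batch` are assumptions made for this example, not the code given in the notebook.

```python
import torch
from torch.utils.data import DataLoader

# Toy dataset (assumption): each item is (label, list of token indices).
toy_data = [(1, [4, 8, 15]), (0, [16, 23]), (1, [42]), (0, [7, 7, 7, 7])]

def collate_batch(batch):
    """Concatenate variable-length documents and record where each one starts."""
    labels, token_ids, offsets = [], [], [0]
    for label, ids in batch:
        labels.append(label)
        token_ids.extend(ids)
        offsets.append(offsets[-1] + len(ids))
    return (
        torch.tensor(labels, dtype=torch.long),
        torch.tensor(token_ids, dtype=torch.long),
        torch.tensor(offsets[:-1], dtype=torch.long),  # start index of each document
    )

loader = DataLoader(toy_data, batch_size=2, shuffle=False, collate_fn=collate_batch)
labels, text, offsets = next(iter(loader))
print(labels, text, offsets)
```

Concatenating the documents and keeping their start positions avoids padding, and it is the input layout expected by *nn.EmbeddingBag*.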
%% Cell type:code id: tags:
```
# Load training and dev sets
```
%% Cell type:markdown id: tags:
### 0.4 Exercise: understand the Vocab object
Here the **vocabulary** is a specific object in PyTorch: https://pytorch.org/text/stable/vocab.html
For example, the vocabulary directly converts a list of tokens into integers, see below.
Now try to:
* Retrieve the index of a specific word, e.g. 'mauvais'
* Retrieve a word from its index, e.g. 368
* You can also directly convert a sentence to a list of indices, using the *text_pipeline* defined in the *Dataset* class. Try with:
  * 'Avant cette série, je ne connaissais que Urgence'
  * 'Avant cette gibberish, je ne connaissais que Urgence'
  * What happens when you use a word that is unknown?
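
To make the behaviour for unknown words concrete, here is a plain-Python sketch of a vocabulary with a default index. `ToyVocab` is a hypothetical stand-in written for this example; it only mimics the idea and is not the torchtext implementation.

```python
class ToyVocab:
    """Hypothetical stand-in: maps tokens to indices, unknown tokens to a default index."""

    def __init__(self, tokens, unk_token="<unk>"):
        self.itos = [unk_token] + sorted(set(tokens))      # index -> token
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}  # token -> index
        self.default_index = self.stoi[unk_token]

    def __call__(self, tokens):
        # Unknown tokens fall back to the default index instead of raising an error.
        return [self.stoi.get(tok, self.default_index) for tok in tokens]

    def lookup_token(self, index):
        return self.itos[index]

vocab = ToyVocab("Avant cette série , je ne connaissais que Urgence".split())
known = vocab(["cette", "série"])
unknown = vocab(["cette", "gibberish"])
print(known, unknown, vocab.lookup_token(known[0]))
```

In this sketch, every word that was not seen when building the vocabulary receives the same index (here 0), so the model cannot distinguish between different unknown words.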
%% Cell type:code id: tags:
```
train.vocab(['Avant', 'cette', 'série', ','])
```
%% Cell type:code id: tags:
```
# Retrieve the indices of a specific word, e.g. 'mauvais'
# Retrieve a word from its index, e.g. 368
# Convert a sentence to a list of indices:
# 'Avant cette série, je ne connaissais que Urgence'
# 'Avant cette gibberish, je ne connaissais que Urgence'
```
%% Cell type:markdown id: tags:
------------------------------------------
## Part 1- Embeddings from scratch
We are now going to define a neural network that takes as input randomly initialized word embeddings (also called Continuous Bag of Words), i.e.:
* **Each word is associated with a randomly initialized, real-valued, low-dimensional vector** (e.g. 50 dimensions). Crucially, the neural network will also learn the embeddings during training (unless they are frozen): the embeddings are parameters of the network, optimized according to the loss function, which allows the model to learn a better representation of the words.
* And **each review is represented by a vector** that should represent all the words it contains. One way to do that is to use **the average of the word vectors** (another typical option is to sum them). Instead of a bag-of-words representation of thousands of dimensions (the size of the vocabulary), we will thus end up with an input vector of size e.g. 50, that represents the ‘average’, combined meaning of all the words in the document taken together.
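
The averaging step can be illustrated with a few lines of NumPy. The sizes and the word indices are toy values chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim = 10, 50                          # toy sizes (assumption)
embeddings = rng.normal(size=(vocab_size, emb_dim))   # one random vector per word

review = [3, 1, 4, 1, 5]                              # a review as a list of word indices
review_vector = embeddings[review].mean(axis=0)       # average of the word vectors
print(review_vector.shape)
```

Whatever the length of the review, the resulting vector always has `emb_dim` dimensions, which is what allows a fixed-size input layer.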
%% Cell type:markdown id: tags:
### 1.1 Exercise: Define the model
▶▶ **Bag of embeddings: Define the embedding layer in the __init__() function below.**
In order to use the input embeddings, we need an embedding layer that transforms our input words to vectors of size 'embed_dim' and performs an operation on these vectors to build a representation for each document (default=mean). More specifically, we'll use the *nn.EmbeddingBag* layer: https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html
* mode (string, optional) – "sum", "mean" or "max". Default=mean.
▶▶ **Now write the rest of the code to define the neural network:**
In the __init__(...) function, you need to:
- define a linear function that maps the input to the hidden dimensions (e.g. self.fc1)
- define an activation function, using the non-linear function sigmoid (e.g. self.sigmoid)
- define a second linear function, that takes the output of the hidden layer and maps to the output dimensions (e.g. self.fc2)
In the forward(self, x) function, you need to:
- pass the input *x* through the first linear function
- pass the output of this linear application through the activation function
- pass the final output through the second linear function and return its output
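
One possible way to put these steps together is sketched below. The class name `FeedforwardBagOfEmbeddings`, the toy vocabulary size and the toy batch are assumptions for this example; it is one reading of the instructions above, not the reference solution.

```python
import torch
import torch.nn as nn

class FeedforwardBagOfEmbeddings(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim):
        super().__init__()
        # Embeds each word and averages the vectors of each document
        self.embedding_bag = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc1 = nn.Linear(embed_dim, hidden_dim)    # input -> hidden
        self.sigmoid = nn.Sigmoid()                    # non-linearity
        self.fc2 = nn.Linear(hidden_dim, output_dim)   # hidden -> output

    def forward(self, text, offsets):
        embedded = self.embedding_bag(text, offsets)   # one vector per document
        h1 = self.sigmoid(self.fc1(embedded))
        return self.fc2(h1)                            # raw scores (logits), no softmax

model = FeedforwardBagOfEmbeddings(vocab_size=100, embed_dim=300, hidden_dim=4, output_dim=2)
text = torch.tensor([4, 8, 15, 16, 23])   # two documents concatenated (toy indices)
offsets = torch.tensor([0, 3])            # the second document starts at position 3
logits = model(text, offsets)
print(logits.shape)
```

With two documents in the batch and two classes, the output contains one row of two scores per document.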
%% Cell type:code id: tags:
```
# Linear function (readout) # LINEAR ==> y = h1.W2
#return out
```
%% Cell type:markdown id: tags:
**Note:** The *forward* function has a second argument, 'offsets'. All the documents of a batch are concatenated into a single sequence, and the offsets give the position where each document starts, so that the separate documents can be retrieved. The offsets must also be passed to the embedding layer.
%% Cell type:markdown id: tags:
### 1.2 Exercise: Train and evaluation functions
* Complete the training function below. A good indicator that your model is doing what it is supposed to is the loss: it should decrease during training. At the same time, the accuracy on the training set should increase.
  * Compute the loss and accuracy at the end of each training step (i.e. for each batch)
  * Print the loss after each epoch during training
  * Print the accuracy after each epoch during training
* Complete the evaluation function below. It should print the final scores (e.g. a classification report)
Note that we need to take into account the offsets in the training and evaluation procedures.
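
A minimal sketch of such functions is given here, under the assumption that each batch is a `(labels, text, offsets)` triple. `TinyModel` and the toy batches are stand-ins made up so that the example runs on its own; the evaluation returns a plain accuracy rather than a full classification report.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Stand-in model (assumption): mean of embeddings followed by one linear layer."""
    def __init__(self, vocab_size=10, embed_dim=8, output_dim=2):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.out = nn.Linear(embed_dim, output_dim)

    def forward(self, text, offsets):
        return self.out(self.emb(text, offsets))

def train(model, batches, optimizer, criterion, num_epochs, device="cpu"):
    """Train the model and return (mean loss, accuracy) for each epoch."""
    history = []
    for epoch in range(num_epochs):
        model.train()
        total_loss, correct, seen = 0.0, 0, 0
        for labels, text, offsets in batches:
            labels, text, offsets = labels.to(device), text.to(device), offsets.to(device)
            optimizer.zero_grad()
            logits = model(text, offsets)      # the offsets separate the documents
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
        history.append((total_loss / seen, correct / seen))
        print(f"epoch {epoch + 1}: loss={history[-1][0]:.4f} accuracy={history[-1][1]:.4f}")
    return history

def evaluate(model, batches, device="cpu"):
    """Return the accuracy of the model on the given batches."""
    model.eval()
    correct, seen = 0, 0
    with torch.no_grad():
        for labels, text, offsets in batches:
            labels, text, offsets = labels.to(device), text.to(device), offsets.to(device)
            predictions = model(text, offsets).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            seen += labels.size(0)
    return correct / seen

torch.manual_seed(0)
# Toy batches (assumption): (labels, concatenated token indices, offsets)
toy_batches = [
    (torch.tensor([1, 0]), torch.tensor([4, 8, 5, 6, 3]), torch.tensor([0, 3])),
    (torch.tensor([0, 1]), torch.tensor([2, 2, 7]), torch.tensor([0, 2])),
]
model = TinyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
history = train(model, toy_batches, optimizer, nn.CrossEntropyLoss(), num_epochs=2)
dev_accuracy = evaluate(model, toy_batches)
```

Returning the per-epoch history, in addition to printing it, makes it easy to plot the loss and accuracy afterwards.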
* Below we define the values for the hyper-parameters:
  * embedding dimension = 300
  * hidden dimension = 4
  * learning rate = 0.1
  * number of epochs = 5
  * using the Cross Entropy loss function
* Additionally, we also have: batch size = 2
* What is the input dimension?
* What is the output dimension?
%% Cell type:code id: tags:
```
# Set the values of the hyperparameters
emb_dim = 300
hidden_dim = 4
learning_rate = 0.1
num_epochs = 5
criterion = nn.CrossEntropyLoss()
output_dim = 2
vocab_size = len(train.vocab)
```
%% Cell type:markdown id: tags:
### 1.4 Exercise: Run experiments
* Initialize the model and move the model to GPU
* Define an optimizer, i.e. define the method we'll use to optimize / find the best parameters of our model: check the doc https://pytorch.org/docs/stable/optim.html and use the **SGD** optimizer.
* Train the model
* Evaluate the model on the dev set
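
The wiring of model, device and optimizer can be sketched as follows. The `nn.Sequential` network and the random input are stand-ins (assumptions) used so that the example is self-contained; in the notebook, the model defined in section 1.1 would take their place.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in model (assumption): 300-dim input, hidden layer of size 4, 2 classes
model = nn.Sequential(nn.Linear(300, 4), nn.Sigmoid(), nn.Linear(4, 2)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(2, 300, device=device)     # toy batch of 2 document vectors
y = torch.tensor([0, 1], device=device)    # toy labels

before = model[0].weight.detach().clone()
optimizer.zero_grad()                      # reset gradients from the previous step
loss = criterion(model(x), y)
loss.backward()                            # compute gradients
optimizer.step()                           # update the parameters
changed = not torch.equal(before, model[0].weight.detach())
print(loss.item(), changed)
```

The inputs and labels must be on the same device as the model, which is why they are created with `device=device`.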
%% Cell type:code id: tags:
```
# Initialize the model
# Define an optimizer
# Train the model
# Evaluate on dev
```
%% Cell type:markdown id: tags:
Note that we do not apply a softmax to the output of the final layer to obtain class probabilities: the softmax is already applied inside the chosen loss function (*nn.CrossEntropyLoss()*). Be careful: this is not the case for all the loss functions available in PyTorch.
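
This can be checked on a small example: applying *nn.CrossEntropyLoss* to raw scores gives the same value as applying a log-softmax followed by *nn.NLLLoss*. The scores and targets are toy values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0], [0.5, 0.5]])   # raw scores for 2 documents
targets = torch.tensor([0, 1])

loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# To inspect class probabilities, apply the softmax explicitly
probs = F.softmax(logits, dim=1)
print(loss_ce.item(), loss_nll.item(), probs)
```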
%% Cell type:markdown id: tags:
## Part 2- Using pretrained embeddings
Using the previous continuous representations allows us to reduce the number of dimensions. But these representations are initialized randomly, and we probably don't have enough data to build good representations for our problem during training. One solution is to use pre-trained word embeddings, built over very big corpora with the aim of building good representations of the meaning of words.
Upload the file *cc.fr.300.10000.vec*: the first 10,000 lines of the FastText embeddings for French, https://fasttext.cc/docs/en/crawl-vectors.html.
%% Cell type:markdown id: tags:
### 2.1 Load the vectors (code given)
The function below loads the pre-trained embeddings, returning a dictionary mapping a word to its vector, as defined in the FastText file.
Note that the first line of the file gives the number of unique tokens and the size of the embeddings.
At the end, we print the vocabulary and the size of the embeddings.
%% Cell type:code id: tags:
```
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    # The first line gives the number of tokens and the embedding size
    n, d = map(int, fin.readline().split())
    print("Originally we have: ", n, 'tokens, and vectors of',d, 'dimensions') #here in fact only 10000 words
    data = {}
    for line in fin:
        # Each remaining line is a token followed by the values of its vector
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = [float(t) for t in tokens[1:]]
    return data

vectors = load_vectors( embed_file )
print( 'Version with', len( vectors), 'tokens')
print(vectors.keys() )
print( vectors['de'] )
```
%% Cell type:markdown id: tags:
### 2.2 Build the weight matrix (code given)
Now we need to build a matrix over the dataset associating each word present in the dataset with its vector. For each word in the dataset’s vocabulary, we check if it is in FastText’s vocabulary:
* if yes: load its pre-trained word vector.
* else: we initialize a random vector.
Print the number of tokens from FastText found in the training set.
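
The procedure can be sketched with NumPy as follows. The function name `build_weight_matrix`, the scale of the random vectors and the toy dictionary are assumptions made for this example; the given code of the notebook may differ.

```python
import numpy as np

def build_weight_matrix(vocab_tokens, vectors, emb_dim, seed=0):
    """Return a (vocab size x emb_dim) matrix and the number of tokens found."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab_tokens), emb_dim), dtype=np.float32)
    found = 0
    for i, token in enumerate(vocab_tokens):
        if token in vectors:
            matrix[i] = vectors[token]                        # pre-trained vector
            found += 1
        else:
            matrix[i] = rng.normal(scale=0.6, size=emb_dim)   # random vector
    return matrix, found

# Toy stand-in for the FastText dictionary (real vectors have 300 dimensions)
toy_vectors = {"de": [0.1, 0.2, 0.3], "film": [0.4, 0.5, 0.6]}
toy_vocab = ["<unk>", "de", "film", "gibberish"]
weights, found = build_weight_matrix(toy_vocab, toy_vectors, emb_dim=3)
print(found, weights.shape)
```

Row `i` of the matrix corresponds to the word with index `i` in the vocabulary. Such a matrix can then be converted to a tensor and loaded into an embedding layer, for instance with `nn.EmbeddingBag.from_pretrained`.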
Make plots of the loss and accuracy during training (one point per epoch).
%% Cell type:markdown id: tags:
## Additional tricks
### Adjusting the learning rate
*scheduler*: *torch.optim.lr_scheduler* provides several methods to adjust the learning rate based on the number of epochs.
* Learning rate scheduling should be applied after the optimizer’s update.
* torch.optim.lr_scheduler.StepLR: Decays the learning rate of each parameter group by gamma every step_size epochs.
https://pytorch.org/docs/stable/optim.html
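
A short sketch of *StepLR* in a training loop is given here. The single dummy parameter and the values of `step_size` and `gamma` are arbitrary choices for illustration.

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # dummy parameter (assumption)
optimizer = torch.optim.SGD(params, lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(5):
    lrs.append(scheduler.get_last_lr()[0])   # learning rate used during this epoch
    # ... the training batches of the epoch would be processed here ...
    optimizer.step()                          # optimizer update first
    scheduler.step()                          # then the scheduler, once per epoch
print(lrs)
```

With these values, the learning rate is halved every two epochs.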
### Weight initialization
Weight initialization is done by default uniformly. You can also specify the initialization and choose among various options: https://pytorch.org/docs/stable/nn.init.html. Further info here https://stackoverflow.com/questions/49433936/how-to-initialize-weights-in-pytorch or here https://discuss.pytorch.org/t/clarity-on-default-initialization-in-pytorch/84696/3
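
As an example of a custom initialization, the sketch below applies Xavier (Glorot) uniform initialization to every linear layer of a stand-in model. The choice of this particular scheme and the helper name `init_weights` are assumptions for illustration.

```python
import torch.nn as nn

def init_weights(module):
    """Re-initialize linear layers: Xavier uniform weights, zero biases."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# Stand-in model (assumption)
model = nn.Sequential(nn.Linear(300, 4), nn.Sigmoid(), nn.Linear(4, 2))
model.apply(init_weights)   # apply() visits every sub-module recursively
print(model[0].bias)
```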
* We can also explore the embeddings that are created by the architecture. Run the script in interactive mode, and issue the following commands at the Python prompt:
```
m = model.layers[0].get_weights()[0]
tp3_utils.calcSim('mauvais', w2i, i2w, m)
```
The first line extracts the embedding matrix from the model, and the second line computes the most similar embeddings for the word 'mauvais', using cosine similarity. Do the results make sense? Try another word with a positive connotation.
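
Since the code of `tp3_utils.calcSim` is not shown here, the following NumPy sketch is a hypothetical stand-in illustrating how a cosine-similarity search over an embedding matrix can work. The function `most_similar` and the tiny 2-dimensional vectors are made up for the example.

```python
import numpy as np

def most_similar(word, w2i, i2w, m, topn=3):
    """Return the topn words whose embeddings are closest to `word` (cosine similarity)."""
    normed = m / np.linalg.norm(m, axis=1, keepdims=True)   # unit-length rows
    sims = normed @ normed[w2i[word]]                        # cosine with every word
    order = np.argsort(-sims)                                # most similar first
    return [(i2w[i], float(sims[i])) for i in order if i != w2i[word]][:topn]

# Toy embedding matrix (assumption): one row per word
i2w = ["mauvais", "nul", "excellent"]
w2i = {w: i for i, w in enumerate(i2w)}
m = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]])
neighbours = most_similar("mauvais", w2i, i2w, m)
print(neighbours)
```

In a PyTorch model, the embedding matrix would typically be read from the `weight` attribute of the embedding layer.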
%% Cell type:markdown id: tags:
## Part 3: Tuning your model (homework)
The model comes with a variety of hyper-parameters. To find the best model, we need to test different values for these free parameters.
Be careful: you always optimize / fine-tune your model on the development set. You then compare the results obtained with the different settings on the dev set, and finally report the results of the best model on the test set.
For this homework, you have to test different values for the following hyper-parameters:
1. Batch size
2. Max number of epochs (with best batch size)
3. Size of the hidden layer
4. Activation function
5. Optimizer
6. Learning rate
Inspect your model to formulate hypotheses about the influence of these parameters, by looking at how they affect the loss during training and the performance of the model.
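
One simple way to organize such experiments is to vary one hyper-parameter at a time and keep the value with the best dev score. The sketch below assumes a function that trains a model with a given value and returns its score on the development set; `placeholder_dev_score` is a dummy written only so that the example runs, and its values carry no meaning.

```python
def sweep(values, run_experiment):
    """Run one experiment per value and return all scores and the best value."""
    results = {value: run_experiment(value) for value in values}
    best = max(results, key=results.get)
    return results, best

# Dummy scorer (assumption): replace with real training + evaluation on the dev set
def placeholder_dev_score(batch_size):
    return 1.0 / batch_size

results, best = sweep([2, 16, 64], placeholder_dev_score)
print(results, best)
```

Only the best configuration found on the dev set should then be evaluated on the test set.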
Once done, evaluate two variations of the architecture of the model (here you don't need to test different hyper-parameter values, you can for example keep the best ones from the previous experiments):