diff --git a/notebooks/TP2_M2TAL_learningWithNN_2223.ipynb b/notebooks/TP2_M2TAL_learningWithNN_2223.ipynb deleted file mode 100644 index c51a57e50f24a31bbc725390d30ca49d3d57e8cd..0000000000000000000000000000000000000000 --- a/notebooks/TP2_M2TAL_learningWithNN_2223.ipynb +++ /dev/null @@ -1,762 +0,0 @@ -{ - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "toc_visible": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# TP 2: machine learning using neural networks for text data\n", - "\n", - "In this practical session, we are going to build simple neural models able to classify reviews as positive or negative. The dataset used comes from AlloCine.\n", - "\n", - "The session is divided into 3 parts; note that the last one is homework and requires part 2 to be done:\n", - "\n", - "## Part 1- Embeddings from scratch \n", - "Build a neural network that takes as input randomly initialized embedding vectors for each word. A review is represented by a vector that is the average of the word vectors.\n", - "\n", - "## Part 2- Pre-trained word embeddings \n", - "Define a neural network that takes as input pre-trained word embeddings (here FastText embeddings). A review is represented the same way.\n", - "\n", - "## Part 3: Tuning (**homework: to be submitted on Moodle before 04/11/22**) \n", - "Tune the model built on pre-trained word embeddings (part 2) by testing several values for the different hyper-parameters, and by testing two additional architectures (i.e. one additional hidden layer, and an LSTM layer). Describe the performance obtained by reporting the scores for each setting on the development set, printing the loss function against the hyper-parameter values, and reporting the score of the best model on the test set. 
\n", - "\n", - "-------------------------------------" - ], - "metadata": { - "id": "jShhTl5Mftkw" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Part 0: Read and load the data (code given)\n", - "\n", - "First, we need to understand how to use text data. Here we will start with the bag of word representation. \n", - "\n", - "You can find different ways of dealing with the input data. The simplest solution is to use the DataLoader from PyTorch: \n", - "* the doc here https://pytorch.org/docs/stable/data.html and here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html\n", - "* an example of use, with numpy array: https://www.kaggle.com/arunmohan003/sentiment-analysis-using-lstm-pytorch\n", - "\n", - "You can also find many datasets for text ready to load in pytorch on: https://pytorch.org/text/stable/datasets.html" - ], - "metadata": { - "id": "Wv6H41YoFycw" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Useful imports\n", - "\n", - "Here we also:\n", - "* Look at the availability of a GPU. 
Reminder: in Colab, you have to go to Edit/Notebook settings to enable the GPU\n", - "* Set a seed, for reproducibility: https://pytorch.org/docs/stable/notes/randomness.html\n" - ], - "metadata": { - "id": "mT2uF3G6HXko" - } - }, - { - "cell_type": "code", - "source": [ - "import time\n", - "import pandas as pd\n", - "import numpy as np\n", - "# torch and torch modules to deal with text data\n", - "import torch \n", - "import torch.nn as nn\n", - "from torchtext.data.utils import get_tokenizer\n", - "from torchtext.vocab import build_vocab_from_iterator\n", - "from torch.utils.data import DataLoader\n", - "# you can use scikit to print scores\n", - "from sklearn.metrics import classification_report\n", - "\n", - "# For reproducibility, set a seed\n", - "torch.manual_seed(0) \n", - "\n", - "# Check for GPU\n", - "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", - "print(device)\n", - "\n", - "# Data files\n", - "train_file = \"allocine_train.tsv\"\n", - "dev_file = \"allocine_dev.tsv\"\n", - "test_file = \"allocine_test.tsv\"\n", - "# embeddings\n", - "embed_file='cc.fr.300.10000.vec'" - ], - "metadata": { - "id": "nB_k89m8xAOt" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 0.1 Load data (code given)\n", - "\n", - "Read the code below that allows to load the data, note that:\n", - "- we tokenize the text (here a simple tokenization based on spaces)\n", - "- we build the vocabulary corresponding to the training data:\n", - " - the vocabulary corresponds to the set of unique tokens\n", - " - only tokens in the training data are known by the system\n", - " - the vocabulary here is a Torch specific object\n" - ], - "metadata": { - "id": "04vEei9QHPou" - } - }, - { - "cell_type": "code", - "source": [ - "class Dataset(torch.utils.data.Dataset):\n", - " \n", - " def __init__(self, tsv_file, vocab=None ):\n", - " self.tsv_file = tsv_file\n", - " self.data, self.label_list = 
self.load_data( )\n", - " # splits the string sentence by space.\n", - " self.tokenizer = get_tokenizer( None )\n", - " self.vocab = vocab\n", - " if not vocab:\n", - " self.build_vocab()\n", - " # pipelines for text and label\n", - " self.text_pipeline = lambda x: self.vocab(self.tokenizer(x))\n", - " self.label_pipeline = lambda x: int(x) #simple mapping to self \n", - " \n", - " def load_data( self ):\n", - " data = pd.read_csv( self.tsv_file, header=0, delimiter=\"\\t\", quoting=3)\n", - " instances = []\n", - " label_list = []\n", - " for i in data.index:\n", - " label_list.append( data[\"sentiment\"][i] )\n", - " instances.append( data[\"review\"][i] )\n", - " return instances, label_list \n", - "\n", - " def build_vocab(self):\n", - " self.vocab = build_vocab_from_iterator(self.yield_tokens(), specials=[\"<unk>\"])\n", - " self.vocab.set_default_index(self.vocab[\"<unk>\"])\n", - " \n", - " def yield_tokens(self):\n", - " for text in self.data:\n", - " yield self.tokenizer(text)\n", - " \n", - " def __len__(self):\n", - " return len(self.data)\n", - " \n", - " def __getitem__(self, index):\n", - " return (\n", - " tuple( [torch.tensor(self.text_pipeline( self.data[index] ), dtype=torch.int64),\n", - " torch.tensor( self.label_pipeline( self.label_list[index] ), dtype=torch.int64) ] ) \n", - " )" - ], - "metadata": { - "id": "GdK1WAmcFYHS" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 0.2 Generate data batches and iterator (code given)\n", - "\n", - "Then, we use *torch.utils.data.DataLoader* with a Dataset object as built by the code above. DataLoader has an argument to set the size of the batches, but since we have variable-size input sequences, we need to specify how to build the batches. 
This is done by redefining the function *collate_fn* used by *DataLoader*.\n", - "\n", - "```\n", - "dataloader = DataLoader(train_iter, batch_size=8, shuffle=False, collate_fn=collate_batch)\n", - "```\n", - "\n", - "Below: \n", - "* the text entries in the original data batch input are packed into a list and concatenated as a single tensor. \n", - "* the offsets are a tensor of delimiters giving the start index of each individual sequence in the text tensor\n", - "* the label is a tensor storing the labels of the individual text entries.\n", - "\n", - "The offsets are used to retrieve the individual sequences in each batch (the sequences are concatenated)." - ], - "metadata": { - "id": "bG3T9LQFTD73" - } - }, - { - "cell_type": "code", - "source": [ - "def collate_batch(batch):\n", - " label_list, text_list, offsets = [], [], [0]\n", - " for ( _text, _label) in batch:\n", - " text_list.append( _text )\n", - " label_list.append( _label )\n", - " offsets.append(_text.size(0))\n", - " label = torch.tensor(label_list, dtype=torch.int64)\n", - " offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)\n", - " text_list = torch.cat(text_list)\n", - " return text_list.to(device), label.to(device), offsets.to(device)" - ], - "metadata": { - "id": "oG0ZEYvYccBr" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 0.3 Exercise: Load the data\n", - "\n", - "* Use the code above and *DataLoader* to load the training and test data with a batch size of 2.\n", - "* Print some instances and their associated labels." 
- ], - "metadata": { - "id": "U0ueXxdpZcqx" - } - }, - { - "cell_type": "code", - "source": [ - "# Load training and dev sets\n" - ], - "metadata": { - "id": "sGAiiL2rY7hD" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 0.4 Exercise: understand the Vocab object\n", - "\n", - "Here the **vocabulary** is a specific object in PyTorch: https://pytorch.org/text/stable/vocab.html\n", - "\n", - "For example, the vocabulary directly converts a list of tokens into integers, see below.\n", - "\n", - "Now try to:\n", - "* Retrieve the index of a specific word, e.g. 'mauvais'\n", - "* Retrieve a word from its index, e.g. 368\n", - "* You can also directly convert a sentence to a list of indices, using the *text_pipeline* defined in the *Dataset* class, try with:\n", - " * 'Avant cette série, je ne connaissais que Urgence'\n", - " * 'Avant cette gibberish, je ne connaissais que Urgence'\n", - " * what happens when you use a word that is unknown?" - ], - "metadata": { - "id": "Tus9Kedas5dq" - } - }, - { - "cell_type": "code", - "source": [ - "train.vocab(['Avant', 'cette', 'série', ','])" - ], - "metadata": { - "id": "tb6TYA9Is5v6" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "# Retrieve the index of a specific word, e.g. 'mauvais'\n", - "\n", - "# Retrieve a word from its index, e.g. 
368\n", - "\n", - "# Convert a sentence to a list of indices:\n", - "# 'Avant cette série, je ne connaissais que Urgence'\n", - "\n", - "# 'Avant cette gibberish, je ne connaissais que Urgence'\n" - ], - "metadata": { - "id": "Tj-gb9O6sXYd" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "------------------------------------------\n", - "\n", - "## Part 1- Embeddings from scratch \n", - "\n", - "We are now going to define a neural network that takes as input randomly initialized word embeddings (also called Continuous Bag of Words), i.e.:\n", - "\n", - "* **Each word is associated with a randomly initialized, real-valued, low-dimensional vector** (e.g. 50 dimensions). Crucially, the neural network will also learn the embeddings during training (if not frozen): the embeddings of the network are also parameters that are optimized according to the loss function, allowing the model to learn a better representation of the words.\n", - "\n", - "* And **each review is represented by a vector** that should represent all the words it contains. One way to do that is to use **the average of the word vectors** (another typical option is to sum them). Instead of a bag-of-words representation of thousands of dimensions (the size of the vocabulary), we will thus end up with an input vector of size e.g. 50, that represents the ‘average’, combined meaning of all the words in the document taken together. " - ], - "metadata": { - "id": "bf14asthFw9X" - } - }, - { - "cell_type": "markdown", - "source": [ - "### 1.1 Exercise: Define the model\n", - "\n", - "▶▶ **Bag of embeddings: Define the embedding layer in the __init__() function below.**\n", - "\n", - "In order to use the input embeddings, we need an embedding layer that transforms our input words to vectors of size 'embed_dim' and performs an operation on these vectors to build a representation for each document (default=mean). 
More specifically, we'll use the *nn.EmbeddingBag* layer: https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html\n", - "\n", - " * mode (string, optional) – \"sum\", \"mean\" or \"max\". Default=mean.\n", - "\n", - "▶▶ **Now write the rest of the code to define the neural network:**\n", - "\n", - "In the __init__(...) function, you need to:\n", - "- define a linear function that maps the input to the hidden dimensions (e.g. self.fc1)\n", - "- define an activation function, using the non-linear function sigmoid (e.g. self.sigmoid)\n", - "- define a second linear function, that takes the output of the hidden layer and maps to the output dimensions (e.g. self.fc2)\n", - "\n", - "In the forward(self, x) function, you need to:\n", - "- pass the input *x* through the first linear function\n", - "- pass the output of this linear application through the activation function\n", - "- pass the final output through the second linear function and return its output" - ], - "metadata": { - "id": "7PRplEud9XHm" - } - }, - { - "cell_type": "code", - "source": [ - "class FeedforwardNeuralNetModel2(nn.Module):\n", - " def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim):\n", - " super(FeedforwardNeuralNetModel2, self).__init__()\n", - "\n", - " # Embedding layer \n", - " \n", - " # Linear function ==> W1\n", - "\n", - " # Non-linearity ==> g\n", - "\n", - " # Linear function (readout) ==> W2\n", - " \n", - "\n", - " def forward(self, text, offsets):\n", - " # Embed the input\n", - " \n", - " # Linear function # LINEAR ==> x.W1+b\n", - "\n", - " # Non-linearity # NON-LINEAR ==> h1 = g(x.W1+b)\n", - "\n", - " # Linear function (readout) # LINEAR ==> y = h1.W2\n", - "\n", - " #return out" - ], - "metadata": { - "id": "CVWapsW2sQ2J" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "**Note:** The code of the *forward* function has a 2nd argument: the 'offsets' are used to retrieve the individual documents (each 
document is concatenated to the others in a batch, the offsets are used to retrieve the separate documents). It must also be passed to the embedding layer when embedding the input.\n" - ], - "metadata": { - "id": "r3jU50v8dkEI" - } - }, - { - "cell_type": "markdown", - "source": [ - "### 1.2 Exercise: Train and evaluation functions\n", - "\n", - "* Complete the training function below. A good indicator that your model is doing what it is supposed to do is the loss: it should decrease during training. At the same time, the accuracy on the training set should increase.\n", - " * compute the loss and accuracy at the end of each training step (i.e. for each sample)\n", - " * Print the loss after each epoch during training\n", - " * Print the accuracy after each epoch during training\n", - "* Complete the evaluation function below. It should print the final scores (e.g. a classification report)\n", - "\n", - "Note that we need to take the offsets into account in the training and evaluation procedures." - ], - "metadata": { - "id": "UsXmIGqApbxj" - } - }, - { - "cell_type": "code", - "source": [ - "def training(model, train_loader, optimizer, num_epochs=5 ):\n", - " for epoch in range(num_epochs):\n", - " train_loss, total_acc, total_count = 0, 0, 0\n", - " for input, label, offsets in train_loader:\n", - " # Step1. Clearing the accumulated gradients\n", - " \n", - " # Step 2. Forward pass to get output/logits\n", - " \n", - " # Step 3. Compute the loss, gradients, and update the parameters by\n", - " # calling optimizer.step()\n", - " # - Calculate Loss: softmax --> cross entropy loss\n", - " \n", - " # - Getting gradients w.r.t. parameters\n", - " \n", - " # - Updating parameters\n", - "\n", - " # Accumulating the loss over time\n", - "\n", - " total_count += label.size(0)\n", - " # Compute accuracy on train set at each epoch\n", - " print('Epoch: {}. Loss: {}. 
ACC {} '.format(epoch, train_loss/len(train), total_acc/len(train)))\n", - " total_acc, total_count = 0, 0\n", - " train_loss = 0\n", - "\n", - "def evaluate( model, dev_loader ):\n", - " predictions = []\n", - " gold = []\n", - " with torch.no_grad():\n", - " for input, label, offsets in dev_loader:\n", - " # ...\n", - " \n", - " return gold, predictions" - ], - "metadata": { - "id": "US_0JmN5phqs" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 1.3 Hyper-parameters\n", - "\n", - "* Below we define the values for the hyper-parameters:\n", - " * embedding dimension = 300\n", - " * hidden dimension = 4\n", - " * learning rate = 0.1\n", - " * number of epochs = 5\n", - " * using the Cross Entropy loss function\n", - "* Additionally, we also have: batch size = 2 \n", - "\n", - "\n", - "* What is the input dimension?\n", - "* What is the output dimension? " - ], - "metadata": { - "id": "NC2VtTmv-Q_c" - } - }, - { - "cell_type": "code", - "source": [ - "# Set the values of the hyperparameters\n", - "emb_dim = 300\n", - "hidden_dim = 4\n", - "learning_rate = 0.1\n", - "num_epochs = 5\n", - "criterion = nn.CrossEntropyLoss()\n", - "\n", - "output_dim = 2\n", - "vocab_size = len(train.vocab)" - ], - "metadata": { - "id": "Jod8FnWPs_Vi" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 1.4 Exercise: Run experiments\n", - "\n", - "* Initialize the model and move the model to GPU\n", - "* Define an optimizer, i.e. define the method we'll use to optimize / find the best parameters of our model: check the doc https://pytorch.org/docs/stable/optim.html and use the **SGD** optimizer. 
\n", - "* Train the model\n", - "* Evaluate the model on the dev set\n", - "\n" - ], - "metadata": { - "id": "HPbExtkOm-ki" - } - }, - { - "cell_type": "code", - "source": [ - "# Initialize the model\n", - "\n", - "# Define an optimizer\n", - "\n", - "# Train the model\n", - "\n", - "# Evaluate on dev\n" - ], - "metadata": { - "id": "1Xug7ygbpAhS" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Note that we do not apply a SoftMax here over the output of the final layer to obtain class probabilities: the SoftMax is already applied inside the chosen loss function (*nn.CrossEntropyLoss()*). Be careful: this is not the case for all the loss functions available in PyTorch." - ], - "metadata": { - "id": "OBqQaAf6mxEI" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Part 2- Using pretrained embeddings\n", - "\n", - "Using the previous continuous representations allows us to reduce the number of dimensions. But these representations are initialized randomly, and we probably don't have enough data to build good representations for our problem during training. One solution is to use pre-trained word embeddings, built over very big corpora with the aim of building good representations of the meaning of words.\n", - "\n", - "Upload the file *cc.fr.300.10000.vec*: first 10,000 lines of the FastText embeddings for French, https://fasttext.cc/docs/en/crawl-vectors.html." - ], - "metadata": { - "id": "UDlM7OZq56HO" - } - }, - { - "cell_type": "markdown", - "source": [ - "### 2.1 Load the vectors (code given)\n", - "\n", - "The function below loads the pre-trained embeddings, returning a dictionary mapping a word to its vector, as defined in the fasttext file. \n", - "\n", - "Note that the first line of the file gives the number of unique tokens and the size of the embeddings.\n", - "\n", - "At the end, we print the vocabulary and the size of the embeddings." 
- ], - "metadata": { - "id": "RX2DkAqws1gU" - } - }, - { - "cell_type": "code", - "source": [ - "import io\n", - "\n", - "def load_vectors(fname):\n", - " fin = io.open(fname, 'r', encoding='utf-8', newline='\\n', errors='ignore')\n", - " n, d = map(int, fin.readline().split())\n", - " print(\"Originally we have: \", n, 'tokens, and vectors of',d, 'dimensions') #here in fact only 10000 words\n", - " data = {}\n", - " for line in fin:\n", - " tokens = line.rstrip().split(' ')\n", - " data[tokens[0]] = [float(t) for t in tokens[1:]]\n", - " return data\n", - "\n", - "\n", - "vectors = load_vectors( embed_file )\n", - "print( 'Version with', len( vectors), 'tokens')\n", - "print(vectors.keys() )\n", - "print( vectors['de'] )" - ], - "metadata": { - "id": "yd2EEjECv4vk" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 2.2 Build the weight matrix (code given)\n", - "\n", - "Now we need to build a matrix over the dataset associating each word present in the dataset to its vector. For each word in dataset’s vocabulary, we check if it is in FastText’s vocabulary:\n", - "* if yes: load its pre-trained word vector. \n", - "* else: we initialize a random vector.\n", - "\n", - "Print the number of tokens from FastText found in the training set." 
- ], - "metadata": { - "id": "GTA0vXeevSuO" - } - }, - { - "cell_type": "code", - "source": [ - "emb_dim = 300\n", - "matrix_len = len(train.vocab)\n", - "weights_matrix = np.zeros((matrix_len, emb_dim))\n", - "words_found = 0\n", - "\n", - "for i in range(0, len(train.vocab)):\n", - " word = train.vocab.lookup_token(i)\n", - " try: \n", - " weights_matrix[i] = vectors[word]\n", - " words_found += 1\n", - " except KeyError:\n", - " weights_matrix[i] = np.random.normal(scale=0.6, size=(emb_dim, ))\n", - "weights_matrix = torch.from_numpy(weights_matrix)\n", - "print( weights_matrix.size())" - ], - "metadata": { - "id": "4XXFTaRxvRNk" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 2.3 Embedding layer (code given)\n", - "\n", - "The code below defines a function that builds the embedding layer using the pretrained embeddings." - ], - "metadata": { - "id": "EACvg0Uw7qje" - } - }, - { - "cell_type": "code", - "source": [ - "def create_emb_layer(weights_matrix, non_trainable=False):\n", - " num_embeddings, embedding_dim = weights_matrix.size()\n", - " emb_layer = nn.Embedding(num_embeddings, embedding_dim)\n", - " emb_layer.load_state_dict({'weight': weights_matrix}) # <----\n", - " if non_trainable:\n", - " emb_layer.weight.requires_grad = False\n", - "\n", - " return emb_layer, num_embeddings, embedding_dim" - ], - "metadata": { - "id": "e-miRfkYvZpE" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 2.4 Exercise: Model definition\n", - "\n", - "* Now modify your model to add this embedding layer. \n", - "* Train and evaluate the model.\n", - "\n", - "Note that the embedding bag now takes pre initialized weights. 
" - ], - "metadata": { - "id": "VcLWQgu877rQ" - } - }, - { - "cell_type": "code", - "source": [ - "class FeedforwardNeuralNetModel3(nn.Module):\n", - " def __init__(self, embed_dim, hidden_dim, output_dim, weights_matrix):\n", - " super(FeedforwardNeuralNetModel3, self).__init__()\n", - "\n", - " \n", - "\n", - " def forward(self, text, offsets):\n", - "\n", - " \n", - " #return out" - ], - "metadata": { - "id": "fXOPuCv_vZrr" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "# Initialize the model\n", - "\n", - "# Define an optimizer\n", - "\n", - "# Train the model\n", - "\n", - "# Evaluate on dev\n" - ], - "metadata": { - "id": "Y2H6r-UKXokn" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 2.5 Exercize: Plots\n", - "\n", - "Make plots of the loss and accuracy during training (one point per epoch)." - ], - "metadata": { - "id": "2LzSpj_OqK3D" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Additional tricks\n", - "\n", - "### Adjusting the learning rate\n", - "\n", - "*scheduler*: *torch.optim.lr_scheduler* provides several methods to adjust the learning rate based on the number of epochs.\n", - "\n", - "* Learning rate scheduling should be applied after optimizer’s update.\n", - "* torch.optim.lr_scheduler.StepLR: Decays the learning rate of each parameter group by gamma every step_size epochs.\n", - "\n", - "https://pytorch.org/docs/stable/optim.html\n", - "\n", - "\n", - "### Weight initialization \n", - "Weight initialization is done by default uniformly. You can also specify the initialization and choose among varied options: https://pytorch.org/docs/stable/nn.init.html. 
Further info here https://stackoverflow.com/questions/49433936/how-to-initialize-weights-in-pytorch or here https://discuss.pytorch.org/t/clarity-on-default-initialization-in-pytorch/84696/3\n", - "\n", - "### Look at the weights learned\n", - "\n", - "```\n", - "print(model.embedding.weight[vocab.lookup_indices(['mauvais'])])\n", - "```\n", - "\n", - "* We can also explore the embeddings that are created by the architecture. Run the script in interactive mode, and issue the following commands at the Python prompt:\n", - "```\n", - "m = model.layers[0].get_weights()[0]\n", - "tp3_utils.calcSim('mauvais', w2i, i2w, m)\n", - "```\n", - "The first line extracts the embedding matrix from the model, and the second line computes the most similar embeddings for the word 'mauvais', using cosine similarity. Do the results make sense? Try another word with a positive connotation." - ], - "metadata": { - "id": "nC_RmFH3k3QT" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Part 3: Tuning your model (homework)\n", - "\n", - "The model comes with a variety of hyper-parameters. To find the best model, we need to test different values for these free parameters.\n", - "\n", - "Be careful: you always optimize / fine-tune your model on the development set. Then you compare the results obtained with the different settings on the dev set, and finally report the results of the best model on the test set. \n", - "\n", - "For this homework, you have to test different values for the following hyper-parameters:\n", - "1. Batch size \n", - "2. Max number of epochs (with best batch size)\n", - "3. Size of the hidden layer\n", - "4. Activation function\n", - "5. Optimizer\n", - "6. Learning rate\n", - "\n", - "Inspect your model to form hypotheses about the influence of these parameters, by looking at how they affect the loss during training and the performance of the model. 
\n", - "\n", - "Once done, evaluate two variations of the architecture of the model (here you don't need to test different hyper-parameter values, you can for example keep the best ones from the previous experiments):\n", - "\n", - "7. Try with 1 additional hidden layer\n", - "8. Try with an LSTM layer \n" - ], - "metadata": { - "id": "1HmIthzRumir" - } - } - ] -} \ No newline at end of file diff --git a/notebooks/TP2_masterLiTL_2223.ipynb b/notebooks/TP2_masterLiTL_2223.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..d6894faa2ab094e3dec3ca9c6e1cc1339bc44239 --- /dev/null +++ b/notebooks/TP2_masterLiTL_2223.ipynb @@ -0,0 +1,1562 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU", + "gpuClass": "standard" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "XCHhtzOXQ2po" + }, + "source": [ + "# TP 2: Linear Algebra and Feedforward neural network\n", + "Master LiTL - 2022-2023\n", + "\n", + "## Requirements\n", + "In this section, we will go through some code to learn how to manipulate matrices and tensors, and we will take a look at some PyTorch code that lets us define, train and evaluate a simple neural network. \n", + "The modules used are the same as in the previous session, *Numpy* and *Scikit*, with the addition of *PyTorch*. They are all already available within Colab. 
\n", + "\n", + "## Part 1: Linear Algebra\n", + "\n", + "In this section, we will go through some python code to deal with matrices and also tensors, the data structures used in PyTorch.\n", + "\n", + "Sources: \n", + "* Linear Algebra explained in the context of deep learning: https://towardsdatascience.com/linear-algebra-explained-in-the-context-of-deep-learning-8fcb8fca1494\n", + "* PyTorch tutorial: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py\n", + "* PyTorch doc on tensors: https://pytorch.org/docs/stable/torch.html\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Useful imports\n", + "import numpy as np\n", + "import torch" + ], + "metadata": { + "id": "2t2sdvtdsrjO" + }, + "execution_count": 10, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G3Hk9fJuBVxk" + }, + "source": [ + "## 1.1 Numpy arrays\n", + "\n", + "NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### 1.1.1 Numpy arrays\n", + "\n", + "▶▶ **Look at the code below and check that you understand each line:**\n", + "* We define a numpy array (i.e. a vector) **x** from a list\n", + "* We define a numpy array of shape 3x2 (i.e. 
a matrix) initialized with random numbers, called **W**\n", + "* We define a scalar, **b**\n", + "* Finally, with all these elements, we can compute **h = W.x + b**" + ], + "metadata": { + "id": "5hfuybaGeOX_" + } + }, + { + "cell_type": "code", + "metadata": { + "id": "W2IvCK4gPUAv", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "93e683f7-5f46-46f2-c3e1-9f9cab85f19f" + }, + "source": [ + "x = np.array([1,2])\n", + "print(\"Our input vector with 2 elements:\\n\", x)\n", + "print( \"x shape:\", x.shape) \n", + "\n", + "print( \"x data type\", x.dtype)\n", + "# Give a list of elements\n", + "# a = np.array(1,2,3,4) # WRONG\n", + "# a = np.array([1,2,3,4]) # RIGHT\n", + "\n", + "# Generate a random matrix (with a generator and a seed, for reproducible results)\n", + "rng = np.random.default_rng(seed=42)\n", + "W = rng.random((3, 2))\n", + "print(\"\\n Our weight matrix, of shape 3x2:\\n\", W)\n", + "print( \"W shape:\", W.shape)\n", + "print( \"W data type\", W.dtype)\n", + "\n", + "# Bias, a scalar\n", + "b = 1\n", + "\n", + "# Now, try to multiply\n", + "h = W.dot(x) + b\n", + "print(\"\\n Our h layer:\\n\", h)\n", + "print( \"h shape:\", h.shape)\n", + "print( \"h data type\", h.dtype)" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Our input vector with 2 elements:\n", + " [1 2]\n", + "x shape: (2,)\n", + "x data type int64\n", + "\n", + " Our weight matrix, of shape 3x2:\n", + " [[0.77395605 0.43887844]\n", + " [0.85859792 0.69736803]\n", + " [0.09417735 0.97562235]]\n", + "W shape: (3, 2)\n", + "W data type float64\n", + "\n", + " Our h layer:\n", + " [2.65171293 3.25333398 3.04542205]\n", + "h shape: (3,)\n", + "h data type float64\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### 1.1.2 Operations on arrays\n", + "\n", + "▶▶ **Look at the code below and check that you understand each line:**\n", + "* How to reshape a matrix i.e. 
change its dimensions\n", + "* How to compute the transpose of a vector / matrix" + ], + "metadata": { + "id": "L18_HL5qfvFO" + } + }, + { + "cell_type": "code", + "metadata": { + "id": "hKzJk0aaPUv4", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "18ed343f-e231-4418-876b-0807617a599a" + }, + "source": [ + "# Useful transformations\n", + "h = h.reshape((3,1))\n", + "print(\"\\n h reshape:\\n\", h)\n", + "print( \"h shape:\", h.shape)\n", + "\n", + "h1 = np.transpose(h)\n", + "print(\"\\n h transpose:\\n\", h1)\n", + "print( \"h shape:\", h1.shape)\n", + "\n", + "h2 = h.T\n", + "print(\"\\n h transpose:\\n\", h2)\n", + "print( \"h shape:\", h2.shape)\n", + "\n", + "Wt = W.T\n", + "print(\"\\nW:\\n\", W)\n", + "print(\"\\nW.T:\\n\", Wt)" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + " h reshape:\n", + " [[2.65171293]\n", + " [3.25333398]\n", + " [3.04542205]]\n", + "h shape: (3, 1)\n", + "\n", + " h transpose:\n", + " [[2.65171293 3.25333398 3.04542205]]\n", + "h shape: (1, 3)\n", + "\n", + " h transpose:\n", + " [[2.65171293 3.25333398 3.04542205]]\n", + "h shape: (1, 3)\n", + "\n", + "W:\n", + " [[0.77395605 0.43887844]\n", + " [0.85859792 0.69736803]\n", + " [0.09417735 0.97562235]]\n", + "\n", + "W.T:\n", + " [[0.77395605 0.85859792 0.09417735]\n", + " [0.43887844 0.69736803 0.97562235]]\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "▶▶ **A last note: creating an identity matrix**" + ], + "metadata": { + "id": "O_p_oGvRhnkF" + } + }, + { + "cell_type": "code", + "metadata": { + "id": "KpIkzqN6PaJR", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "f8a8a89a-41ed-478e-8926-74395695ca65" + }, + "source": [ + "## numpy code to create identity matrix\n", + "a = np.eye(4)\n", + "print(a)" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[[1. 0. 0. 0.]\n", + " [0. 1. 
0. 0.]\n", + " [0. 0. 1. 0.]\n", + " [0. 0. 0. 1.]]\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Il-lX6VCA7gk" + }, + "source": [ + "## 1.2 Tensors\n", + "\n", + "To implement neural networks in PyTorch, we use tensors: \n", + "* a specialized data structure, very similar to arrays and matrices\n", + "* used to encode the inputs and outputs of a model, as well as the model’s parameters\n", + "* similar to NumPy’s ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hPqpGGZPCRT-" + }, + "source": [ + "### 1.2.1 Tensor initialization\n", + "\n", + "▶▶ **Look at the code below and check that you understand each line:**\n", + "* We define a PyTorch tensor (i.e. a matrix) **x_data** from a list of lists\n", + "* We define a PyTorch tensor (i.e. a matrix) **x_np** from a numpy array\n", + "* How to initialize a random tensor, a tensor of ones and a tensor of zeros\n", + "* Finally, we define a PyTorch tensor (i.e. a matrix) from another tensor:\n", + " * **x_ones**: a tensor of ones with the same shape and data type as **x_data**\n", + " * **x_rand**: a tensor of random values with the same shape as **x_data**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "HaEdsMG6BAh0", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8be5a975-8f88-4555-a247-a96c23aa7f94" + }, + "source": [ + "# Tensor initialization\n", + "\n", + "## from data. 
The data type is automatically inferred.\n", + "data = [[1, 2], [3, 4]]\n", + "x_data = torch.tensor(data)\n", + "print( \"x_data\", x_data)\n", + "print( \"data type x_data=\", x_data.dtype)" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "x_data tensor([[1, 2],\n", + " [3, 4]])\n", + "data type x_data= torch.int64\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "## from a numpy array\n", + "np_array = np.array(data)\n", + "x_np = torch.from_numpy(np_array)\n", + "print(\"\\nx_np\", x_np)\n", + "print( \"data type, np_array=\", np_array.dtype, \"x_data=\", x_np.dtype)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "zk8cQYUZwGJn", + "outputId": "da19b701-11d9-47bc-c02a-ddfcff3a9d33" + }, + "execution_count": 15, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "x_np tensor([[1, 2],\n", + " [3, 4]])\n", + "data type, np_array= int64 x_data= torch.int64\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "## with random values / ones / zeros\n", + "shape = (2, 3,) # shape is a tuple of tensor dimensions\n", + "rand_tensor = torch.rand(shape)\n", + "ones_tensor = torch.ones(shape)\n", + "zeros_tensor = torch.zeros(shape)\n", + "\n", + "print(f\"Random Tensor: \\n {rand_tensor} \\n\")\n", + "print(f\"Ones Tensor: \\n {ones_tensor} \\n\")\n", + "print(f\"Zeros Tensor: \\n {zeros_tensor}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tgacq0XuwIM7", + "outputId": "c5c9f8eb-d6ad-4b6d-8efc-e2a78335ff79" + }, + "execution_count": 16, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Random Tensor: \n", + " tensor([[0.0832, 0.3685, 0.7852],\n", + " [0.3742, 0.1828, 0.8512]]) \n", + "\n", + "Ones Tensor: \n", + " tensor([[1., 1., 1.],\n", + " [1., 1., 1.]]) \n", + "\n", + "Zeros Tensor: \n", + " tensor([[0., 0., 0.],\n", + " [0., 0., 
0.]])\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "## from another tensor\n", + "x_ones = torch.ones_like(x_data) # retains the properties of x_data\n", + "print(f\"\\nFrom Ones Tensor: \\n {x_ones} \\n\")\n", + "\n", + "x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data\n", + "print(f\"From Random Tensor: \\n {x_rand} \\n\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "f1Ffpc6dwJ4l", + "outputId": "bac75dda-5f58-437e-d561-ad13dd5a594a" + }, + "execution_count": 17, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "From Ones Tensor: \n", + " tensor([[1, 1],\n", + " [1, 1]]) \n", + "\n", + "From Random Tensor: \n", + " tensor([[0.6470, 0.4097],\n", + " [0.1857, 0.6956]]) \n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oFDVEZcBCWF_" + }, + "source": [ + "### 1.2.2 Tensor attributes\n", + "\n", + "▶▶ **A tensor has different attributes, print the values for:**\n", + "* shape of the tensor\n", + "* type of the data stored \n", + "* device on which the data is stored\n", + "\n", + "Look at the doc here: https://pytorch.org/docs/stable/tensor_attributes.html" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kS4TtR9DCJcq", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "58a8b84c-c034-4f41-d267-cb55e89cc1c8" + }, + "source": [ + "# Tensor attributes\n", + "tensor = torch.rand(3, 4)\n", + "\n", + "print(f\"Shape of tensor: {tensor.shape}\")\n", + "print(f\"Datatype of tensor: {tensor.dtype}\")\n", + "print(f\"Device tensor is stored on: {tensor.device}\")" + ], + "execution_count": 18, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Shape of tensor: torch.Size([3, 4])\n", + "Datatype of tensor: torch.float32\n", + "Device tensor is stored on: cpu\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"tu8RM6O7CaKO" + }, + "source": [ + "### 1.2.3 Move to GPU\n", + "\n", + "The code below is used to:\n", + "* check on which device the code is running; 'cuda' stands for GPU. If no GPU is found, we use the CPU.\n", + "\n", + "\n", + "▶▶ **Check and move to GPU:**\n", + "* Run the code, it should say 'no gpu'\n", + "* Move to GPU: in Colab, allocate a GPU by going to Edit > Notebook Settings (Modifier > Paramètres du notebook)\n", + " * you'll see a connection indicator in the upper right part of the screen\n", + "* Run the code from 1.2 again and the cell below (you can use the function Run / Run before or Exécution / Exécuter avant), you'll need to do all the imports again. Do you see the difference?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nT7n30VpCOzF", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "60fe8c2f-cf57-4944-f92d-631d448a4642" + }, + "source": [ + "# We move our tensor to the GPU if available\n", + "if torch.cuda.is_available():\n", + " tensor = tensor.to('cuda')\n", + " print(f\"Device tensor is stored on: {tensor.device}\")\n", + "else:\n", + " print(\"no gpu\")\n", + "\n", + "print(tensor)" + ], + "execution_count": 19, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Device tensor is stored on: cuda:0\n", + "tensor([[0.7117, 0.8822, 0.6255, 0.6968],\n", + " [0.7856, 0.1178, 0.7001, 0.6381],\n", + " [0.2431, 0.9820, 0.0646, 0.1509]], device='cuda:0')\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VdqHVRkHCcgq" + }, + "source": [ + "Below, run after moving to GPU." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nyZPKBvOGsyf", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "7b338a6f-cf14-4759-aae4-604760c04238" + }, + "source": [ + "# We move our tensor to the GPU if available\n", + "if torch.cuda.is_available():\n", + " tensor = tensor.to('cuda')\n", + " print(f\"Device tensor is stored on: {tensor.device}\")\n", + "else:\n", + " print(\"no gpu\")\n", + "\n", + "print(tensor)" + ], + "execution_count": 20, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Device tensor is stored on: cuda:0\n", + "tensor([[0.7117, 0.8822, 0.6255, 0.6968],\n", + " [0.7856, 0.1178, 0.7001, 0.6381],\n", + " [0.2431, 0.9820, 0.0646, 0.1509]], device='cuda:0')\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8um7SDWGCp8o" + }, + "source": [ + "### 1.2.4 Tensor operations\n", + "\n", + "Doc: https://pytorch.org/docs/stable/torch.html\n", + "\n", + "▶▶ **Slicing operations:**\n", + "* Below we use slicing operations to modify tensors" + ] + }, + { + "cell_type": "code", + "source": [ + "# Tensor operations: similar to numpy arrays\n", + "tensor = torch.ones(4, 4)\n", + "print(tensor)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "BgF-ypEurJCk", + "outputId": "ceeb7f21-39e5-4aa0-c816-f021df5a4fd6" + }, + "execution_count": 21, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[1., 1., 1., 1.],\n", + " [1., 1., 1., 1.],\n", + " [1., 1., 1., 1.],\n", + " [1., 1., 1., 1.]])\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "7yLviqmYC3sZ", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "abbfdd22-226d-4dad-cd45-42c629996576" + }, + "source": [ + "# ---------------------------------------------------------\n", + "# TODO: What do you expect?\n", + "# ---------------------------------------------------------\n", + "## Slicing\n", + 
"print(\"\\nSlicing\")\n", + "tensor[:,1] = 0 \n", + "print(tensor)\n", + "\n", + "# ---------------------------------------------------------\n", + "# TODO: Change the first column with the value in l\n", + "# ---------------------------------------------------------\n", + "l =[1.,2.,3.,4.] \n", + "l = torch.tensor( l )\n", + "tensor[:, 0] = l\n", + "print(tensor)" + ], + "execution_count": 22, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "Slicing\n", + "tensor([[1., 0., 1., 1.],\n", + " [1., 0., 1., 1.],\n", + " [1., 0., 1., 1.],\n", + " [1., 0., 1., 1.]])\n", + "tensor([[1., 0., 1., 1.],\n", + " [2., 0., 1., 1.],\n", + " [3., 0., 1., 1.],\n", + " [4., 0., 1., 1.]])\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "▶▶ **Other operations:**\n", + "* Check the code below that performs:\n", + " * tensor concatenation\n", + " * tensor multiplication" + ], + "metadata": { + "id": "uCZ2AWPmrW6q" + } + }, + { + "cell_type": "code", + "source": [ + "## Concatenation\n", + "print(\"\\nConcatenate tensor 3 times\")\n", + "print('Original tensor:\\n', tensor, '\\n')\n", + "\n", + "t1 = torch.cat([tensor, tensor, tensor], dim=1)\n", + "print(t1)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "t_AkQSnarNX8", + "outputId": "4ced19e6-7f5f-4f81-c1c9-87eafb05063b" + }, + "execution_count": 31, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "Concatenate tensor 3 times\n", + "Original tensor:\n", + " tensor([[1., 0., 1., 1.],\n", + " [2., 0., 1., 1.],\n", + " [3., 0., 1., 1.],\n", + " [4., 0., 1., 1.]]) \n", + "\n", + "tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],\n", + " [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],\n", + " [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],\n", + " [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]])\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "## Multiplication: 
element_wise\n", + "print(\"\\nMultiply tensor by itself, elementwise\")\n", + "print('Original tensor:\\n', tensor, '\\n')\n", + "\n", + "# This computes the element-wise product\n", + "t2 = tensor.mul(tensor)\n", + "print(f\"tensor.mul(tensor) \\n {t2} \\n\")\n", + "\n", + "# Alternative syntax:\n", + "t3 = tensor * tensor\n", + "print(f\"tensor * tensor \\n {t3}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QT_J5emmwd9H", + "outputId": "a4fb788a-b9d4-4162-fe5d-e2d79cef1763" + }, + "execution_count": 30, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "Multiply tensor by itself\n", + "Original tensor:\n", + " tensor([[1., 0., 1., 1.],\n", + " [2., 0., 1., 1.],\n", + " [3., 0., 1., 1.],\n", + " [4., 0., 1., 1.]]) \n", + "\n", + "tensor.mul(tensor) \n", + " tensor([[ 1., 0., 1., 1.],\n", + " [ 4., 0., 1., 1.],\n", + " [ 9., 0., 1., 1.],\n", + " [16., 0., 1., 1.]]) \n", + "\n", + "tensor * tensor \n", + " tensor([[ 1., 0., 1., 1.],\n", + " [ 4., 0., 1., 1.],\n", + " [ 9., 0., 1., 1.],\n", + " [16., 0., 1., 1.]])\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "## Matrix multiplication\n", + "print(\"\\nMultiply tensor by itself\")\n", + "print('Original tensor:\\n', tensor, '\\n')\n", + "\n", + "t4 = tensor.matmul(tensor.T)\n", + "print(f\"tensor.matmul(tensor.T) \\n {t4} \\n\")\n", + "# Alternative syntax:\n", + "t5 = tensor @ tensor.T\n", + "print(f\"tensor @ tensor.T \\n {t5}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "PQrelqWYwfsX", + "outputId": "84fb8345-1daf-40a3-8e21-e6342db5742b" + }, + "execution_count": 32, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "Multiply tensor by itself\n", + "Original tensor:\n", + " tensor([[1., 0., 1., 1.],\n", + " [2., 0., 1., 1.],\n", + " [3., 0., 1., 1.],\n", + " [4., 0., 1., 1.]]) \n", + "\n", + "tensor.matmul(tensor.T) \n", + " 
tensor([[ 3., 4., 5., 6.],\n", + " [ 4., 6., 8., 10.],\n", + " [ 5., 8., 11., 14.],\n", + " [ 6., 10., 14., 18.]]) \n", + "\n", + "tensor @ tensor.T \n", + " tensor([[ 3., 4., 5., 6.],\n", + " [ 4., 6., 8., 10.],\n", + " [ 5., 8., 11., 14.],\n", + " [ 6., 10., 14., 18.]])\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5ulTT2k_Hs97" + }, + "source": [ + "### 1.2.5 Tensor operations on GPU\n", + "\n", + "The tensor is stored on CPU by default.\n", + "\n", + "▶▶ **Initialize the tensor using *device='cuda'*: where are stored t1, ..., t5?**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "atwxGd1_IdxI", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0fda45ea-e1fe-4bb3-c500-c4fda76379ce" + }, + "source": [ + "# Tensor operations: similar to numpy arrays\n", + "\n", + "tensor = torch.ones(4, 4, device='cuda')\n", + "print(tensor)\n", + "\n", + "# ---------------------------------------------------------\n", + "# TODO: What do you expect?\n", + "# ---------------------------------------------------------\n", + "## Slicing\n", + "print(\"\\nSlicing\")\n", + "tensor[:,1] = 0 \n", + "print(tensor)\n", + "\n", + "# ---------------------------------------------------------\n", + "# TODO: Change the first column with the value in l\n", + "# ---------------------------------------------------------\n", + "l =[1.,2.,3.,4.] 
\n", + "l = torch.tensor( l )\n", + "tensor[:, 0] = l\n", + "print(tensor)\n", + "\n", + "\n", + "## Concatenation\n", + "print(\"\\nConcatenate\")\n", + "t1 = torch.cat([tensor, tensor, tensor], dim=1)\n", + "print(t1)\n", + "\n", + "## Multiplication: element_wise\n", + "print(\"\\nMultiply\")\n", + "# This computes the element-wise product\n", + "t2 = tensor.mul(tensor)\n", + "print(f\"tensor.mul(tensor) \\n {t2} \\n\")\n", + "# Alternative syntax:\n", + "t3 = tensor * tensor\n", + "print(f\"tensor * tensor \\n {t3}\")\n", + "\n", + "## Matrix multiplication\n", + "t4 = tensor.matmul(tensor.T)\n", + "print(f\"tensor.matmul(tensor.T) \\n {t4} \\n\")\n", + "# Alternative syntax:\n", + "t5 = tensor @ tensor.T\n", + "print(f\"tensor @ tensor.T \\n {t5}\")" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[1., 1., 1., 1.],\n", + " [1., 1., 1., 1.],\n", + " [1., 1., 1., 1.],\n", + " [1., 1., 1., 1.]], device='cuda:0')\n", + "\n", + "Slicing\n", + "tensor([[1., 0., 1., 1.],\n", + " [1., 0., 1., 1.],\n", + " [1., 0., 1., 1.],\n", + " [1., 0., 1., 1.]], device='cuda:0')\n", + "tensor([[1., 0., 1., 1.],\n", + " [2., 0., 1., 1.],\n", + " [3., 0., 1., 1.],\n", + " [4., 0., 1., 1.]], device='cuda:0')\n", + "\n", + "Concatenate\n", + "tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],\n", + " [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],\n", + " [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],\n", + " [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]], device='cuda:0')\n", + "\n", + "Multiply\n", + "tensor.mul(tensor) \n", + " tensor([[ 1., 0., 1., 1.],\n", + " [ 4., 0., 1., 1.],\n", + " [ 9., 0., 1., 1.],\n", + " [16., 0., 1., 1.]], device='cuda:0') \n", + "\n", + "tensor * tensor \n", + " tensor([[ 1., 0., 1., 1.],\n", + " [ 4., 0., 1., 1.],\n", + " [ 9., 0., 1., 1.],\n", + " [16., 0., 1., 1.]], device='cuda:0')\n", + "tensor.matmul(tensor.T) \n", + " tensor([[ 3., 4., 5., 6.],\n", + " [ 4., 
6., 8., 10.],\n", + " [ 5., 8., 11., 14.],\n", + " [ 6., 10., 14., 18.]], device='cuda:0') \n", + "\n", + "tensor @ tensor.T \n", + " tensor([[ 3., 4., 5., 6.],\n", + " [ 4., 6., 8., 10.],\n", + " [ 5., 8., 11., 14.],\n", + " [ 6., 10., 14., 18.]], device='cuda:0')\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UxW1jtX-GOfd" + }, + "source": [ + "### 1.2.6 Final exercise: compute *h*\n", + "\n", + "▶▶ **Compute the tensor h, using the same data for x and W as at the beginning of this TP.**\n", + "\n", + "* Define x as a tensor, print x, its shape and dtype\n", + "* Define W as a tensor, print W, its shape and dtype\n", + "* The bias is still a scalar, of type float\n", + "* Finally compute h, print h and its dtype\n", + "\n", + "```\n", + "x = np.array([1,2])\n", + "rng = np.random.default_rng(seed=42)\n", + "W = rng.random((3, 2))\n", + "```\n", + "\n", + "Important note: when multiplying matrices, both operands must have the same data type, e.g. not **x** with *int* and **W** with *float*.\n", + "So you have to declare that the vector **x** has the data type *float*. Two ways:\n", + "* at initialization: **x = torch.tensor([1,2], dtype=float)**\n", + "* from any tensor: **x = x.to( torch.float64)** (here **torch.float** would give *float32*, which is not what we want, since **W** is *float64*) " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lwIanFgWD_YJ" + }, + "source": [ + "# --------------------------------------------------------\n", + "# TODO: Write the code to compute h = W.x+b\n", + "# --------------------------------------------------------\n", + "\n", + "# Define x\n", + "# ...\n", + "\n", + "\n", + "# Define W: generate a random matrix (with a generator, for reproducible results)\n", + "# ...\n", + "\n", + "# Bias, a scalar\n", + "# ...\n", + "\n", + "# Now, try to multiply\n", + "# ..." 
+ ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "na_tJOnfGDIz" + }, + "source": [ + "### Last minor note" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lql9bH39G4Mw" + }, + "source": [ + "## Operations that have a _ suffix are in-place. For example: x.copy_(y), x.t_(), will change x.\n", + "print(tensor, \"\\n\")\n", + "tensor.add_(5)\n", + "print(tensor)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DGmy-dtuOtiw" + }, + "source": [ + "# Part 2: Feedforward Neural Network\n", + "\n", + "In this practical session, we will explore a simple neural network architecture for NLP applications; specifically, we will train a feedforward neural network for sentiment analysis, using the same dataset of reviews as in the previous session. We will also keep the bag-of-words representation. \n", + "\n", + "\n", + "Sources:\n", + "* This TP is inspired by a TP by Tim van de Cruys\n", + "* https://www.deeplearningwizard.com/deep_learning/practical_pytorch/pytorch_feedforward_neuralnetwork/\n", + "* https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html\n", + "* https://medium.com/swlh/sentiment-classification-using-feed-forward-neural-network-in-pytorch-655811a0913f" + ] + }, + { + "cell_type": "code", + "source": [ + "# Useful imports\n", + "import pandas as pd\n", + "import numpy as np\n", + "import re\n", + "import sklearn\n", + "\n", + "from sklearn.feature_extraction.text import CountVectorizer\n", + "\n", + "import torch\n", + "from torch.utils.data import TensorDataset, DataLoader\n", + "import torch.nn as nn" + ], + "metadata": { + "id": "TKukE_hAAn_2" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Path to data\n", + "train_path = 
\"allocine_train.tsv\"\n", + "dev_path = \"allocine_dev.tsv\"" + ], + "metadata": { + "id": "iUxRwO37Ap8h" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wdSyhJqpVczO" + }, + "source": [ + "## 2.1 Read and load the data\n", + "\n", + "Here we will keep the bag-of-words representation, as in the previous session. \n", + "\n", + "There are different ways of dealing with the input data in PyTorch. The simplest solution is to use the DataLoader from PyTorch: \n", + "* the doc here https://pytorch.org/docs/stable/data.html and here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html\n", + "* an example of use, with numpy array: https://www.kaggle.com/arunmohan003/sentiment-analysis-using-lstm-pytorch\n", + "\n", + "You can also find many datasets for text ready to load in pytorch on: https://pytorch.org/text/stable/datasets.html" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CxRbwziSV_BY" + }, + "source": [ + "#### 2.1.1 Build BoW vectors (code given)\n", + "\n", + "The code below uses scikit-learn methods you already know to generate the bag-of-words representation." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SoVJ18s_oxkn" + }, + "source": [ + "# This will be the size of the vectors representing the input\n", + "MAX_FEATURES = 5000 \n", + "\n", + "def vectorize_data( data_path, vectorizer=None ):\n", + " data_df = pd.read_csv( data_path, header=0,\n", + " delimiter=\"\\t\", quoting=3)\n", + " # If an existing vectorizer is not given, initialize the \"CountVectorizer\" \n", + " # object, which is scikit-learn's bag of words tool. 
\n", + " if not vectorizer:\n", + " vectorizer = CountVectorizer(\n", + " analyzer = \"word\",\n", + " max_features = MAX_FEATURES\n", + " ) \n", + " vectorizer.fit(data_df[\"review\"])\n", + " # Then transform the data\n", + " x_data = vectorizer.transform(data_df[\"review\"])\n", + " # Vectorize also the labels\n", + " y_data = np.asarray(data_df[\"sentiment\"])\n", + " return x_data, y_data, vectorizer \n", + "\n", + "x_train, y_train, vectorizer = vectorize_data( train_path )\n", + "x_dev, y_dev, _ = vectorize_data( dev_path, vectorizer )\n", + "\n", + "# Count_Vectorizer returns sparse arrays (for computational reasons)\n", + "# but PyTorch will expect dense input:\n", + "x_train = x_train.toarray()\n", + "x_dev = x_dev.toarray()\n", + "\n", + "print(\"Train:\", x_train.shape)\n", + "print(\"Dev:\", x_dev.shape)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mt00MaMmW1_P" + }, + "source": [ + "#### 2.1.2 Transform to tensors\n", + "\n", + "▶▶ **Create a dataset object within the PyTorch library:**\n", + "\n", + "Now we need to transform our data to tensors, in order to provide them as input to PyTorch. Follow the following steps:\n", + "\n", + "* 1- **torch.from_numpy( A_NUMPY_ARRAY )**: transform your array into a tensor\n", + " * Note: you need to transform tensor type to float, with **MY_TENSOR.to(torch.float)** (or cryptic error saying it was expecting long...).\n", + " * Print the shape of the tensor for your training data.\n", + "* 2- **torch.utils.data.TensorDataset(INPUT_TENSOR, TARGET_TENSOR)**: Dataset wrapping tensors. 
In particular:\n", + " * Takes tensors as inputs\n", + " \n", + "* 3- **torch.utils.data.DataLoader**: many arguments in the constructor:\n", + " * In particular, a *dataset* of the type TensorDataset can be used\n", + " * We generally prefer to shuffle our data; this can be done here by changing the value of one argument\n", + " * Note also the possibility to change the batch_size, we'll talk about it later\n", + "\n", + "```\n", + "DataLoader(\n", + " dataset,\n", + " batch_size=1,\n", + " shuffle=False,\n", + " num_workers=0,\n", + " collate_fn=None,\n", + " pin_memory=False,\n", + " )\n", + " ```\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JMLPp3vnoxnG" + }, + "source": [ + "# create Tensor dataset, i.e. torch.from_numpy( A_NUMPY_ARRAY ). \n", + "# for x\n", + "# ...\n", + "\n", + "# for y\n", + "# ...\n", + "\n", + "# Print x shape\n", + "# ...\n", + "\n", + "# TensorDataset(INPUT_TENSOR, TARGET_TENSOR)\n", + "# ...\n", + "\n", + "# DataLoader( dataset, ...): \n", + "## - make sure to SHUFFLE your data\n", + "## - use batch_size = 1 (i.e. no batch)\n", + "# ..." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zOeZCY09o6CV" + }, + "source": [ + "## 2.2 Neural Network\n", + "\n", + "Now we can build our learning model.\n", + "\n", + "For this TP, we're going to walk through the code of a **simple feedforward neural network, with one hidden layer**.\n", + "\n", + "This network takes as input bag-of-words vectors, exactly as our 'classic' models: each review is represented by a vector whose size is the number of tokens in the vocabulary (here limited to MAX_FEATURES), holding the count of each word that occurs in the review and '0' for the other words. 
" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### 2.2.1 Questions\n", + "\n", + "▶▶ **What is the input dimension?** \n", + "\n", + "▶▶ **What is the output dimension?** " + ], + "metadata": { + "id": "5KOM7ofrKUte" + } + }, + { + "cell_type": "markdown", + "source": [ + "### 2.2.2 Write the skeleton of the class\n", + "\n", + "▶▶ We're going to **define our own neural network type**, by defining a new class: \n", + "* The class is called **FeedforwardNeuralNetModel**\n", + "* it inherits from the class **nn.Module**\n", + "* the constructor takes the following arguments:\n", + " * size of the input (i.e. **input_dim**)\n", + " * size of the hidden layer (i.e. **hidden_dim**)\n", + " * size of the output layer (i.e. **output_dim**)\n", + "* in the constructor, we will call the constructor of the parent class\n", + "\n" + ], + "metadata": { + "id": "bE4RgHUkGnGl" + } + }, + { + "cell_type": "code", + "source": [ + "# Start to define the class corresponding to our type of neural network \n", + "\n" + ], + "metadata": { + "id": "uKcge-oBG1HV" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### 2.2.3 Write the constructor\n", + "\n", + "▶▶ To continue the definition of our class, we need to explain how each layer of our network is built.\n", + "\n", + "More precisely, we're going to define:\n", + "* a function corresponding to the action of our hidden layer: \n", + " * what kind of function is it?\n", + " * you need to indicate the size of the input and output for this function, what are they?\n", + "* a non-linear function, that will be used on the output of our hidden layer\n", + "* a final output function: \n", + " * what kind of function is it?\n", + " * you need to indicate the size of the input and output for this function, what are they? 
\n", + "\n", + "All the functions that can be used in PyTorch are defined here: https://pytorch.org/docs/stable/nn.functional.html\n", + "\n", + "Do you see things that you know?\n", + "\n", + "Hint: here you define fields of your class; these fields correspond to specific kinds of functions. \n", + "E.g. you're going to initialize a field such as **self.fc1=SPECIFIC_TYPE_OF_FCT(expected arguments)**." + ], + "metadata": { + "id": "0BHUuGKCHoU9" + } + }, + { + "cell_type": "code", + "source": [ + "# Continue the definition of the class by defining three functions in your constructor\n", + "\n" + ], + "metadata": { + "id": "LN3aSTSaJNkp" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### 2.2.4 Write the **forward** method\n", + "\n", + "The main function we have to write when defining a neural network is called the **forward** function.\n", + "This function computes the outputs of the network (the logits); it is thus used to train the network.\n", + "It details how we apply the functions defined in the constructor. \n", + "\n", + "Let's define this function, with the following signature, where x is the input to the network:\n", + "```\n", + "def forward(self, x):\n", + "```\n", + "\n", + "▶▶ Follow the steps:\n", + "* 1- Apply the first linear function defined in the constructor to **x**, i.e. go through the hidden layer.\n", + "* 2- Apply the non-linear function to the output of step 1, i.e. use the activation function.\n", + "* 3- Apply the second linear function defined in the constructor to the output of step 2, i.e. go through the output layer.\n", + "* 4- Return the output of step 3.\n", + "\n", + "You're done!" 
+ ], + "metadata": { + "id": "e2IMSprgKJ7K" + } + }, + { + "cell_type": "code", + "source": [ + " # Copy paste the rest of the definition of the class below\n", + " # ...\n", + " \n", + " # Define the forward function, used to make all the calculations\n", + " # through the network\n", + " def forward(self, x):\n", + " ''' y = g(x.W1+b).W2 '''\n", + " # ..." + ], + "metadata": { + "id": "8z-QpBt2NOlu" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## 2.3 Training the network\n", + "\n", + "Now we can use our beautiful class to define and then train our own neural network." + ], + "metadata": { + "id": "sBrDXfQbO5yq" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oWLDfLGxpBvn" + }, + "source": [ + "### 2.3.1 Hyper-parameters\n", + "\n", + "We need to set up the values for the hyper-parameters, and define the form of the loss and the optimization methods.\n", + "\n", + "▶▶ **Check that you understand what each of the variables below is** \n", + "* one that you probably don't know is the learning rate, we'll explain it in the next course. Broadly speaking, it controls the size of each parameter update during training." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fcGyjXbUoxx9" + }, + "source": [ + "# Many choices here!\n", + "VOCAB_SIZE = MAX_FEATURES\n", + "input_dim = VOCAB_SIZE \n", + "hidden_dim = 4\n", + "output_dim = 2\n", + "num_epochs = 5\n", + "learning_rate = 0.1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### 2.3.2 Loss function\n", + "\n", + "Another thing that has to be decided is the kind of loss function we want to use.\n", + "Here we use a common one, called CrossEntropy. 
\n", + "We will come back to this loss in more detail.\n", + "One important note is that this function in PyTorch already applies the softmax function internally, so the output layer of the network should return raw scores (logits), without a softmax." 
+ ], + "metadata": { + "id": "yyJINiVHPoWq" + } + }, + { + "cell_type": "code", + "source": [ + "criterion = nn.CrossEntropyLoss()" + ], + "metadata": { + "id": "TVVy7hhrPl-K" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### 2.3.3 Initialization of the model\n", + "\n", + "Now you can instantiate your class: define a model that is of the type FeedforwardNeuralNetModel using the values defined before as hyper-parameters." + ], + "metadata": { + "id": "kyY91BtPQIeo" + } + }, + { + "cell_type": "code", + "source": [ + "# Initialization of the model\n", + "# ..." + ], + "metadata": { + "id": "hk_nev2-Q0m-" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### 2.3.4 Optimizer\n", + "\n", + "At last, we need to indicate the method we want to use to optimize our network.\n", + "Here, we use a common one called Stochastic Gradient Descent.\n", + "We will also come back to it later on.\n", + "\n", + "Note that its arguments are:\n", + "* the parameters of our model (the Ws)\n", + "* the learning rate\n", + "\n", + "Based on this information, it can make the necessary updates. 
\n" + ], + "metadata": { + "id": "wBjNtZ-bQfSQ" + } + }, + { + "cell_type": "code", + "source": [ + "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)" + ], + "metadata": { + "id": "A8AY0bU8Qhyf" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OPt_VbCMqoD2" + }, + "source": [ + "### 2.3.5 Training the network (code given)\n", + "\n", + "Simple code to train the neural network is given below.\n", + "\n", + "▶▶ **Run the code and look at the loss printed after each epoch.** " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OnNx8hZJox3v" + }, + "source": [ + "# Start training\n", + "for epoch in range(num_epochs):\n", + " train_loss, total_acc, total_count = 0, 0, 0\n", + "\n", + " # for each batch of instances + their associated labels\n", + " for input, label in train_loader:\n", + "\n", + " # Clearing the accumulated gradients\n", + " # torch *accumulates* gradients. Before passing in a\n", + " # new batch, you need to zero out the gradients from the old\n", + " # batch\n", + " # Clear gradients w.r.t. parameters\n", + " optimizer.zero_grad()\n", + "\n", + " # ==> Forward pass to get output/logits \n", + " # = apply all our functions: y = g(x.W1+b).W2\n", + " outputs = model( input )\n", + "\n", + " # ==> Calculate Loss: softmax --> cross entropy loss\n", + " loss = criterion(outputs, label)\n", + "\n", + " # Getting gradients w.r.t. 
parameters\n", + " # This is how we find out how to modify the parameters in\n", + " # order to lower the loss\n", + " loss.backward()\n", + "\n", + " # ==> Updating parameters: you don't need to provide the loss here,\n", + " # the information needed for the update is already stored in the parameters\n", + " # (more precisely, loss.backward() computes the gradients of all parameters,\n", + " # and each parameter stores its own gradient)\n", + " optimizer.step()\n", + "\n", + " # -- a useful print\n", + " # Accumulating the loss over time\n", + " train_loss += loss.item()\n", + " total_acc += (outputs.argmax(1) == label).sum().item()\n", + " total_count += label.size(0)\n", + "\n", + " # Report the mean batch loss and the accuracy on the train set at each epoch\n", + " print('Epoch: {}. Loss: {}. ACC {} '.format(epoch, \n", + " train_loss/len(train_loader), \n", + " total_acc/total_count))\n", + " \n", + " total_acc, total_count = 0, 0\n", + " train_loss = 0" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tzMl5wdnqtCW" + }, + "source": [ + "### 2.3.6 Evaluate the model (code given)" + ] + }, + { + "cell_type": "code", + "source": [ + "# Useful imports\n", + "from sklearn.metrics import classification_report" + ], + "metadata": { + "id": "N8wxX85sSyPM" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "ldDubAPDox5K" + }, + "source": [ + "# create Tensor dataset\n", + "valid_data = TensorDataset( torch.from_numpy(x_dev).to(torch.float), \n", + " torch.from_numpy(y_dev))\n", + "valid_loader = DataLoader( valid_data )\n", + "\n", + "\n", + "# Disabling gradient calculation is useful for inference, \n", + "# when you are sure that you will not call Tensor.backward(). 
\n", + "predictions, gold = [], []\n", + "with torch.no_grad():\n", + " for input, label in valid_loader:\n", + " # the model outputs raw scores (logits); argmax gives the predicted class\n", + " logits = model(input)\n", + " predictions.append( torch.argmax(logits, dim=1).cpu().numpy()[0] )\n", + " gold.append(int(label))\n", + "\n", + "print(classification_report(gold, predictions))" + ], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file