add TP2 correction

29a110cb · chloebt · b42c252c · 29a110cb
Commit 29a110cb authored 1 year ago by chloebt
--- a/notebooks/TP2_m2LiTL_LinAlg_FFNN_2324_CORRECT.ipynb
+++ b/notebooks/TP2_m2LiTL_LinAlg_FFNN_2324_CORRECT.ipynb
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "toc_visible": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU",
+    "gpuClass": "standard"
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "XCHhtzOXQ2po"
+      },
+      "source": [
+        "# TP 2: Linear Algebra and Feedforward neural network\n",
+        "Master LiTL - 2023-2024\n",
+        "\n",
+        "## Requirements\n",
+        "In this section, we will go through some code to learn how to manipulate matrices and tensors, and we will take a look at some PyTorch code that allows to define, train and evaluate a simple neural network.\n",
+        "The modules used are the the same as in the previous session, *Numpy* and *Scikit*, with the addition of *PyTorch*. They are all already available within colab.\n",
+        "\n",
+        "## Part 1: Linear Algebra\n",
+        "\n",
+        "In this section, we will go through some python code to deal with matrices and also tensors, the data structures used in PyTorch.\n",
+        "\n",
+        "Sources:    \n",
+        "* Linear Algebra explained in the context of deep learning: https://towardsdatascience.com/linear-algebra-explained-in-the-context-of-deep-learning-8fcb8fca1494\n",
+        "* PyTorch tutorial: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py\n",
+        "* PyTorch doc on tensors: https://pytorch.org/docs/stable/torch.html\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Useful imports\n",
+        "import numpy as np\n",
+        "import torch"
+      ],
+      "metadata": {
+        "id": "2t2sdvtdsrjO"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "G3Hk9fJuBVxk"
+      },
+      "source": [
+        "## 1.1 Numpy arrays\n",
+        "\n",
+        "NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 1.1.1 Numpy arrays\n",
+        "\n",
+        "▶▶ **Look at the code below and check that you understand each line:**\n",
+        "* We define a numpy array (i.e. a vector) **x** from a list\n",
+        "* We define a numpy array of shape 3x2 (i.e. a matrix) initialized with random numbers, called **W**\n",
+        "* We define a scalar, **b**\n",
+        "* Finally, with all these elements, we can compute **h = W.x + b**"
+      ],
+      "metadata": {
+        "id": "5hfuybaGeOX_"
+      }
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "W2IvCK4gPUAv",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "ece647f4-10d4-4251-850a-848e8c07d748"
+      },
+      "source": [
+        "x = np.array([1,2])\n",
+        "print(\"Our input vector with 2 elements:\\n\", x)\n",
+        "print( \"x shape:\", x.shape)\n",
+        "\n",
+        "print( \"x data type\", x.dtype)\n",
+        "# Give a list of elements\n",
+        "# a = np.array(1,2,3,4)    # WRONG\n",
+        "# a = np.array([1,2,3,4])  # RIGHT\n",
+        "\n",
+        "# Generate a random matrix (with a generator and a seed, for reproducible results)\n",
+        "rng = np.random.default_rng(seed=42)\n",
+        "W = rng.random((3, 2))\n",
+        "print(\"\\n Our weight matrix, of shape 3x2:\\n\", W)\n",
+        "print( \"W shape:\", W.shape)\n",
+        "print( \"W data type\", W.dtype)\n",
+        "\n",
+        "# Bias, a scalar\n",
+        "b = 1\n",
+        "\n",
+        "# Now, try to multiply\n",
+        "h = W.dot(x) + b\n",
+        "print(\"\\n Our h layer:\\n\", h)\n",
+        "print( \"h shape:\", h.shape)\n",
+        "print( \"h data type\", h.dtype)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Our input vector with 2 elements:\n",
+            " [1 2]\n",
+            "x shape: (2,)\n",
+            "x data type int64\n",
+            "\n",
+            " Our weight matrix, of shape 3x2:\n",
+            " [[0.77395605 0.43887844]\n",
+            " [0.85859792 0.69736803]\n",
+            " [0.09417735 0.97562235]]\n",
+            "W shape: (3, 2)\n",
+            "W data type float64\n",
+            "\n",
+            " Our h layer:\n",
+            " [2.65171293 3.25333398 3.04542205]\n",
+            "h shape: (3,)\n",
+            "h data type float64\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 1.1.2 Operations on arrays\n",
+        "\n",
+        "▶▶ **Look at the code below and check that you understand each line:**\n",
+        "* How to reshape a matrix i.e. change its dimensions\n",
+        "* How to compute the transpose of a vector / matrix"
+      ],
+      "metadata": {
+        "id": "L18_HL5qfvFO"
+      }
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "hKzJk0aaPUv4",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "89f0668b-dc76-404c-d83f-53b559f737d3"
+      },
+      "source": [
+        "# Useful transformations\n",
+        "h = h.reshape((3,1))\n",
+        "print(\"\\n h reshape:\\n\", h)\n",
+        "print( \"h shape:\", h.shape)\n",
+        "\n",
+        "h1 = np.transpose(h)\n",
+        "print(\"\\n h transpose:\\n\", h1)\n",
+        "print( \"h shape:\", h1.shape)\n",
+        "\n",
+        "h2 = h.T\n",
+        "print(\"\\n h transpose:\\n\", h2)\n",
+        "print( \"h shape:\", h2.shape)\n",
+        "\n",
+        "Wt = W.T\n",
+        "print(\"\\nW:\\n\", W)\n",
+        "print(\"\\nW.T:\\n\", Wt)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\n",
+            " h reshape:\n",
+            " [[2.65171293]\n",
+            " [3.25333398]\n",
+            " [3.04542205]]\n",
+            "h shape: (3, 1)\n",
+            "\n",
+            " h transpose:\n",
+            " [[2.65171293 3.25333398 3.04542205]]\n",
+            "h shape: (1, 3)\n",
+            "\n",
+            " h transpose:\n",
+            " [[2.65171293 3.25333398 3.04542205]]\n",
+            "h shape: (1, 3)\n",
+            "\n",
+            "W:\n",
+            " [[0.77395605 0.43887844]\n",
+            " [0.85859792 0.69736803]\n",
+            " [0.09417735 0.97562235]]\n",
+            "\n",
+            "W.T:\n",
+            " [[0.77395605 0.85859792 0.09417735]\n",
+            " [0.43887844 0.69736803 0.97562235]]\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "▶▶ **A last note: creating an identity matrix**"
+      ],
+      "metadata": {
+        "id": "O_p_oGvRhnkF"
+      }
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "KpIkzqN6PaJR",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "204ff237-4369-4511-b15d-51a3fed6324a"
+      },
+      "source": [
+        "## numpy code to create identity matrix\n",
+        "a = np.eye(4)\n",
+        "print(a)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "[[1. 0. 0. 0.]\n",
+            " [0. 1. 0. 0.]\n",
+            " [0. 0. 1. 0.]\n",
+            " [0. 0. 0. 1.]]\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Il-lX6VCA7gk"
+      },
+      "source": [
+        "## 1.2 Tensors\n",
+        "\n",
+        "For neural networks implementation in PyTorch, we use tensors:\n",
+        "* a specialized data structure that are very similar to arrays and matrices\n",
+        "* used to encode the inputs and outputs of a model, as well as the model’s parameters\n",
+        "* similar to NumPy’s ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hPqpGGZPCRT-"
+      },
+      "source": [
+        "### 1.2.1 Tensor initialization\n",
+        "\n",
+        "▶▶ **Look at the code below and check that you understand each line:**\n",
+        "* We define a PyTorch tensor (i.e. a matrix) **x_data** from a list of list\n",
+        "* We define a PyTorch tensor (i.e. a matrix) **x_np** from a numpy array\n",
+        "* How to initialize an random tensor, an one tensor and a zero tensor\n",
+        "* Finally, we define a PyTorch tensor (i.e. a matrix) from another tensor:\n",
+        "  * **x_ones**: from the identity tensor\n",
+        "  * **x_rand**: from a tensor initialized with random values"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "HaEdsMG6BAh0",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "562edd9d-c42a-4038-bdc8-84d19b75e6f1"
+      },
+      "source": [
+        "# Tensor initialization\n",
+        "\n",
+        "## from data. The data type is automatically inferred.\n",
+        "data = [[1, 2], [3, 4]]\n",
+        "x_data = torch.tensor(data)\n",
+        "print( \"x_data\", x_data)\n",
+        "print( \"data type x_data=\", x_data.dtype)\n",
+        "\n",
+        "## from a numpy array\n",
+        "np_array = np.array(data)\n",
+        "x_np = torch.from_numpy(np_array)\n",
+        "print(\"\\nx_np\", x_np)\n",
+        "print( \"data type, np_array=\", np_array.dtype, \"x_data=\", x_np.dtype)\n",
+        "\n",
+        "## with random values / ones / zeros\n",
+        "shape = (2, 3,) # shape is a tuple of tensor dimensions\n",
+        "rand_tensor = torch.rand(shape)\n",
+        "ones_tensor = torch.ones(shape)\n",
+        "zeros_tensor = torch.zeros(shape)\n",
+        "\n",
+        "print(f\"Random Tensor: \\n {rand_tensor} \\n\")\n",
+        "print(f\"Ones Tensor: \\n {ones_tensor} \\n\")\n",
+        "print(f\"Zeros Tensor: \\n {zeros_tensor}\")\n",
+        "\n",
+        "## from another tensor\n",
+        "x_ones = torch.ones_like(x_data) # retains the properties of x_data\n",
+        "print(f\"\\nFrom Ones Tensor: \\n {x_ones} \\n\")\n",
+        "\n",
+        "x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data\n",
+        "print(f\"From Random Tensor: \\n {x_rand} \\n\")"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "x_data tensor([[1, 2],\n",
+            "        [3, 4]])\n",
+            "data type x_data= torch.int64\n",
+            "\n",
+            "x_np tensor([[1, 2],\n",
+            "        [3, 4]])\n",
+            "data type, np_array= int64 x_data= torch.int64\n",
+            "Random Tensor: \n",
+            " tensor([[0.4516, 0.4125, 0.0914],\n",
+            "        [0.1381, 0.4802, 0.4308]]) \n",
+            "\n",
+            "Ones Tensor: \n",
+            " tensor([[1., 1., 1.],\n",
+            "        [1., 1., 1.]]) \n",
+            "\n",
+            "Zeros Tensor: \n",
+            " tensor([[0., 0., 0.],\n",
+            "        [0., 0., 0.]])\n",
+            "\n",
+            "From Ones Tensor: \n",
+            " tensor([[1, 1],\n",
+            "        [1, 1]]) \n",
+            "\n",
+            "From Random Tensor: \n",
+            " tensor([[0.8048, 0.0088],\n",
+            "        [0.8002, 0.7587]]) \n",
+            "\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "oFDVEZcBCWF_"
+      },
+      "source": [
+        "### 1.2.2 Tensor attributes\n",
+        "\n",
+        "▶▶ **A tensor has different attributes, print the values for:**\n",
+        "* shape of the tensor\n",
+        "* type of the data stored\n",
+        "* device on which data are stored\n",
+        "\n",
+        "Look at the doc here: https://www.tensorflow.org/api_docs/python/tf/Tensor#shape"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "kS4TtR9DCJcq",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "daa8d1dc-9946-483b-f239-03da4ec3feba"
+      },
+      "source": [
+        "# Tensor attributes\n",
+        "tensor = torch.rand(3, 4)\n",
+        "\n",
+        "print(f\"Shape of tensor: {tensor.shape}\")\n",
+        "print(f\"Datatype of tensor: {tensor.dtype}\")\n",
+        "print(f\"Device tensor is stored on: {tensor.device}\")"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Shape of tensor: torch.Size([3, 4])\n",
+            "Datatype of tensor: torch.float32\n",
+            "Device tensor is stored on: cpu\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "tu8RM6O7CaKO"
+      },
+      "source": [
+        "### 1.2.3 Move to GPU\n",
+        "\n",
+        "The code below is used to:\n",
+        "* check on which device the code is running, 'cuda' stands for GPU. If not GPU is found that we use CPU.\n",
+        "\n",
+        "\n",
+        "▶▶ **Check and move to GPU:**\n",
+        "* Run the code, it should say 'no cpu'\n",
+        "* Move to GPU: in Colab, allocate a GPU by going to Edit > Notebook Settings (Modifier > Paramètres du notebook)\n",
+        "  * you'll see an indicator of connexion in the uppper right part of the screen\n",
+        "* Run the code from 1.2 again and the cell below (you can use the function Run / Run before or Exécution / Exécuter avant), you'll need to do all the imports again. You see the difference?"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "nT7n30VpCOzF",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "24cdbdff-1bc0-4846-d5a7-15da897ad97e"
+      },
+      "source": [
+        "# We move our tensor to the GPU if available\n",
+        "if torch.cuda.is_available():\n",
+        "  tensor = tensor.to('cuda')\n",
+        "  print(f\"Device tensor is stored on: {tensor.device}\")\n",
+        "else:\n",
+        "  print(\"no gpu\")\n",
+        "\n",
+        "print(tensor)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Device tensor is stored on: cuda:0\n",
+            "tensor([[0.8227, 0.8155, 0.3872, 0.2957],\n",
+            "        [0.8694, 0.4785, 0.4519, 0.8665],\n",
+            "        [0.9097, 0.6743, 0.1998, 0.5599]], device='cuda:0')\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "VdqHVRkHCcgq"
+      },
+      "source": [
+        "Below, run after moving to GPU."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "nyZPKBvOGsyf",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "5e75668b-d860-40f2-8503-2a5903a38326"
+      },
+      "source": [
+        "# We move our tensor to the GPU if available\n",
+        "if torch.cuda.is_available():\n",
+        "  tensor = tensor.to('cuda')\n",
+        "  print(f\"Device tensor is stored on: {tensor.device}\")\n",
+        "else:\n",
+        "  print(\"no gpu\")\n",
+        "\n",
+        "print(tensor)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Device tensor is stored on: cuda:0\n",
+            "tensor([[0.8227, 0.8155, 0.3872, 0.2957],\n",
+            "        [0.8694, 0.4785, 0.4519, 0.8665],\n",
+            "        [0.9097, 0.6743, 0.1998, 0.5599]], device='cuda:0')\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "8um7SDWGCp8o"
+      },
+      "source": [
+        "### 1.2.4 Tensor operations\n",
+        "\n",
+        "Doc: https://pytorch.org/docs/stable/torch.html\n",
+        "\n",
+        "▶▶ **Slicing operations:**\n",
+        "* Below we use slicing operations to modify tensors"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Tensor operations: similar to numpy arrays\n",
+        "tensor = torch.ones(4, 4)\n",
+        "print(tensor)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "BgF-ypEurJCk",
+        "outputId": "9a4b3d7f-1026-4fab-c912-4050512e6110"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "tensor([[1., 1., 1., 1.],\n",
+            "        [1., 1., 1., 1.],\n",
+            "        [1., 1., 1., 1.],\n",
+            "        [1., 1., 1., 1.]])\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "7yLviqmYC3sZ",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "d2221ee4-e091-4eec-dc41-64ba7eb4699e"
+      },
+      "source": [
+        "# ---------------------------------------------------------\n",
+        "# TODO: What do you expect?\n",
+        "# ---------------------------------------------------------\n",
+        "## Slicing\n",
+        "print(\"\\nSlicing\")\n",
+        "tensor[:,1] = 0\n",
+        "print(tensor)\n",
+        "\n",
+        "# ---------------------------------------------------------\n",
+        "# TODO: Change the first column with the value in l\n",
+        "# ---------------------------------------------------------\n",
+        "l =[1.,2.,3.,4.]\n",
+        "l = torch.tensor( l )\n",
+        "tensor[:, 0] = l\n",
+        "print(tensor)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\n",
+            "Slicing\n",
+            "tensor([[1., 0., 1., 1.],\n",
+            "        [1., 0., 1., 1.],\n",
+            "        [1., 0., 1., 1.],\n",
+            "        [1., 0., 1., 1.]])\n",
+            "tensor([[1., 0., 1., 1.],\n",
+            "        [2., 0., 1., 1.],\n",
+            "        [3., 0., 1., 1.],\n",
+            "        [4., 0., 1., 1.]])\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "▶▶ *Other operations:**\n",
+        "* Check the code below that performs:\n",
+        "  * tensor concatenation\n",
+        "  * tensor multiplication"
+      ],
+      "metadata": {
+        "id": "uCZ2AWPmrW6q"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "## Concatenation\n",
+        "print(\"\\nConcatenate\")\n",
+        "t1 = torch.cat([tensor, tensor, tensor], dim=1)\n",
+        "print(t1)\n",
+        "\n",
+        "## Multiplication: element_wise\n",
+        "print(\"\\nMultiply\")\n",
+        "# This computes the element-wise product\n",
+        "t2 = tensor.mul(tensor)\n",
+        "print(f\"tensor.mul(tensor) \\n {t2} \\n\")\n",
+        "# Alternative syntax:\n",
+        "t3 = tensor * tensor\n",
+        "print(f\"tensor * tensor \\n {t3}\")\n",
+        "\n",
+        "## Matrix multiplication\n",
+        "t4 = tensor.matmul(tensor.T)\n",
+        "print(f\"tensor.matmul(tensor.T) \\n {t4} \\n\")\n",
+        "# Alternative syntax:\n",
+        "t5 = tensor @ tensor.T\n",
+        "print(f\"tensor @ tensor.T \\n {t5}\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "t_AkQSnarNX8",
+        "outputId": "10048274-9911-48fe-be8b-2d5846db2e6e"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\n",
+            "Concatenate\n",
+            "tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],\n",
+            "        [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],\n",
+            "        [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],\n",
+            "        [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]])\n",
+            "\n",
+            "Multiply\n",
+            "tensor.mul(tensor) \n",
+            " tensor([[ 1.,  0.,  1.,  1.],\n",
+            "        [ 4.,  0.,  1.,  1.],\n",
+            "        [ 9.,  0.,  1.,  1.],\n",
+            "        [16.,  0.,  1.,  1.]]) \n",
+            "\n",
+            "tensor * tensor \n",
+            " tensor([[ 1.,  0.,  1.,  1.],\n",
+            "        [ 4.,  0.,  1.,  1.],\n",
+            "        [ 9.,  0.,  1.,  1.],\n",
+            "        [16.,  0.,  1.,  1.]])\n",
+            "tensor.matmul(tensor.T) \n",
+            " tensor([[ 3.,  4.,  5.,  6.],\n",
+            "        [ 4.,  6.,  8., 10.],\n",
+            "        [ 5.,  8., 11., 14.],\n",
+            "        [ 6., 10., 14., 18.]]) \n",
+            "\n",
+            "tensor @ tensor.T \n",
+            " tensor([[ 3.,  4.,  5.,  6.],\n",
+            "        [ 4.,  6.,  8., 10.],\n",
+            "        [ 5.,  8., 11., 14.],\n",
+            "        [ 6., 10., 14., 18.]])\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "5ulTT2k_Hs97"
+      },
+      "source": [
+        "### 1.2.5 Tensor operations on GPU\n",
+        "\n",
+        "The tensor is stored on CPU by default.\n",
+        "\n",
+        "▶▶ **Initialize the tensor using *device='cuda'*: where are stored t1, ..., t5?**"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "atwxGd1_IdxI",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "55e2c7c1-45cc-466f-c74e-b8b1fec3d375"
+      },
+      "source": [
+        "# Tensor operations: similar to numpy arrays\n",
+        "\n",
+        "tensor = torch.ones(4, 4, device='cuda')\n",
+        "print(tensor)\n",
+        "\n",
+        "# ---------------------------------------------------------\n",
+        "# TODO: What do you expect?\n",
+        "# ---------------------------------------------------------\n",
+        "## Slicing\n",
+        "print(\"\\nSlicing\")\n",
+        "tensor[:,1] = 0\n",
+        "print(tensor)\n",
+        "\n",
+        "# ---------------------------------------------------------\n",
+        "# TODO: Change the first column with the value in l\n",
+        "# ---------------------------------------------------------\n",
+        "l =[1.,2.,3.,4.]\n",
+        "l = torch.tensor( l )\n",
+        "tensor[:, 0] = l\n",
+        "print(tensor)\n",
+        "\n",
+        "\n",
+        "## Concatenation\n",
+        "print(\"\\nConcatenate\")\n",
+        "t1 = torch.cat([tensor, tensor, tensor], dim=1)\n",
+        "print(t1)\n",
+        "\n",
+        "## Multiplication: element_wise\n",
+        "print(\"\\nMultiply\")\n",
+        "# This computes the element-wise product\n",
+        "t2 = tensor.mul(tensor)\n",
+        "print(f\"tensor.mul(tensor) \\n {t2} \\n\")\n",
+        "# Alternative syntax:\n",
+        "t3 = tensor * tensor\n",
+        "print(f\"tensor * tensor \\n {t3}\")\n",
+        "\n",
+        "## Matrix multiplication\n",
+        "t4 = tensor.matmul(tensor.T)\n",
+        "print(f\"tensor.matmul(tensor.T) \\n {t4} \\n\")\n",
+        "# Alternative syntax:\n",
+        "t5 = tensor @ tensor.T\n",
+        "print(f\"tensor @ tensor.T \\n {t5}\")"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "tensor([[1., 1., 1., 1.],\n",
+            "        [1., 1., 1., 1.],\n",
+            "        [1., 1., 1., 1.],\n",
+            "        [1., 1., 1., 1.]], device='cuda:0')\n",
+            "\n",
+            "Slicing\n",
+            "tensor([[1., 0., 1., 1.],\n",
+            "        [1., 0., 1., 1.],\n",
+            "        [1., 0., 1., 1.],\n",
+            "        [1., 0., 1., 1.]], device='cuda:0')\n",
+            "tensor([[1., 0., 1., 1.],\n",
+            "        [2., 0., 1., 1.],\n",
+            "        [3., 0., 1., 1.],\n",
+            "        [4., 0., 1., 1.]], device='cuda:0')\n",
+            "\n",
+            "Concatenate\n",
+            "tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],\n",
+            "        [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],\n",
+            "        [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],\n",
+            "        [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]], device='cuda:0')\n",
+            "\n",
+            "Multiply\n",
+            "tensor.mul(tensor) \n",
+            " tensor([[ 1.,  0.,  1.,  1.],\n",
+            "        [ 4.,  0.,  1.,  1.],\n",
+            "        [ 9.,  0.,  1.,  1.],\n",
+            "        [16.,  0.,  1.,  1.]], device='cuda:0') \n",
+            "\n",
+            "tensor * tensor \n",
+            " tensor([[ 1.,  0.,  1.,  1.],\n",
+            "        [ 4.,  0.,  1.,  1.],\n",
+            "        [ 9.,  0.,  1.,  1.],\n",
+            "        [16.,  0.,  1.,  1.]], device='cuda:0')\n",
+            "tensor.matmul(tensor.T) \n",
+            " tensor([[ 3.,  4.,  5.,  6.],\n",
+            "        [ 4.,  6.,  8., 10.],\n",
+            "        [ 5.,  8., 11., 14.],\n",
+            "        [ 6., 10., 14., 18.]], device='cuda:0') \n",
+            "\n",
+            "tensor @ tensor.T \n",
+            " tensor([[ 3.,  4.,  5.,  6.],\n",
+            "        [ 4.,  6.,  8., 10.],\n",
+            "        [ 5.,  8., 11., 14.],\n",
+            "        [ 6., 10., 14., 18.]], device='cuda:0')\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "UxW1jtX-GOfd"
+      },
+      "source": [
+        "### 1.2.5 Final exercise: compute *h*\n",
+        "\n",
+        "▶▶ **Compute the tensor h, using the same data for x and W as at the beginning of this TP.**\n",
+        "\n",
+        "```\n",
+        "x = np.array([1,2])\n",
+        "rng = np.random.default_rng(seed=42)\n",
+        "W = rng.random((3, 2))\n",
+        "```\n",
+        "\n",
+        "Important note: when multiplying matrices, we need to have the same data type, e.g. not **x** with *int* and **W** with *float*.\n",
+        "So you have to say that the vector **x** has the data type *float*. Two ways:\n",
+        "* from the initialization: **x = torch.tensor([1,2], dtype=float)**\n",
+        "* from any tensor: **x = x.to( torch.float64)** (here using only **float** would give *float32*, not what we want)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "lwIanFgWD_YJ",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "ddd58f96-5747-4c74-87b0-292e8aae11ab"
+      },
+      "source": [
+        "# --------------------------------------------------------\n",
+        "# TODO: Write the code to compute h = W.x+b\n",
+        "# --------------------------------------------------------\n",
+        "\n",
+        "# h = x.W + b\n",
+        "x = torch.tensor([1,2])\n",
+        "x = x.to( torch.float64) # be careful: using just 'float' here gives float32\n",
+        "## OR\n",
+        "#x = torch.tensor([1,2], dtype=float)\n",
+        "print(\"Our input vector with 2 elements:\\n\", x)\n",
+        "print( \"x shape:\", x.shape)\n",
+        "print( \"x type:\", x.dtype )\n",
+        "\n",
+        "# Generate a random matrix (with e generator, for reproducible results)\n",
+        "rng = np.random.default_rng(seed=42)\n",
+        "W = rng.random((3, 2))\n",
+        "W_t = torch.from_numpy(W)\n",
+        "print(\"\\n Our weight matrix, of shape 3x2:\\n\", W)\n",
+        "print( \"W shape:\", W_t.shape)\n",
+        "print( \"W type:\", W.dtype)\n",
+        "\n",
+        "# Bias, a scalar\n",
+        "b = 1.0\n",
+        "\n",
+        "# Now, try to multiply\n",
+        "h_t = W_t.matmul(x) + b\n",
+        "print(\"\\n Our h layer:\\n\", h_t)\n",
+        "print( \"h shape:\", h_t.shape)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Our input vector with 2 elements:\n",
+            " tensor([1., 2.], dtype=torch.float64)\n",
+            "x shape: torch.Size([2])\n",
+            "x type: torch.float64\n",
+            "\n",
+            " Our weight matrix, of shape 3x2:\n",
+            " [[0.77395605 0.43887844]\n",
+            " [0.85859792 0.69736803]\n",
+            " [0.09417735 0.97562235]]\n",
+            "W shape: torch.Size([3, 2])\n",
+            "W type: float64\n",
+            "\n",
+            " Our h layer:\n",
+            " tensor([2.6517, 3.2533, 3.0454], dtype=torch.float64)\n",
+            "h shape: torch.Size([3])\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "na_tJOnfGDIz"
+      },
+      "source": [
+        "### Last minor note"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "lql9bH39G4Mw",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "3ae7e1c1-8eb8-48c9-a5bc-624d8eb25e44"
+      },
+      "source": [
+        "## Operations that have a _ suffix are in-place. For example: x.copy_(y), x.t_(), will change x.\n",
+        "print(tensor, \"\\n\")\n",
+        "tensor.add_(5)\n",
+        "print(tensor)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "tensor([[1., 0., 1., 1.],\n",
+            "        [2., 0., 1., 1.],\n",
+            "        [3., 0., 1., 1.],\n",
+            "        [4., 0., 1., 1.]], device='cuda:0') \n",
+            "\n",
+            "tensor([[6., 5., 6., 6.],\n",
+            "        [7., 5., 6., 6.],\n",
+            "        [8., 5., 6., 6.],\n",
+            "        [9., 5., 6., 6.]], device='cuda:0')\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "DGmy-dtuOtiw"
+      },
+      "source": [
+        "# Part 2: Feedforward Neural Network\n",
+        "\n",
+        "In this practical session, we will explore a simple neural network architecture for NLP applications ; specifically, we will train a feedforward neural network for sentiment analysis, using the same dataset of reviews as in the previous session.  We will also keep the bag of words representation.\n",
+        "\n",
+        "\n",
+        "Sources:\n",
+        "* This TP is inspired by a TP by Tim van de Cruys\n",
+        "* https://www.deeplearningwizard.com/deep_learning/practical_pytorch/pytorch_feedforward_neuralnetwork/\n",
+        "* https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html\n",
+        "* https://medium.com/swlh/sentiment-classification-using-feed-forward-neural-network-in-pytorch-655811a0913f\n",
+        "* https://www.deeplearningwizard.com/deep_learning/practical_pytorch/pytorch_feedforward_neuralnetwork/"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Useful imports\n",
+        "import pandas as pd\n",
+        "import numpy as np\n",
+        "import re\n",
+        "import sklearn\n",
+        "\n",
+        "from sklearn.feature_extraction.text import CountVectorizer"
+      ],
+      "metadata": {
+        "id": "TKukE_hAAn_2"
+      },
+      "execution_count": 2,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Path to data\n",
+        "train_path = \"allocine_train.tsv\"\n",
+        "dev_path = \"allocine_dev.tsv\""
+      ],
+      "metadata": {
+        "id": "iUxRwO37Ap8h"
+      },
+      "execution_count": 3,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wdSyhJqpVczO"
+      },
+      "source": [
+        "## 2.1 Read and load the data\n",
+        "\n",
+        "Here we will keep the bag of word representation, as in the previous session.\n",
+        "\n",
+        "You can find different ways of dealing with the input data in PyTorch. The simplest solution is to use the DataLoader from PyTorch:    \n",
+        "* the doc here https://pytorch.org/docs/stable/data.html and here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html\n",
+        "* an example of use, with numpy array: https://www.kaggle.com/arunmohan003/sentiment-analysis-using-lstm-pytorch\n",
+        "\n",
+        "\n",
+        "\n",
+        "\n",
+        "\n",
+        "\n",
+        "You can also find many datasets for text ready to load in pytorch on: https://pytorch.org/text/stable/datasets.html"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "CxRbwziSV_BY"
+      },
+      "source": [
+        "#### 2.1.1 Build BoW vectors (code given)\n",
+        "\n",
+        "The code below allows to use scikit methods you already know to generate the bag of word representation."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "SoVJ18s_oxkn"
+      },
+      "source": [
+        "# This will be the size of the vectors reprensenting the input\n",
+        "MAX_FEATURES = 5000\n",
+        "\n",
+        "def vectorize_data( data_path, vectorizer=None ):\n",
+        "  data_df = pd.read_csv( data_path, header=0,\n",
+        "                    delimiter=\"\\t\", quoting=3)\n",
+        "  # If an existing vectorizer is not given, initialize the \"CountVectorizer\"\n",
+        "  # object, which is scikit-learn's bag of words tool.\n",
+        "  if not vectorizer:\n",
+        "    vectorizer = CountVectorizer(\n",
+        "        analyzer = \"word\",\n",
+        "        max_features = MAX_FEATURES\n",
+        "    )\n",
+        "    vectorizer.fit(data_df[\"review\"])\n",
+        "  # Then transform the data\n",
+        "  x_data = vectorizer.transform(data_df[\"review\"])\n",
+        "  # Vectorize also the labels\n",
+        "  y_data = np.asarray(data_df[\"sentiment\"])\n",
+        "  return x_data, y_data, vectorizer\n",
+        "\n",
+        "x_train, y_train, vectorizer = vectorize_data( train_path )\n",
+        "x_dev, y_dev, _ = vectorize_data( dev_path, vectorizer )\n",
+        "\n"
+      ],
+      "execution_count": 4,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Count_Vectorizer returns sparse arrays (for computational reasons) but PyTorch will expect dense input:"
+      ],
+      "metadata": {
+        "id": "PpeZ5ZtSVDSt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# from sparse to dense\n",
+        "x_train = x_train.toarray()\n",
+        "x_dev = x_dev.toarray()\n",
+        "\n",
+        "print(\"Train:\", x_train.shape)\n",
+        "print(\"Dev:\", x_dev.shape)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "PuE-xtwAVDeR",
+        "outputId": "efa1b16d-6620-495f-af60-1fb4ef74bd57"
+      },
+      "execution_count": 5,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Train: (5027, 5000)\n",
+            "Dev: (549, 5000)\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Mt00MaMmW1_P"
+      },
+      "source": [
+        "#### 2.1.2 Transform to tensors\n",
+        "\n",
+        "▶▶ **Create a dataset object within the PyTorch library:**\n",
+        "\n",
+        "The easiest way to load datasets with PyTorch is to use the DataLoader class. Here we're going to give our numpy array to this class, and first, we need to transform our data to tensors. Follow the following steps:\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Useful imports\n",
+        "import torch\n",
+        "from torch.utils.data import TensorDataset, DataLoader"
+      ],
+      "metadata": {
+        "id": "x4gzAYdyFUoR"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "* 1- **torch.from_numpy( A_NUMPY_ARRAY )**: transform your array into a tensor\n",
+        "  * Note: you need to transform tensor type to float (for x), with **MY_TENSOR.to(torch.float)** (or cryptic error saying it was expecting long...).\n",
+        "  * Print the shape of the tensor for your training data.\n",
+        "\n",
+        "https://pytorch.org/docs/stable/generated/torch.from_numpy.html#torch-from-numpy"
+      ],
+      "metadata": {
+        "id": "jOCezmakX43w"
+      }
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "JMLPp3vnoxnG",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "347dd4ad-4543-4a24-89b6-8ecaaaf7472a"
+      },
+      "source": [
+        "# create Tensor dataset\n",
+        "tensor_x_train = torch.from_numpy(x_train).to(torch.float)\n",
+        "print( tensor_x_train.shape )"
+      ],
+      "execution_count": 7,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "torch.Size([5027, 5000])\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "* 2- **torch.utils.data.TensorDataset(INPUT_TENSOR, TARGET_TENSOR)**: Dataset wrapping tensors.\n",
+        "  * Take tensors as inputs\n",
+        "\n",
+        "https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset"
+      ],
+      "metadata": {
+        "id": "5N7dgdYkX9ju"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "train_data = TensorDataset( torch.from_numpy(x_train).to(torch.float),\n",
+        "                           torch.from_numpy(y_train))"
+      ],
+      "metadata": {
+        "id": "HYLuRXYeX9vc"
+      },
+      "execution_count": 8,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "* 3- **torch.utils.data.DataLoader**: many arguments in the constructor:\n",
+        "  * In particular, *dataset* of the type TensorDataset can be used\n",
+        "  * We'd rather shuffling our data in general, can be done here by changing the value of one argument\n",
+        "  * Note also the possibility to change the batch_size, we'll talk about it later\n",
+        "\n",
+        "https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader\n",
+        "\n",
+        "```\n",
+        "DataLoader(\n",
+        "    dataset,\n",
+        "    batch_size=1,\n",
+        "    shuffle=False,\n",
+        "    num_workers=0,\n",
+        "    collate_fn=None,\n",
+        "    pin_memory=False,\n",
+        " )\n",
+        " ```"
+      ],
+      "metadata": {
+        "id": "aSOuhYozYBA-"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# dataloaders\n",
+        "batch_size = 1 #no batch, or batch = 1\n",
+        "\n",
+        "# make sure to SHUFFLE your data\n",
+        "train_loader = DataLoader( train_data, shuffle=True, batch_size=batch_size )"
+      ],
+      "metadata": {
+        "id": "V2E_AQ4bYBN6"
+      },
+      "execution_count": 10,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "zOeZCY09o6CV"
+      },
+      "source": [
+        "## 2.2 Neural Network\n",
+        "\n",
+        "Now we can build our learning model.\n",
+        "\n",
+        "For this TP, we're going to walk through the code of a **simple feedforward neural network, with one hidden layer**.\n",
+        "\n",
+        "This network takes as input bag of words vectors, exactly as our 'classic' models: each review is represented by a vector of the size the number of tokens in the vocabulary with '1' when a word is present and '0' for the other words."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.2.1 Questions\n",
+        "\n",
+        "▶▶ **What is the input dimension?**\n",
+        "\n",
+        "▶▶ **What is the output dimension?**"
+      ],
+      "metadata": {
+        "id": "5KOM7ofrKUte"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BSK0j8YASriA"
+      },
+      "source": [
+        "▶▶ **What is the input dimension?** --> MAX FEATURES = 5000\n",
+        "\n",
+        "▶▶ **What is the output dimension?** --> number of classes = 2"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Useful imports\n",
+        "import torch\n",
+        "import torch.nn as nn"
+      ],
+      "metadata": {
+        "id": "DiNm2XwlG2_0"
+      },
+      "execution_count": 11,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.2.2 Write the skeleton of the class\n",
+        "\n",
+        "▶▶ We're going to **define our own neural network type**, by defining a new class:\n",
+        "* The class is called **FeedforwardNeuralNetModel**\n",
+        "* it inherits from the class **nn.Module**\n",
+        "* the constructor takes the following arguments:\n",
+        "  * size of the input (i.e. **input_dim**)\n",
+        "  * size of the hidden layer (i.e. **hidden_dim**)\n",
+        "  * size of the output layer (i.e. **output_dim**)\n",
+        "* in the constructor, we will call the constructor of the parent class\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "bE4RgHUkGnGl"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Start to define the class corresponding to our type of neural network\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "uKcge-oBG1HV"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FeedforwardNeuralNetModel(nn.Module):\n",
+        "    def __init__(self, input_dim, hidden_dim, output_dim):\n",
+        "        super(FeedforwardNeuralNetModel, self).__init__()\n"
+      ],
+      "metadata": {
+        "id": "IyQinowpJ2ic"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.2.3 Constructor\n",
+        "\n",
+        "▶▶ To continue the definition of our class, we need to explain how are built each layer of our network.\n",
+        "\n",
+        "More precisely, we're going to define a few fields:\n",
+        "* a function corresponding to the action of our hidden layer:\n",
+        "  * what kind of function is it ?\n",
+        "  * you need to indicate the size of the input and output for this function, what are they?\n",
+        "* a non linear function, that will be used on the ouput of our hidden layer\n",
+        "* a final output function:\n",
+        "  * what kind of function is it ?\n",
+        "  * you need to indicate the size of the input and output for this function, what are they?\n",
+        "\n",
+        "All the functions that can be used in Pytorch are defined here: https://pytorch.org/docs/stable/nn.functional.html\n",
+        "\n",
+        "Do you see things that you know?\n",
+        "\n",
+        "Hint: here you define fields of your class, these fields corresponding to specific kind of functions.\n",
+        "E.g. you're going to initialize a field such as **self.fc1=SPECIFIC_TYPE_OF_FCT(expected arguments)**."
+      ],
+      "metadata": {
+        "id": "0BHUuGKCHoU9"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Continue the definition of the class by defining three functions in your constructor\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "LN3aSTSaJNkp"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FeedforwardNeuralNetModel(nn.Module):\n",
+        "    def __init__(self, input_dim, hidden_dim, output_dim):\n",
+        "        super(FeedforwardNeuralNetModel, self).__init__()\n",
+        "\n",
+        "        # Linear function ==> W1\n",
+        "        self.fc1 = nn.Linear(input_dim, hidden_dim)\n",
+        "\n",
+        "        # Non-linearity ==> g\n",
+        "        self.sigmoid = nn.Sigmoid()\n",
+        "\n",
+        "        # Linear function (readout) ==> W2\n",
+        "        self.fc2 = nn.Linear(hidden_dim, output_dim)"
+      ],
+      "metadata": {
+        "id": "pIkm7Wc-J6ce"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "And that's it ;)"
+      ],
+      "metadata": {
+        "id": "uuN7LtHSKGDL"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.2.4 The **forward** method\n",
+        "\n",
+        "The main function we have to write when defining a neural network is called the **forward** function.\n",
+        "This function computes the outputs of the network (the logit), it is thus used to train the network.\n",
+        "It details how we apply the functions defined in the constructor.\n",
+        "\n",
+        "Let's define this function, with the following signature, where x is the input to the network:\n",
+        "```\n",
+        "def forward(self, x):\n",
+        "```\n",
+        "\n",
+        "▶▶ Follow the steps:\n",
+        "* 1- Apply the first linear functiond defined in the constructor to **x**, i.e. go through the hidden layer.\n",
+        "* 2- Apply the non linear function to the output of step 1, i.e. use the activation function.\n",
+        "* 3- Apply the second linear functiond defined in the constructor to the output of step 2, i.e. go through the output layer.\n",
+        "* 4- Return the output of step 3.\n",
+        "\n",
+        "You're done!"
+      ],
+      "metadata": {
+        "id": "e2IMSprgKJ7K"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Define the forward function, used to make all the calculations\n",
+        "# through the network\n",
+        "def forward(self, x):\n",
+        "  ''' y = g(x.W1+b).W2 '''\n",
+        "  # ..."
+      ],
+      "metadata": {
+        "id": "8z-QpBt2NOlu"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "Kvmc-_zqoxvF"
+      },
+      "source": [
+        "class FeedforwardNeuralNetModel(nn.Module):\n",
+        "    def __init__(self, input_dim, hidden_dim, output_dim):\n",
+        "        super(FeedforwardNeuralNetModel, self).__init__()\n",
+        "        # Linear function ==> W1\n",
+        "        self.fc1 = nn.Linear(input_dim, hidden_dim)\n",
+        "\n",
+        "        # Non-linearity ==> g\n",
+        "        self.sigmoid = nn.Sigmoid()\n",
+        "\n",
+        "        # Linear function (readout) ==> W2\n",
+        "        self.fc2 = nn.Linear(hidden_dim, output_dim)\n",
+        "\n",
+        "    def forward(self, x):\n",
+        "        '''\n",
+        "        y = g(x.W1+b).W2\n",
+        "        '''\n",
+        "        # Linear function  # LINEAR ==> x.W1+b\n",
+        "        out1 = self.fc1(x)\n",
+        "\n",
+        "        # Non-linearity  # NON-LINEAR ==> h1 = g(x.W1+b)\n",
+        "        out2 = self.sigmoid(out1)\n",
+        "\n",
+        "        # Linear function (readout)  # LINEAR ==> y = h1.W2\n",
+        "        out3 = self.fc2(out2)\n",
+        "        return out3"
+      ],
+      "execution_count": 12,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 2.3 Training the network\n",
+        "\n",
+        "Now we can use our beautiful class to define and then train our own neural network."
+      ],
+      "metadata": {
+        "id": "sBrDXfQbO5yq"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "oWLDfLGxpBvn"
+      },
+      "source": [
+        "### 2.3.1 Hyper-parameters\n",
+        "\n",
+        "We need to set up the values for the hyper-parameters, and define the form of the loss and the optimization methods.\n",
+        "\n",
+        "▶▶ **Check that you understand what are each of the variables below**\n",
+        "* one that you prabably don't know is the learning rate, we'll explain it in the next course. Broadly speaking, it corresponds to the amount of update used during training."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "fcGyjXbUoxx9"
+      },
+      "source": [
+        "# Many choices here!\n",
+        "VOCAB_SIZE = MAX_FEATURES\n",
+        "input_dim = VOCAB_SIZE\n",
+        "hidden_dim = 4\n",
+        "output_dim = 2\n",
+        "num_epochs = 5\n",
+        "learning_rate = 0.1"
+      ],
+      "execution_count": 13,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.3.2 Loss function\n",
+        "\n",
+        "Another thing that has to be decided is the kind of loss function we want to use.\n",
+        "Here we use a common one, called CrossEntropy.\n",
+        "We will come back in more details on this loss.\n",
+        "One important note is that this function in PyTorch includes the SoftMax function that should be applied after the output layer to get labels."
+      ],
+      "metadata": {
+        "id": "yyJINiVHPoWq"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "criterion = nn.CrossEntropyLoss()"
+      ],
+      "metadata": {
+        "id": "TVVy7hhrPl-K"
+      },
+      "execution_count": 14,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.3.3 Initialization of the model\n",
+        "\n",
+        "Now you can instantiate your class: define a model that is of the type FeedforwardNeuralNetModel using the values defined before as hyper-parameters."
+      ],
+      "metadata": {
+        "id": "kyY91BtPQIeo"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Initialization of the model\n",
+        "# ..."
+      ],
+      "metadata": {
+        "id": "hk_nev2-Q0m-"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "15WR_Jdtoxze"
+      },
+      "source": [
+        "# Initialization of the model\n",
+        "model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)"
+      ],
+      "execution_count": 15,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.3.4 Optimizer\n",
+        "\n",
+        "At last, we need to indicate the method we want to use to optimize our network.\n",
+        "Here, we use a common one called Stochastic Gradient Descent.\n",
+        "We will also go back on that later on.\n",
+        "\n",
+        "Note that its arguments are:\n",
+        "* the parameters of our models (the Ws)\n",
+        "* the learning rate\n",
+        "\n",
+        "Based on these information, it can make the necessary updates.\n"
+      ],
+      "metadata": {
+        "id": "wBjNtZ-bQfSQ"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)"
+      ],
+      "metadata": {
+        "id": "A8AY0bU8Qhyf"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "OPt_VbCMqoD2"
+      },
+      "source": [
+        "### Training the network\n",
+        "\n",
+        "A simple code to train the neural network is given below.\n",
+        "\n",
+        "▶▶ **Run the code and look at the loss after each training step.**"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "OnNx8hZJox3v",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "d560ca2a-4349-466a-b38c-ca4f0433029f"
+      },
+      "source": [
+        "# Start training\n",
+        "for epoch in range(num_epochs):\n",
+        "    train_loss, total_acc, total_count = 0, 0, 0\n",
+        "\n",
+        "    # for each instance + its associated label\n",
+        "    for input, label in train_loader:\n",
+        "\n",
+        "        # Clearing the accumulated gradients\n",
+        "        # torch *accumulates* gradients. Before passing in a\n",
+        "        # new instance, you need to zero out the gradients from the old\n",
+        "        # instance\n",
+        "        # Clear gradients w.r.t. parameters\n",
+        "        optimizer.zero_grad()\n",
+        "\n",
+        "        # ==> Forward pass to get output/logits\n",
+        "        # = apply all our functions: y = g(x.W1+b).W2\n",
+        "        outputs = model( input )\n",
+        "\n",
+        "        # ==> Calculate Loss: softmax --> cross entropy loss\n",
+        "        loss = criterion(outputs, label)\n",
+        "\n",
+        "        # Getting gradients w.r.t. parameters\n",
+        "        # Here is the way to find how to modify the parameters in\n",
+        "        # order to lower the loss\n",
+        "        loss.backward()\n",
+        "\n",
+        "        # ==> Updating parameters: you don t need to provide the loss here,\n",
+        "        # when computing the loss, the information is saved in the parameters\n",
+        "        # (more precisely, doing backward computes the gradients for all tensors,\n",
+        "        # and these gradients are saved by each tensor)\n",
+        "        optimizer.step()\n",
+        "\n",
+        "        # -- a useful print\n",
+        "        # Accumulating the loss over time\n",
+        "        train_loss += loss.item()\n",
+        "        total_acc += (outputs.argmax(1) == label).sum().item()\n",
+        "        total_count += label.size(0)\n",
+        "\n",
+        "    # Compute accuracy on train set at each epoch\n",
+        "    print('Epoch: {}. Loss: {}. ACC {} '.format(epoch,\n",
+        "                                                train_loss/x_train.shape[0],\n",
+        "                                                total_acc/x_train.shape[0]))\n",
+        "\n",
+        "    total_acc, total_count = 0, 0\n",
+        "    train_loss = 0"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Epoch: 0. Loss: 0.5252582924727662. ACC 0.7376168689078973 \n",
+            "Epoch: 1. Loss: 0.3890879917333115. ACC 0.8329023274318679 \n",
+            "Epoch: 2. Loss: 0.3207374065181158. ACC 0.8651283071414363 \n",
+            "Epoch: 3. Loss: 0.2718604214944098. ACC 0.8909886612293615 \n",
+            "Epoch: 4. Loss: 0.2544953779364264. ACC 0.903322060871295 \n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "tzMl5wdnqtCW"
+      },
+      "source": [
+        "### Evaluate the model"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Useful imports\n",
+        "from sklearn.metrics import classification_report"
+      ],
+      "metadata": {
+        "id": "N8wxX85sSyPM"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ldDubAPDox5K",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "f34d759f-f097-4efc-d50d-4ac46ae62404"
+      },
+      "source": [
+        "# create Tensor dataset\n",
+        "valid_data = TensorDataset( torch.from_numpy(x_dev).to(torch.float),\n",
+        "                           torch.from_numpy(y_dev))\n",
+        "valid_loader = DataLoader( valid_data )\n",
+        "\n",
+        "\n",
+        "# Disabling gradient calculation is useful for inference,\n",
+        "# when you are sure that you will not call Tensor.backward().\n",
+        "predictions, gold = [], []\n",
+        "with torch.no_grad():\n",
+        "    for input, label in valid_loader:\n",
+        "        probs = model(input)\n",
+        "        predictions.append( torch.argmax(probs, dim=1).cpu().numpy()[0] )\n",
+        "        gold.append(int(label))\n",
+        "\n",
+        "print(classification_report(gold, predictions))"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "              precision    recall  f1-score   support\n",
+            "\n",
+            "           0       0.77      0.87      0.82       230\n",
+            "           1       0.90      0.82      0.86       319\n",
+            "\n",
+            "    accuracy                           0.84       549\n",
+            "   macro avg       0.84      0.84      0.84       549\n",
+            "weighted avg       0.85      0.84      0.84       549\n",
+            "\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Hq-jspmLL387"
+      },
+      "source": [
+        "## 3. Move to GPU\n",
+        "\n",
+        "Below we indicate the modifications needed to make all the computations on GPU instead of CPU."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "pydK_h3QLZfO",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "c6e41350-207c-4def-f841-e322128283b7"
+      },
+      "source": [
+        "## 1- Define the device to be used\n",
+        "\n",
+        "# CUDA for PyTorch\n",
+        "use_cuda = torch.cuda.is_available()\n",
+        "device = torch.device(\"cuda\" if use_cuda else \"cpu\")\n",
+        "print(device)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "cuda\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "PuV1OjAdMHOX"
+      },
+      "source": [
+        "## 2- No change here\n",
+        "\n",
+        "import torch\n",
+        "import torch.nn as nn\n",
+        "\n",
+        "class FeedforwardNeuralNetModel(nn.Module):\n",
+        "    def __init__(self, input_dim, hidden_dim, output_dim):\n",
+        "        super(FeedforwardNeuralNetModel, self).__init__()\n",
+        "        # Linear function ==> W1\n",
+        "        self.fc1 = nn.Linear(input_dim, hidden_dim)\n",
+        "\n",
+        "        # Non-linearity ==> g\n",
+        "        self.sigmoid = nn.Sigmoid()\n",
+        "\n",
+        "        # Linear function (readout) ==> W2\n",
+        "        self.fc2 = nn.Linear(hidden_dim, output_dim)\n",
+        "\n",
+        "    def forward(self, x):\n",
+        "        '''\n",
+        "        y = g(x.W1+b).W2\n",
+        "        '''\n",
+        "        # Linear function  # LINEAR ==> x.W1+b\n",
+        "        out = self.fc1(x)\n",
+        "\n",
+        "        # Non-linearity  # NON-LINEAR ==> h1 = g(x.W1+b)\n",
+        "        out = self.sigmoid(out)\n",
+        "\n",
+        "        # Linear function (readout)  # LINEAR ==> y = h1.W2\n",
+        "        out = self.fc2(out)\n",
+        "        return out"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "K7mmAyoPMziY"
+      },
+      "source": [
+        "## 3- Move your model to the GPU\n",
+        "\n",
+        "# Initialization of the model\n",
+        "model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)\n",
+        "\n",
+        "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n",
+        "\n",
+        "## ------------ CHANGE HERE -----------------\n",
+        "model = model.to(device)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ANibLgnhL9jU",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "0a30a491-fe38-4757-ea4e-d351e0c5d099"
+      },
+      "source": [
+        "## 4- Move your data to GPU\n",
+        "\n",
+        "# Start training\n",
+        "for epoch in range(num_epochs):\n",
+        "    train_loss, total_acc, total_count = 0, 0, 0\n",
+        "    for input, label in train_loader:\n",
+        "        ## ------------ CHANGE HERE -----------------\n",
+        "        input = input.to(device)\n",
+        "        label = label.to(device)\n",
+        "\n",
+        "        # Clear gradients w.r.t. parameters\n",
+        "        optimizer.zero_grad()\n",
+        "\n",
+        "        # Forward pass to get output/logits\n",
+        "        outputs = model( input )\n",
+        "\n",
+        "        # Calculate Loss: softmax --> cross entropy loss\n",
+        "        loss = criterion(outputs, label)\n",
+        "\n",
+        "        # Getting gradients w.r.t. parameters\n",
+        "        loss.backward()\n",
+        "\n",
+        "        # Updating parameters\n",
+        "        optimizer.step()\n",
+        "\n",
+        "        # Accumulating the loss over time\n",
+        "        train_loss += loss.item()\n",
+        "        total_acc += (outputs.argmax(1) == label).sum().item()\n",
+        "        total_count += label.size(0)\n",
+        "\n",
+        "    # Compute accuracy on train set at each epoch\n",
+        "    print('Epoch: {}. Loss: {}. ACC {} '.format(epoch,\n",
+        "                                                train_loss/x_train.shape[0],\n",
+        "                                                total_acc/x_train.shape[0]))\n",
+        "\n",
+        "    total_acc, total_count = 0, 0\n",
+        "    train_loss = 0"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Epoch: 0. Loss: 0.5239484066388459. ACC 0.7298587626815198 \n",
+            "Epoch: 1. Loss: 0.3703767738312165. ACC 0.8414561368609509 \n",
+            "Epoch: 2. Loss: 0.3024231172786994. ACC 0.8726874875671374 \n",
+            "Epoch: 3. Loss: 0.2766759563584567. ACC 0.8874079968171872 \n",
+            "Epoch: 4. Loss: 0.2579045861314929. ACC 0.8959618062462701 \n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "dSXQF-ViNUH4",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "3157cb12-23f9-43cc-de72-af229e54f0b3"
+      },
+      "source": [
+        "# -- 5- Again, move your data to GPU\n",
+        "\n",
+        "predictions = []\n",
+        "gold = []\n",
+        "\n",
+        "with torch.no_grad():\n",
+        "    for input, label in valid_loader:\n",
+        "        ## ------------ CHANGE HERE -----------------\n",
+        "        input = input.to(device)\n",
+        "        probs = model(input)\n",
+        "        #Here, we need CPU: else, it will generate the following error\n",
+        "        # can't convert cuda:0 device type tensor to numpy.\n",
+        "        # Use Tensor.cpu() to copy the tensor to host memory first.\n",
+        "        # (if we need a numpy array)\n",
+        "        predictions.append( torch.argmax(probs, dim=1).cpu().numpy()[0] )\n",
+        "        #print( probs )\n",
+        "        #print( torch.argmax(probs, dim=1) ) # Return the index of the max value\n",
+        "        #print( torch.argmax(probs, dim=1).cpu().numpy()[0] )\n",
+        "        gold.append(int(label))\n",
+        "\n",
+        "print(classification_report(gold, predictions))"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "              precision    recall  f1-score   support\n",
+            "\n",
+            "           0       0.72      0.89      0.80       230\n",
+            "           1       0.90      0.75      0.82       319\n",
+            "\n",
+            "    accuracy                           0.81       549\n",
+            "   macro avg       0.81      0.82      0.81       549\n",
+            "weighted avg       0.83      0.81      0.81       549\n",
+            "\n"
+          ]
+        }
+      ]
+    }
+  ]
+}
\ No newline at end of file
+%% Cell type:markdown id: tags:
+
+# TP 2: Linear Algebra and Feedforward neural network
+Master LiTL - 2023-2024
+
+## Requirements
+In this section, we will go through some code to learn how to manipulate matrices and tensors, and we will take a look at some PyTorch code that allows to define, train and evaluate a simple neural network.
+The modules used are the the same as in the previous session, *Numpy* and *Scikit*, with the addition of *PyTorch*. They are all already available within colab.
+
+## Part 1: Linear Algebra
+
+In this section, we will go through some python code to deal with matrices and also tensors, the data structures used in PyTorch.
+
+Sources:
+* Linear Algebra explained in the context of deep learning: https://towardsdatascience.com/linear-algebra-explained-in-the-context-of-deep-learning-8fcb8fca1494
+* PyTorch tutorial: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py
+* PyTorch doc on tensors: https://pytorch.org/docs/stable/torch.html
+
+%% Cell type:code id: tags:
+
+``` 
+# Useful imports
+import numpy as np
+import torch
+```
+
+%% Cell type:markdown id: tags:
+
+## 1.1 Numpy arrays
+
+NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type
+
+%% Cell type:markdown id: tags:
+
+### 1.1.1 Numpy arrays
+
+▶▶ **Look at the code below and check that you understand each line:**
+* We define a numpy array (i.e. a vector) **x** from a list
+* We define a numpy array of shape 3x2 (i.e. a matrix) initialized with random numbers, called **W**
+* We define a scalar, **b**
+* Finally, with all these elements, we can compute **h = W.x + b**
+
+%% Cell type:code id: tags:
+
+``` 
+x = np.array([1,2])
+print("Our input vector with 2 elements:\n", x)
+print( "x shape:", x.shape)
+
+print( "x data type", x.dtype)
+# Give a list of elements
+# a = np.array(1,2,3,4)    # WRONG
+# a = np.array([1,2,3,4])  # RIGHT
+
+# Generate a random matrix (with a generator and a seed, for reproducible results)
+rng = np.random.default_rng(seed=42)
+W = rng.random((3, 2))
+print("\n Our weight matrix, of shape 3x2:\n", W)
+print( "W shape:", W.shape)
+print( "W data type", W.dtype)
+
+# Bias, a scalar
+b = 1
+
+# Now, try to multiply
+h = W.dot(x) + b
+print("\n Our h layer:\n", h)
+print( "h shape:", h.shape)
+print( "h data type", h.dtype)
+```
+
+%% Output
+
+    Our input vector with 2 elements:
+     [1 2]
+    x shape: (2,)
+    x data type int64
+    
+     Our weight matrix, of shape 3x2:
+     [[0.77395605 0.43887844]
+     [0.85859792 0.69736803]
+     [0.09417735 0.97562235]]
+    W shape: (3, 2)
+    W data type float64
+    
+     Our h layer:
+     [2.65171293 3.25333398 3.04542205]
+    h shape: (3,)
+    h data type float64
+
+%% Cell type:markdown id: tags:
+
+### 1.1.2 Operations on arrays
+
+▶▶ **Look at the code below and check that you understand each line:**
+* How to reshape a matrix i.e. change its dimensions
+* How to compute the transpose of a vector / matrix
+
+%% Cell type:code id: tags:
+
+``` 
+# Useful transformations
+h = h.reshape((3,1))
+print("\n h reshape:\n", h)
+print( "h shape:", h.shape)
+
+h1 = np.transpose(h)
+print("\n h transpose:\n", h1)
+print( "h shape:", h1.shape)
+
+h2 = h.T
+print("\n h transpose:\n", h2)
+print( "h shape:", h2.shape)
+
+Wt = W.T
+print("\nW:\n", W)
+print("\nW.T:\n", Wt)
+```
+
+%% Output
+
+    
+     h reshape:
+     [[2.65171293]
+     [3.25333398]
+     [3.04542205]]
+    h shape: (3, 1)
+    
+     h transpose:
+     [[2.65171293 3.25333398 3.04542205]]
+    h shape: (1, 3)
+    
+     h transpose:
+     [[2.65171293 3.25333398 3.04542205]]
+    h shape: (1, 3)
+    
+    W:
+     [[0.77395605 0.43887844]
+     [0.85859792 0.69736803]
+     [0.09417735 0.97562235]]
+    
+    W.T:
+     [[0.77395605 0.85859792 0.09417735]
+     [0.43887844 0.69736803 0.97562235]]
+
+%% Cell type:markdown id: tags:
+
+▶▶ **A last note: creating an identity matrix**
+
+%% Cell type:code id: tags:
+
+``` 
+## numpy code to create identity matrix
+a = np.eye(4)
+print(a)
+```
+
+%% Output
+
+    [[1. 0. 0. 0.]
+     [0. 1. 0. 0.]
+     [0. 0. 1. 0.]
+     [0. 0. 0. 1.]]
+
+%% Cell type:markdown id: tags:
+
+## 1.2 Tensors
+
+For neural networks implementation in PyTorch, we use tensors:
+* a specialized data structure that are very similar to arrays and matrices
+* used to encode the inputs and outputs of a model, as well as the model’s parameters
+* similar to NumPy’s ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing
+
+%% Cell type:markdown id: tags:
+
+### 1.2.1 Tensor initialization
+
+▶▶ **Look at the code below and check that you understand each line:**
+* We define a PyTorch tensor (i.e. a matrix) **x_data** from a list of list
+* We define a PyTorch tensor (i.e. a matrix) **x_np** from a numpy array
+* How to initialize an random tensor, an one tensor and a zero tensor
+* Finally, we define a PyTorch tensor (i.e. a matrix) from another tensor:
+  * **x_ones**: from the identity tensor
+  * **x_rand**: from a tensor initialized with random values
+
+%% Cell type:code id: tags:
+
+``` 
+# Tensor initialization
+
+## from data. The data type is automatically inferred.
+data = [[1, 2], [3, 4]]
+x_data = torch.tensor(data)
+print( "x_data", x_data)
+print( "data type x_data=", x_data.dtype)
+
+## from a numpy array
+np_array = np.array(data)
+x_np = torch.from_numpy(np_array)
+print("\nx_np", x_np)
+print( "data type, np_array=", np_array.dtype, "x_data=", x_np.dtype)
+
+## with random values / ones / zeros
+shape = (2, 3,) # shape is a tuple of tensor dimensions
+rand_tensor = torch.rand(shape)
+ones_tensor = torch.ones(shape)
+zeros_tensor = torch.zeros(shape)
+
+print(f"Random Tensor: \n {rand_tensor} \n")
+print(f"Ones Tensor: \n {ones_tensor} \n")
+print(f"Zeros Tensor: \n {zeros_tensor}")
+
+## from another tensor
+x_ones = torch.ones_like(x_data) # retains the properties of x_data
+print(f"\nFrom Ones Tensor: \n {x_ones} \n")
+
+x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
+print(f"From Random Tensor: \n {x_rand} \n")
+```
+
+%% Output
+
+    x_data tensor([[1, 2],
+            [3, 4]])
+    data type x_data= torch.int64
+    
+    x_np tensor([[1, 2],
+            [3, 4]])
+    data type, np_array= int64 x_data= torch.int64
+    Random Tensor:
+     tensor([[0.4516, 0.4125, 0.0914],
+            [0.1381, 0.4802, 0.4308]])
+    
+    Ones Tensor:
+     tensor([[1., 1., 1.],
+            [1., 1., 1.]])
+    
+    Zeros Tensor:
+     tensor([[0., 0., 0.],
+            [0., 0., 0.]])
+    
+    From Ones Tensor:
+     tensor([[1, 1],
+            [1, 1]])
+    
+    From Random Tensor:
+     tensor([[0.8048, 0.0088],
+            [0.8002, 0.7587]])
+    
+
+%% Cell type:markdown id: tags:
+
+### 1.2.2 Tensor attributes
+
+▶▶ **A tensor has different attributes, print the values for:**
+* shape of the tensor
+* type of the data stored
+* device on which data are stored
+
+Look at the doc here: https://www.tensorflow.org/api_docs/python/tf/Tensor#shape
+
+%% Cell type:code id: tags:
+
+``` 
+# Tensor attributes
+tensor = torch.rand(3, 4)
+
+print(f"Shape of tensor: {tensor.shape}")
+print(f"Datatype of tensor: {tensor.dtype}")
+print(f"Device tensor is stored on: {tensor.device}")
+```
+
+%% Output
+
+    Shape of tensor: torch.Size([3, 4])
+    Datatype of tensor: torch.float32
+    Device tensor is stored on: cpu
+
+%% Cell type:markdown id: tags:
+
+### 1.2.3 Move to GPU
+
+The code below is used to:
+* check on which device the code is running, 'cuda' stands for GPU. If not GPU is found that we use CPU.
+
+
+▶▶ **Check and move to GPU:**
+* Run the code, it should say 'no cpu'
+* Move to GPU: in Colab, allocate a GPU by going to Edit > Notebook Settings (Modifier > Paramètres du notebook)
+  * you'll see an indicator of connexion in the uppper right part of the screen
+* Run the code from 1.2 again and the cell below (you can use the function Run / Run before or Exécution / Exécuter avant), you'll need to do all the imports again. You see the difference?
+
+%% Cell type:code id: tags:
+
+``` 
+# We move our tensor to the GPU if available
+if torch.cuda.is_available():
+  tensor = tensor.to('cuda')
+  print(f"Device tensor is stored on: {tensor.device}")
+else:
+  print("no gpu")
+
+print(tensor)
+```
+
+%% Output
+
+    Device tensor is stored on: cuda:0
+    tensor([[0.8227, 0.8155, 0.3872, 0.2957],
+            [0.8694, 0.4785, 0.4519, 0.8665],
+            [0.9097, 0.6743, 0.1998, 0.5599]], device='cuda:0')
+
+%% Cell type:markdown id: tags:
+
+Below, run after moving to GPU.
+
+%% Cell type:code id: tags:
+
+``` 
+# We move our tensor to the GPU if available
+if torch.cuda.is_available():
+  tensor = tensor.to('cuda')
+  print(f"Device tensor is stored on: {tensor.device}")
+else:
+  print("no gpu")
+
+print(tensor)
+```
+
+%% Output
+
+    Device tensor is stored on: cuda:0
+    tensor([[0.8227, 0.8155, 0.3872, 0.2957],
+            [0.8694, 0.4785, 0.4519, 0.8665],
+            [0.9097, 0.6743, 0.1998, 0.5599]], device='cuda:0')
+
+%% Cell type:markdown id: tags:
+
+### 1.2.4 Tensor operations
+
+Doc: https://pytorch.org/docs/stable/torch.html
+
+▶▶ **Slicing operations:**
+* Below we use slicing operations to modify tensors
+
+%% Cell type:code id: tags:
+
+``` 
+# Tensor operations: similar to numpy arrays
+tensor = torch.ones(4, 4)
+print(tensor)
+```
+
+%% Output
+
+    tensor([[1., 1., 1., 1.],
+            [1., 1., 1., 1.],
+            [1., 1., 1., 1.],
+            [1., 1., 1., 1.]])
+
+%% Cell type:code id: tags:
+
+``` 
+# ---------------------------------------------------------
+# TODO: What do you expect?
+# ---------------------------------------------------------
+## Slicing
+print("\nSlicing")
+tensor[:,1] = 0
+print(tensor)
+
+# ---------------------------------------------------------
+# TODO: Change the first column with the value in l
+# ---------------------------------------------------------
+l =[1.,2.,3.,4.]
+l = torch.tensor( l )
+tensor[:, 0] = l
+print(tensor)
+```
+
+%% Output
+
+    
+    Slicing
+    tensor([[1., 0., 1., 1.],
+            [1., 0., 1., 1.],
+            [1., 0., 1., 1.],
+            [1., 0., 1., 1.]])
+    tensor([[1., 0., 1., 1.],
+            [2., 0., 1., 1.],
+            [3., 0., 1., 1.],
+            [4., 0., 1., 1.]])
+
+%% Cell type:markdown id: tags:
+
+▶▶ *Other operations:**
+* Check the code below that performs:
+  * tensor concatenation
+  * tensor multiplication
+
+%% Cell type:code id: tags:
+
+``` 
+## Concatenation
+print("\nConcatenate")
+t1 = torch.cat([tensor, tensor, tensor], dim=1)
+print(t1)
+
+## Multiplication: element_wise
+print("\nMultiply")
+# This computes the element-wise product
+t2 = tensor.mul(tensor)
+print(f"tensor.mul(tensor) \n {t2} \n")
+# Alternative syntax:
+t3 = tensor * tensor
+print(f"tensor * tensor \n {t3}")
+
+## Matrix multiplication
+t4 = tensor.matmul(tensor.T)
+print(f"tensor.matmul(tensor.T) \n {t4} \n")
+# Alternative syntax:
+t5 = tensor @ tensor.T
+print(f"tensor @ tensor.T \n {t5}")
+```
+
+%% Output
+
+    
+    Concatenate
+    tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
+            [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],
+            [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],
+            [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]])
+    
+    Multiply
+    tensor.mul(tensor)
+     tensor([[ 1.,  0.,  1.,  1.],
+            [ 4.,  0.,  1.,  1.],
+            [ 9.,  0.,  1.,  1.],
+            [16.,  0.,  1.,  1.]])
+    
+    tensor * tensor
+     tensor([[ 1.,  0.,  1.,  1.],
+            [ 4.,  0.,  1.,  1.],
+            [ 9.,  0.,  1.,  1.],
+            [16.,  0.,  1.,  1.]])
+    tensor.matmul(tensor.T)
+     tensor([[ 3.,  4.,  5.,  6.],
+            [ 4.,  6.,  8., 10.],
+            [ 5.,  8., 11., 14.],
+            [ 6., 10., 14., 18.]])
+    
+    tensor @ tensor.T
+     tensor([[ 3.,  4.,  5.,  6.],
+            [ 4.,  6.,  8., 10.],
+            [ 5.,  8., 11., 14.],
+            [ 6., 10., 14., 18.]])
+
+%% Cell type:markdown id: tags:
+
+### 1.2.5 Tensor operations on GPU
+
+The tensor is stored on CPU by default.
+
+▶▶ **Initialize the tensor using *device='cuda'*: where are stored t1, ..., t5?**
+
+%% Cell type:code id: tags:
+
+``` 
+# Tensor operations: similar to numpy arrays
+
+tensor = torch.ones(4, 4, device='cuda')
+print(tensor)
+
+# ---------------------------------------------------------
+# TODO: What do you expect?
+# ---------------------------------------------------------
+## Slicing
+print("\nSlicing")
+tensor[:,1] = 0
+print(tensor)
+
+# ---------------------------------------------------------
+# TODO: Change the first column with the value in l
+# ---------------------------------------------------------
+l =[1.,2.,3.,4.]
+l = torch.tensor( l )
+tensor[:, 0] = l
+print(tensor)
+
+
+## Concatenation
+print("\nConcatenate")
+t1 = torch.cat([tensor, tensor, tensor], dim=1)
+print(t1)
+
+## Multiplication: element_wise
+print("\nMultiply")
+# This computes the element-wise product
+t2 = tensor.mul(tensor)
+print(f"tensor.mul(tensor) \n {t2} \n")
+# Alternative syntax:
+t3 = tensor * tensor
+print(f"tensor * tensor \n {t3}")
+
+## Matrix multiplication
+t4 = tensor.matmul(tensor.T)
+print(f"tensor.matmul(tensor.T) \n {t4} \n")
+# Alternative syntax:
+t5 = tensor @ tensor.T
+print(f"tensor @ tensor.T \n {t5}")
+```
+
+%% Output
+
+    tensor([[1., 1., 1., 1.],
+            [1., 1., 1., 1.],
+            [1., 1., 1., 1.],
+            [1., 1., 1., 1.]], device='cuda:0')
+    
+    Slicing
+    tensor([[1., 0., 1., 1.],
+            [1., 0., 1., 1.],
+            [1., 0., 1., 1.],
+            [1., 0., 1., 1.]], device='cuda:0')
+    tensor([[1., 0., 1., 1.],
+            [2., 0., 1., 1.],
+            [3., 0., 1., 1.],
+            [4., 0., 1., 1.]], device='cuda:0')
+    
+    Concatenate
+    tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
+            [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],
+            [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],
+            [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]], device='cuda:0')
+    
+    Multiply
+    tensor.mul(tensor)
+     tensor([[ 1.,  0.,  1.,  1.],
+            [ 4.,  0.,  1.,  1.],
+            [ 9.,  0.,  1.,  1.],
+            [16.,  0.,  1.,  1.]], device='cuda:0')
+    
+    tensor * tensor
+     tensor([[ 1.,  0.,  1.,  1.],
+            [ 4.,  0.,  1.,  1.],
+            [ 9.,  0.,  1.,  1.],
+            [16.,  0.,  1.,  1.]], device='cuda:0')
+    tensor.matmul(tensor.T)
+     tensor([[ 3.,  4.,  5.,  6.],
+            [ 4.,  6.,  8., 10.],
+            [ 5.,  8., 11., 14.],
+            [ 6., 10., 14., 18.]], device='cuda:0')
+    
+    tensor @ tensor.T
+     tensor([[ 3.,  4.,  5.,  6.],
+            [ 4.,  6.,  8., 10.],
+            [ 5.,  8., 11., 14.],
+            [ 6., 10., 14., 18.]], device='cuda:0')
+
+%% Cell type:markdown id: tags:
+
+### 1.2.5 Final exercise: compute *h*
+
+▶▶ **Compute the tensor h, using the same data for x and W as at the beginning of this TP.**
+
+```
+x = np.array([1,2])
+rng = np.random.default_rng(seed=42)
+W = rng.random((3, 2))
+```
+
+Important note: when multiplying matrices, we need to have the same data type, e.g. not **x** with *int* and **W** with *float*.
+So you have to say that the vector **x** has the data type *float*. Two ways:
+* from the initialization: **x = torch.tensor([1,2], dtype=float)**
+* from any tensor: **x = x.to( torch.float64)** (here using only **float** would give *float32*, not what we want)
+
+%% Cell type:code id: tags:
+
+``` 
+# --------------------------------------------------------
+# TODO: Write the code to compute h = W.x+b
+# --------------------------------------------------------
+
+# h = x.W + b
+x = torch.tensor([1,2])
+x = x.to( torch.float64) # be careful: using just 'float' here gives float32
+## OR
+#x = torch.tensor([1,2], dtype=float)
+print("Our input vector with 2 elements:\n", x)
+print( "x shape:", x.shape)
+print( "x type:", x.dtype )
+
+# Generate a random matrix (with e generator, for reproducible results)
+rng = np.random.default_rng(seed=42)
+W = rng.random((3, 2))
+W_t = torch.from_numpy(W)
+print("\n Our weight matrix, of shape 3x2:\n", W)
+print( "W shape:", W_t.shape)
+print( "W type:", W.dtype)
+
+# Bias, a scalar
+b = 1.0
+
+# Now, try to multiply
+h_t = W_t.matmul(x) + b
+print("\n Our h layer:\n", h_t)
+print( "h shape:", h_t.shape)
+```
+
+%% Output
+
+    Our input vector with 2 elements:
+     tensor([1., 2.], dtype=torch.float64)
+    x shape: torch.Size([2])
+    x type: torch.float64
+    
+     Our weight matrix, of shape 3x2:
+     [[0.77395605 0.43887844]
+     [0.85859792 0.69736803]
+     [0.09417735 0.97562235]]
+    W shape: torch.Size([3, 2])
+    W type: float64
+    
+     Our h layer:
+     tensor([2.6517, 3.2533, 3.0454], dtype=torch.float64)
+    h shape: torch.Size([3])
+
+%% Cell type:markdown id: tags:
+
+### Last minor note
+
+%% Cell type:code id: tags:
+
+``` 
+## Operations that have a _ suffix are in-place. For example: x.copy_(y), x.t_(), will change x.
+print(tensor, "\n")
+tensor.add_(5)
+print(tensor)
+```
+
+%% Output
+
+    tensor([[1., 0., 1., 1.],
+            [2., 0., 1., 1.],
+            [3., 0., 1., 1.],
+            [4., 0., 1., 1.]], device='cuda:0')
+    
+    tensor([[6., 5., 6., 6.],
+            [7., 5., 6., 6.],
+            [8., 5., 6., 6.],
+            [9., 5., 6., 6.]], device='cuda:0')
+
+%% Cell type:markdown id: tags:
+
+# Part 2: Feedforward Neural Network
+
+In this practical session, we will explore a simple neural network architecture for NLP applications ; specifically, we will train a feedforward neural network for sentiment analysis, using the same dataset of reviews as in the previous session.  We will also keep the bag of words representation.
+
+
+Sources:
+* This TP is inspired by a TP by Tim van de Cruys
+* https://www.deeplearningwizard.com/deep_learning/practical_pytorch/pytorch_feedforward_neuralnetwork/
+* https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html
+* https://medium.com/swlh/sentiment-classification-using-feed-forward-neural-network-in-pytorch-655811a0913f
+* https://www.deeplearningwizard.com/deep_learning/practical_pytorch/pytorch_feedforward_neuralnetwork/
+
+%% Cell type:code id: tags:
+
+``` 
+# Useful imports
+import pandas as pd
+import numpy as np
+import re
+import sklearn
+
+from sklearn.feature_extraction.text import CountVectorizer
+```
+
+%% Cell type:code id: tags:
+
+``` 
+# Path to data
+train_path = "allocine_train.tsv"
+dev_path = "allocine_dev.tsv"
+```
+
+%% Cell type:markdown id: tags:
+
+## 2.1 Read and load the data
+
+Here we will keep the bag of word representation, as in the previous session.
+
+You can find different ways of dealing with the input data in PyTorch. The simplest solution is to use the DataLoader from PyTorch:
+* the doc here https://pytorch.org/docs/stable/data.html and here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
+* an example of use, with numpy array: https://www.kaggle.com/arunmohan003/sentiment-analysis-using-lstm-pytorch
+
+
+
+
+
+
+You can also find many datasets for text ready to load in pytorch on: https://pytorch.org/text/stable/datasets.html
+
+%% Cell type:markdown id: tags:
+
+#### 2.1.1 Build BoW vectors (code given)
+
+The code below allows to use scikit methods you already know to generate the bag of word representation.
+
+%% Cell type:code id: tags:
+
+``` 
+# This will be the size of the vectors reprensenting the input
+MAX_FEATURES = 5000
+
+def vectorize_data( data_path, vectorizer=None ):
+  data_df = pd.read_csv( data_path, header=0,
+                    delimiter="\t", quoting=3)
+  # If an existing vectorizer is not given, initialize the "CountVectorizer"
+  # object, which is scikit-learn's bag of words tool.
+  if not vectorizer:
+    vectorizer = CountVectorizer(
+        analyzer = "word",
+        max_features = MAX_FEATURES
+    )
+    vectorizer.fit(data_df["review"])
+  # Then transform the data
+  x_data = vectorizer.transform(data_df["review"])
+  # Vectorize also the labels
+  y_data = np.asarray(data_df["sentiment"])
+  return x_data, y_data, vectorizer
+
+x_train, y_train, vectorizer = vectorize_data( train_path )
+x_dev, y_dev, _ = vectorize_data( dev_path, vectorizer )
+
+```
+
+%% Cell type:markdown id: tags:
+
+Count_Vectorizer returns sparse arrays (for computational reasons) but PyTorch will expect dense input:
+
+%% Cell type:code id: tags:
+
+``` 
+# from sparse to dense
+x_train = x_train.toarray()
+x_dev = x_dev.toarray()
+
+print("Train:", x_train.shape)
+print("Dev:", x_dev.shape)
+```
+
+%% Output
+
+    Train: (5027, 5000)
+    Dev: (549, 5000)
+
+%% Cell type:markdown id: tags:
+
+#### 2.1.2 Transform to tensors
+
+▶▶ **Create a dataset object within the PyTorch library:**
+
+The easiest way to load datasets with PyTorch is to use the DataLoader class. Here we're going to give our numpy array to this class, and first, we need to transform our data to tensors. Follow the following steps:
+
+
+%% Cell type:code id: tags:
+
+``` 
+# Useful imports
+import torch
+from torch.utils.data import TensorDataset, DataLoader
+```
+
+%% Cell type:markdown id: tags:
+
+* 1- **torch.from_numpy( A_NUMPY_ARRAY )**: transform your array into a tensor
+  * Note: you need to transform tensor type to float (for x), with **MY_TENSOR.to(torch.float)** (or cryptic error saying it was expecting long...).
+  * Print the shape of the tensor for your training data.
+
+https://pytorch.org/docs/stable/generated/torch.from_numpy.html#torch-from-numpy
+
+%% Cell type:code id: tags:
+
+``` 
+# create Tensor dataset
+tensor_x_train = torch.from_numpy(x_train).to(torch.float)
+print( tensor_x_train.shape )
+```
+
+%% Output
+
+    torch.Size([5027, 5000])
+
+%% Cell type:markdown id: tags:
+
+* 2- **torch.utils.data.TensorDataset(INPUT_TENSOR, TARGET_TENSOR)**: Dataset wrapping tensors.
+  * Take tensors as inputs
+
+https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset
+
+%% Cell type:code id: tags:
+
+``` 
+train_data = TensorDataset( torch.from_numpy(x_train).to(torch.float),
+                           torch.from_numpy(y_train))
+```
+
+%% Cell type:markdown id: tags:
+
+* 3- **torch.utils.data.DataLoader**: many arguments in the constructor:
+  * In particular, *dataset* of the type TensorDataset can be used
+  * We'd rather shuffling our data in general, can be done here by changing the value of one argument
+  * Note also the possibility to change the batch_size, we'll talk about it later
+
+https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
+
+```
+DataLoader(
+    dataset,
+    batch_size=1,
+    shuffle=False,
+    num_workers=0,
+    collate_fn=None,
+    pin_memory=False,
+ )
+ ```
+
+%% Cell type:code id: tags:
+
+``` 
+# dataloaders
+batch_size = 1 #no batch, or batch = 1
+
+# make sure to SHUFFLE your data
+train_loader = DataLoader( train_data, shuffle=True, batch_size=batch_size )
+```
+
+%% Cell type:markdown id: tags:
+
+## 2.2 Neural Network
+
+Now we can build our learning model.
+
+For this TP, we're going to walk through the code of a **simple feedforward neural network, with one hidden layer**.
+
+This network takes as input bag of words vectors, exactly as our 'classic' models: each review is represented by a vector of the size the number of tokens in the vocabulary with '1' when a word is present and '0' for the other words.
+
+%% Cell type:markdown id: tags:
+
+### 2.2.1 Questions
+
+▶▶ **What is the input dimension?**
+
+▶▶ **What is the output dimension?**
+
+%% Cell type:markdown id: tags:
+
+▶▶ **What is the input dimension?** --> MAX FEATURES = 5000
+
+▶▶ **What is the output dimension?** --> number of classes = 2
+
+%% Cell type:code id: tags:
+
+``` 
+# Useful imports
+import torch
+import torch.nn as nn
+```
+
+%% Cell type:markdown id: tags:
+
+### 2.2.2 Write the skeleton of the class
+
+▶▶ We're going to **define our own neural network type**, by defining a new class:
+* The class is called **FeedforwardNeuralNetModel**
+* it inherits from the class **nn.Module**
+* the constructor takes the following arguments:
+  * size of the input (i.e. **input_dim**)
+  * size of the hidden layer (i.e. **hidden_dim**)
+  * size of the output layer (i.e. **output_dim**)
+* in the constructor, we will call the constructor of the parent class
+
+
+%% Cell type:code id: tags:
+
+``` 
+# Start to define the class corresponding to our type of neural network
+
+```
+
+%% Cell type:code id: tags:
+
+``` 
+class FeedforwardNeuralNetModel(nn.Module):
+    def __init__(self, input_dim, hidden_dim, output_dim):
+        super(FeedforwardNeuralNetModel, self).__init__()
+```
+
+%% Cell type:markdown id: tags:
+
+### 2.2.3 Constructor
+
+▶▶ To continue the definition of our class, we need to explain how are built each layer of our network.
+
+More precisely, we're going to define a few fields:
+* a function corresponding to the action of our hidden layer:
+  * what kind of function is it ?
+  * you need to indicate the size of the input and output for this function, what are they?
+* a non linear function, that will be used on the ouput of our hidden layer
+* a final output function:
+  * what kind of function is it ?
+  * you need to indicate the size of the input and output for this function, what are they?
+
+All the functions that can be used in Pytorch are defined here: https://pytorch.org/docs/stable/nn.functional.html
+
+Do you see things that you know?
+
+Hint: here you define fields of your class, these fields corresponding to specific kind of functions.
+E.g. you're going to initialize a field such as **self.fc1=SPECIFIC_TYPE_OF_FCT(expected arguments)**.
+
+%% Cell type:code id: tags:
+
+``` 
+# Continue the definition of the class by defining three functions in your constructor
+
+```
+
+%% Cell type:code id: tags:
+
+``` 
+class FeedforwardNeuralNetModel(nn.Module):
+    def __init__(self, input_dim, hidden_dim, output_dim):
+        super(FeedforwardNeuralNetModel, self).__init__()
+
+        # Linear function ==> W1
+        self.fc1 = nn.Linear(input_dim, hidden_dim)
+
+        # Non-linearity ==> g
+        self.sigmoid = nn.Sigmoid()
+
+        # Linear function (readout) ==> W2
+        self.fc2 = nn.Linear(hidden_dim, output_dim)
+```
+
+%% Cell type:markdown id: tags:
+
+And that's it ;)
+
+%% Cell type:markdown id: tags:
+
+### 2.2.4 The **forward** method
+
+The main function we have to write when defining a neural network is called the **forward** function.
+This function computes the outputs of the network (the logit), it is thus used to train the network.
+It details how we apply the functions defined in the constructor.
+
+Let's define this function, with the following signature, where x is the input to the network:
+```
+def forward(self, x):
+```
+
+▶▶ Follow the steps:
+* 1- Apply the first linear functiond defined in the constructor to **x**, i.e. go through the hidden layer.
+* 2- Apply the non linear function to the output of step 1, i.e. use the activation function.
+* 3- Apply the second linear functiond defined in the constructor to the output of step 2, i.e. go through the output layer.
+* 4- Return the output of step 3.
+
+You're done!
+
+%% Cell type:code id: tags:
+
+``` 
+# Define the forward function, used to make all the calculations
+# through the network
+def forward(self, x):
+  ''' y = g(x.W1+b).W2 '''
+  # ...
+```
+
+%% Cell type:code id: tags:
+
+``` 
+class FeedforwardNeuralNetModel(nn.Module):
+    def __init__(self, input_dim, hidden_dim, output_dim):
+        super(FeedforwardNeuralNetModel, self).__init__()
+        # Linear function ==> W1
+        self.fc1 = nn.Linear(input_dim, hidden_dim)
+
+        # Non-linearity ==> g
+        self.sigmoid = nn.Sigmoid()
+
+        # Linear function (readout) ==> W2
+        self.fc2 = nn.Linear(hidden_dim, output_dim)
+
+    def forward(self, x):
+        '''
+        y = g(x.W1+b).W2
+        '''
+        # Linear function  # LINEAR ==> x.W1+b
+        out1 = self.fc1(x)
+
+        # Non-linearity  # NON-LINEAR ==> h1 = g(x.W1+b)
+        out2 = self.sigmoid(out1)
+
+        # Linear function (readout)  # LINEAR ==> y = h1.W2
+        out3 = self.fc2(out2)
+        return out3
+```
+
+%% Cell type:markdown id: tags:
+
+## 2.3 Training the network
+
+Now we can use our beautiful class to define and then train our own neural network.
+
+%% Cell type:markdown id: tags:
+
+### 2.3.1 Hyper-parameters
+
+We need to set up the values for the hyper-parameters, and define the form of the loss and the optimization methods.
+
+▶▶ **Check that you understand what are each of the variables below**
+* one that you prabably don't know is the learning rate, we'll explain it in the next course. Broadly speaking, it corresponds to the amount of update used during training.
+
+%% Cell type:code id: tags:
+
+``` 
+# Many choices here!
+VOCAB_SIZE = MAX_FEATURES
+input_dim = VOCAB_SIZE
+hidden_dim = 4
+output_dim = 2
+num_epochs = 5
+learning_rate = 0.1
+```
+
+%% Cell type:markdown id: tags:
+
+### 2.3.2 Loss function
+
+Another thing that has to be decided is the kind of loss function we want to use.
+Here we use a common one, called CrossEntropy.
+We will come back in more details on this loss.
+One important note is that this function in PyTorch includes the SoftMax function that should be applied after the output layer to get labels.
+
+%% Cell type:code id: tags:
+
+``` 
+criterion = nn.CrossEntropyLoss()
+```
+
+%% Cell type:markdown id: tags:
+
+### 2.3.3 Initialization of the model
+
+Now you can instantiate your class: define a model that is of the type FeedforwardNeuralNetModel using the values defined before as hyper-parameters.
+
+%% Cell type:code id: tags:
+
+``` 
+# Initialization of the model
+# ...
+```
+
+%% Cell type:code id: tags:
+
+``` 
+# Initialization of the model
+model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
+```
+
+%% Cell type:markdown id: tags:
+
+### 2.3.4 Optimizer
+
+At last, we need to indicate the method we want to use to optimize our network.
+Here, we use a common one called Stochastic Gradient Descent.
+We will also go back on that later on.
+
+Note that its arguments are:
+* the parameters of our models (the Ws)
+* the learning rate
+
+Based on these information, it can make the necessary updates.
+
+%% Cell type:code id: tags:
+
+``` 
+optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
+```
+
+%% Cell type:markdown id: tags:
+
+### Training the network
+
+A simple code to train the neural network is given below.
+
+▶▶ **Run the code and look at the loss after each training step.**
+
+%% Cell type:code id: tags:
+
+``` 
+# Start training
+for epoch in range(num_epochs):
+    train_loss, total_acc, total_count = 0, 0, 0
+
+    # for each instance + its associated label
+    for input, label in train_loader:
+
+        # Clearing the accumulated gradients
+        # torch *accumulates* gradients. Before passing in a
+        # new instance, you need to zero out the gradients from the old
+        # instance
+        # Clear gradients w.r.t. parameters
+        optimizer.zero_grad()
+
+        # ==> Forward pass to get output/logits
+        # = apply all our functions: y = g(x.W1+b).W2
+        outputs = model( input )
+
+        # ==> Calculate Loss: softmax --> cross entropy loss
+        loss = criterion(outputs, label)
+
+        # Getting gradients w.r.t. parameters
+        # Here is the way to find how to modify the parameters in
+        # order to lower the loss
+        loss.backward()
+
+        # ==> Updating parameters: you don t need to provide the loss here,
+        # when computing the loss, the information is saved in the parameters
+        # (more precisely, doing backward computes the gradients for all tensors,
+        # and these gradients are saved by each tensor)
+        optimizer.step()
+
+        # -- a useful print
+        # Accumulating the loss over time
+        train_loss += loss.item()
+        total_acc += (outputs.argmax(1) == label).sum().item()
+        total_count += label.size(0)
+
+    # Compute accuracy on train set at each epoch
+    print('Epoch: {}. Loss: {}. ACC {} '.format(epoch,
+                                                train_loss/x_train.shape[0],
+                                                total_acc/x_train.shape[0]))
+
+    total_acc, total_count = 0, 0
+    train_loss = 0
+```
+
+%% Output
+
+    Epoch: 0. Loss: 0.5252582924727662. ACC 0.7376168689078973
+    Epoch: 1. Loss: 0.3890879917333115. ACC 0.8329023274318679
+    Epoch: 2. Loss: 0.3207374065181158. ACC 0.8651283071414363
+    Epoch: 3. Loss: 0.2718604214944098. ACC 0.8909886612293615
+    Epoch: 4. Loss: 0.2544953779364264. ACC 0.903322060871295
+
+%% Cell type:markdown id: tags:
+
+### Evaluate the model
+
+%% Cell type:code id: tags:
+
+``` 
+# Useful imports
+from sklearn.metrics import classification_report
+```
+
+%% Cell type:code id: tags:
+
+``` 
+# create Tensor dataset
+valid_data = TensorDataset( torch.from_numpy(x_dev).to(torch.float),
+                           torch.from_numpy(y_dev))
+valid_loader = DataLoader( valid_data )
+
+
+# Disabling gradient calculation is useful for inference,
+# when you are sure that you will not call Tensor.backward().
+predictions, gold = [], []
+with torch.no_grad():
+    for input, label in valid_loader:
+        probs = model(input)
+        predictions.append( torch.argmax(probs, dim=1).cpu().numpy()[0] )
+        gold.append(int(label))
+
+print(classification_report(gold, predictions))
+```
+
+%% Output
+
+                  precision    recall  f1-score   support
+    
+               0       0.77      0.87      0.82       230
+               1       0.90      0.82      0.86       319
+    
+        accuracy                           0.84       549
+       macro avg       0.84      0.84      0.84       549
+    weighted avg       0.85      0.84      0.84       549
+    
+
+%% Cell type:markdown id: tags:
+
+## 3. Move to GPU
+
+Below we indicate the modifications needed to make all the computations on GPU instead of CPU.
+
+%% Cell type:code id: tags:
+
+``` 
+## 1- Define the device to be used
+
+# CUDA for PyTorch
+use_cuda = torch.cuda.is_available()
+device = torch.device("cuda" if use_cuda else "cpu")
+print(device)
+```
+
+%% Output
+
+    cuda
+
+%% Cell type:code id: tags:
+
+``` 
+## 2- No change here
+
+import torch
+import torch.nn as nn
+
+class FeedforwardNeuralNetModel(nn.Module):
+    def __init__(self, input_dim, hidden_dim, output_dim):
+        super(FeedforwardNeuralNetModel, self).__init__()
+        # Linear function ==> W1
+        self.fc1 = nn.Linear(input_dim, hidden_dim)
+
+        # Non-linearity ==> g
+        self.sigmoid = nn.Sigmoid()
+
+        # Linear function (readout) ==> W2
+        self.fc2 = nn.Linear(hidden_dim, output_dim)
+
+    def forward(self, x):
+        '''
+        y = g(x.W1+b).W2
+        '''
+        # Linear function  # LINEAR ==> x.W1+b
+        out = self.fc1(x)
+
+        # Non-linearity  # NON-LINEAR ==> h1 = g(x.W1+b)
+        out = self.sigmoid(out)
+
+        # Linear function (readout)  # LINEAR ==> y = h1.W2
+        out = self.fc2(out)
+        return out
+```
+
+%% Cell type:code id: tags:
+
+``` 
+## 3- Move your model to the GPU
+
+# Initialization of the model
+model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
+
+optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
+
+## ------------ CHANGE HERE -----------------
+model = model.to(device)
+```
+
+%% Cell type:code id: tags:
+
+``` 
+## 4- Move your data to GPU
+
+# Start training
+for epoch in range(num_epochs):
+    train_loss, total_acc, total_count = 0, 0, 0
+    for input, label in train_loader:
+        ## ------------ CHANGE HERE -----------------
+        input = input.to(device)
+        label = label.to(device)
+
+        # Clear gradients w.r.t. parameters
+        optimizer.zero_grad()
+
+        # Forward pass to get output/logits
+        outputs = model( input )
+
+        # Calculate Loss: softmax --> cross entropy loss
+        loss = criterion(outputs, label)
+
+        # Getting gradients w.r.t. parameters
+        loss.backward()
+
+        # Updating parameters
+        optimizer.step()
+
+        # Accumulating the loss over time
+        train_loss += loss.item()
+        total_acc += (outputs.argmax(1) == label).sum().item()
+        total_count += label.size(0)
+
+    # Compute accuracy on train set at each epoch
+    print('Epoch: {}. Loss: {}. ACC {} '.format(epoch,
+                                                train_loss/x_train.shape[0],
+                                                total_acc/x_train.shape[0]))
+
+    total_acc, total_count = 0, 0
+    train_loss = 0
+```
+
+%% Output
+
+    Epoch: 0. Loss: 0.5239484066388459. ACC 0.7298587626815198
+    Epoch: 1. Loss: 0.3703767738312165. ACC 0.8414561368609509
+    Epoch: 2. Loss: 0.3024231172786994. ACC 0.8726874875671374
+    Epoch: 3. Loss: 0.2766759563584567. ACC 0.8874079968171872
+    Epoch: 4. Loss: 0.2579045861314929. ACC 0.8959618062462701
+
+%% Cell type:code id: tags:
+
+``` 
+# -- 5- Again, move your data to GPU
+
+predictions = []
+gold = []
+
+with torch.no_grad():
+    for input, label in valid_loader:
+        ## ------------ CHANGE HERE -----------------
+        input = input.to(device)
+        probs = model(input)
+        #Here, we need CPU: else, it will generate the following error
+        # can't convert cuda:0 device type tensor to numpy.
+        # Use Tensor.cpu() to copy the tensor to host memory first.
+        # (if we need a numpy array)
+        predictions.append( torch.argmax(probs, dim=1).cpu().numpy()[0] )
+        #print( probs )
+        #print( torch.argmax(probs, dim=1) ) # Return the index of the max value
+        #print( torch.argmax(probs, dim=1).cpu().numpy()[0] )
+        gold.append(int(label))
+
+print(classification_report(gold, predictions))
+```
+
+%% Output
+
+                  precision    recall  f1-score   support
+    
+               0       0.72      0.89      0.80       230
+               1       0.90      0.75      0.82       319
+    
+        accuracy                           0.81       549
+       macro avg       0.81      0.82      0.81       549
+    weighted avg       0.83      0.81      0.81       549
+