diff --git a/notebooks/TP3_m2LiTL_WordEmbeddings_SUJET_2425.ipynb b/notebooks/TP3_m2LiTL_WordEmbeddings_SUJET_2425.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..e2cf62c4e36c1c5e94c31b42831da49e101015cb
--- /dev/null
+++ b/notebooks/TP3_m2LiTL_WordEmbeddings_SUJET_2425.ipynb
@@ -0,0 +1,904 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "id": "fleet-worry",
+      "metadata": {
+        "id": "fleet-worry"
+      },
+      "source": [
+        "# TP3: Word embeddings\n",
+        "Master LiTL\n",
+        "\n",
+        "\n",
+        "In this practical session, we will explore the generation of word embeddings.\n",
+        "\n",
+        "We will make use of *gensim* for generating word embeddings.\n",
+        "If you want to use your own computer you will need to make sure it is installed (e.g. using the command ```pip```).\n",
+        "If you’re using Anaconda/Miniconda, you can use the command ```conda install <modulename>```.\n",
+        "\n",
+        "Sources:\n",
+        "- Practical from T. van de Cruys\n",
+        "- https://machinelearningmastery.com/develop-word-embeddings-python-gensim/\n",
+        "- https://radimrehurek.com/gensim/models/word2vec.html\n",
+        "- https://www.shanelynn.ie/word-embeddings-in-python-with-spacy-and-gensim/: see an example based on the 20NewsGroup corpus\n",
+        "- http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/\n",
+        "- (not used but seems interesting: https://www.machinelearningplus.com/nlp/gensim-tutorial/#14howtotrainword2vecmodelusinggensim)\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "abandoned-basketball",
+      "metadata": {
+        "id": "abandoned-basketball"
+      },
+      "source": [
+        "## 1- Look at the data\n",
+        "\n",
+        "Upload the data: *corpus_food.txt.gz*.\n",
+        "The data come from blogs on cooking.\n",
+        "\n",
+        "You cab take a look at your data using a terminal and the following commands:\n",
+        "\n",
+        "* Number of lines:\n",
+        "```\n",
+        "$ wc -l corpus_food.txt\n",
+        "$ 1161270 corpus_food.txt\n",
+        "```\n",
+        "\n",
+        "* first ten lines:\n",
+        "```\n",
+        "$ head -n 10 corpus_food.txt\n",
+        "$ -mention meilleur espoir féminin : on aurait pu ajouter ioudgine .\n",
+        "malheureusement , comme presque tout ce qui est bon , c' est bourré de beurre et de sucre .\n",
+        "j' avais déjà façonné une recette allégée pour weight watchers mais elle contenait encore du beurre et un peu de sucre .\n",
+        "aujourd' hui je vous propose cette recette que j' ai improvisée hier soir , sans beurre et sans sucre .\n",
+        "n' empêche que pour acheter sa propre baguette magique ou pour déguster des bières au beurre , on pourrait partir au bout du monde !\n",
+        "menthe , sucre de canne , rhum , citron vert , sont vos meilleurs amis en soirée ?\n",
+        "parfois , on rêve d' un bon verre de vin .\n",
+        "la marque de biscuits oreo a pensé aux gourmandes et aux gourmands , et s' apprête à lancer des gâteaux dotés de nouvelles saveurs : caramel beurre salé et noix de coco .\n",
+        "rangez les parapluies , et sortez le sel et le citron !\n",
+        "le vin on adore le savourer avec modération .\n",
+        "```\n",
+        "\n",
+        "Première phrase bizarre mais sinon le début : http://www.leblogdelaura.com/2017/03/pancakes-sans-sucre-et-sans-graisses.html"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 2 - Build word embeddings\n",
+        "\n",
+        "We  will  use *gensim* in  order  to  induce  word  embeddings  from  text.\n",
+        "*gensim* is  a  vector  space modeling and topic modeling toolkit for python, and contains an efficient implementation of the *word2vec* algorithms.\n",
+        "\n",
+        "*word2vec* consists of two different algorithms: *skipgram* (sg) and *continuous-bag-of-words* (cbow).\n",
+        "The underlying prediction task of the former is to estimate the context words from the target word ; the prediction task of the latter is to estimate the target word from the sum of the context words.\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "FJbl3GWfFYms"
+      },
+      "id": "FJbl3GWfFYms"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.1 - Train a model\n",
+        "▶▶**Run the following code: it will build word embeddings based on the food corpus using the Word2Vec algorithm.**\n",
+        "The model will be saved on your disk."
+      ],
+      "metadata": {
+        "id": "7VGe1L4jJ9ra"
+      },
+      "id": "7VGe1L4jJ9ra"
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "impressed-consumer",
+      "metadata": {
+        "id": "impressed-consumer"
+      },
+      "outputs": [],
+      "source": [
+        "# potential Error: need to update smart_open with conda install smart_open==2.0.0 or pip install smart_open==2.0.0\n",
+        "\n",
+        "# construct word2vec model using gensim\n",
+        "\n",
+        "from gensim.models import Word2Vec\n",
+        "\n",
+        "import gzip\n",
+        "import logging\n",
+        "\n",
+        "import time\n",
+        "\n",
+        "# set up logging for gensim\n",
+        "logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',\n",
+        "                    level=logging.INFO)\n",
+        "\n",
+        "# we define a PlainTextCorpus class; this will provide us with an\n",
+        "# iterator over the corpus (so that we don't have to load the corpus\n",
+        "# into memory)\n",
+        "class PlainTextCorpus(object):\n",
+        "    def __init__(self, fileName):\n",
+        "        self.fileName = fileName\n",
+        "\n",
+        "    def __iter__(self):\n",
+        "        for line in gzip.open(self.fileName, 'rt', encoding='utf-8'):\n",
+        "            yield  line.split()\n",
+        "\n",
+        "# -- Instantiate the corpus class using corpus location\n",
+        "sentences = PlainTextCorpus('corpus_food.txt.gz')\n",
+        "\n",
+        "# -- Trianing\n",
+        "# we only take into account words with a frequency of at least 50, and\n",
+        "# we iterate over the corpus only once\n",
+        "model = Word2Vec(sentences, min_count=50, epochs=1, sorted_vocab=1)\n",
+        "\n",
+        "# -- Finally, save the constructed model to disk\n",
+        "# When getting started, you can save the learned model in ASCII format and review the contents.\n",
+        "model.wv.save_word2vec_format('model_word2vec_food.txt', binary=False)\n",
+        "# by default, it is saved as binary\n",
+        "model.save('model_word2vec_food.bin')\n",
+        "# a model saved can be load again using:\n",
+        "#model = Word2Vec.load('model_word2vec_food.bin')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "bottom-tobacco",
+      "metadata": {
+        "id": "bottom-tobacco"
+      },
+      "source": [
+        "### 2.2 A few remarks:\n",
+        "\n",
+        "From: http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/\n",
+        "\n",
+        "#### Downsampling\n",
+        "\n",
+        "Subsampling frequent words to decrease the number of training examples.\n",
+        "\n",
+        "There are two “problems” with common words like “the”:\n",
+        "* When looking at word pairs, (“fox”, “the”) doesn’t tell us much about the meaning of “fox”. “the” appears in the context of pretty much every word.\n",
+        "* We will have many more samples of (“the”, …) than we need to learn a good vector for “the”.\n",
+        "\n",
+        "Word2Vec implements a “subsampling” scheme to address this.\n",
+        "For each word we encounter in our training text, there is a chance that we will effectively delete it from the text.\n",
+        "The probability that we cut the word is related to the word’s frequency.\n",
+        "\n",
+        "If we have a window size of 10, and we remove a specific instance of “the” from our text:\n",
+        "\n",
+        "* As we train on the remaining words, “the” will not appear in any of their context windows.\n",
+        "* We’ll have 10 fewer training samples where “the” is the input word.\n",
+        "\n",
+        "There is also a parameter in the code named ‘sample’ which controls how much subsampling occurs, and the default value is 0.001. Smaller values of ‘sample’ mean words are less likely to be kept.\n",
+        "\n",
+        "\n",
+        "#### Negative sampling (for SkipGram)\n",
+        "\n",
+        "Training a neural network means taking a training example and adjusting all of the neuron weights slightly so that it predicts that training sample more accurately.\n",
+        "In other words, each training sample will tweak all of the weights in the neural network --> prohibitive\n",
+        "\n",
+        "Negative sampling addresses this by having each training sample only modify a small percentage of the weights, rather than all of them.\n",
+        "\n",
+        "When training the network on the word pair (“fox”, “quick”), i.e. 'fox' is the target, 'quick' a context word: “quick” -> 1; all of the other thousands of output neurons -> 0.\n",
+        "\n",
+        "With negative sampling, we are instead going to randomly select just a small number of “negative” words (let’s say 5) to update the weights for: “quick” -> 1; 5 other random words -> 0.\n",
+        "\n",
+        "Recall that the output layer of our model has a weight matrix that’s dx|V|, e.g. 300 x 23,000. So we will just be updating the weights for our positive word (“quick”), plus the weights for 5 other words that we want to output 0. That’s a total of 6 output neurons, and 1,800 weight values total. That’s only 0.06% of the 3M weights in the output layer! (In the hidden layer, only the weights for the input word are updated -- this is true whether you’re using Negative Sampling or not)."
+      ]
+    },
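+    {
+      "cell_type": "markdown",
+      "id": "subsampling-sketch-md",
+      "metadata": {
+        "id": "subsampling-sketch-md"
+      },
+      "source": [
+        "Illustration (a minimal sketch, not part of the exercises): the subsampling keep-probability described in the tutorial above. The helper name *keep_probability* and the example frequencies are made up for this sketch; *z* is the relative frequency of a word in the corpus and *sample* is the threshold parameter (default 0.001)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "subsampling-sketch-code",
+      "metadata": {
+        "id": "subsampling-sketch-code"
+      },
+      "outputs": [],
+      "source": [
+        "import math\n",
+        "\n",
+        "# keep-probability used by word2vec's subsampling, as described in the tutorial above:\n",
+        "# the more frequent a word (the larger z), the more likely it is to be dropped\n",
+        "def keep_probability(z, sample=0.001):\n",
+        "    # z: relative frequency of the word in the corpus\n",
+        "    return min(1.0, (math.sqrt(z / sample) + 1) * (sample / z))\n",
+        "\n",
+        "# a few hypothetical relative frequencies\n",
+        "for z in [0.0005, 0.001, 0.01, 0.05]:\n",
+        "    print('z =', z, '-> keep probability =', round(keep_probability(z), 3))"
+      ]
+    },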
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2.3 Print information about the model learned\n",
+        "Note that the corpus is food-related, so food-related terms will work best.\n",
+        "\n",
+        "You can print the vocabulary using:\n",
+        "```\n",
+        "vocabulary = model.wv.key_to_index\n",
+        "```\n",
+        "\n",
+        "It is possible to look at the individual word embeddings using the following :\n",
+        "```\n",
+        "model.wv[’citron’]\n",
+        "```\n",
+        "```\n",
+        "print(model.vw['citron'])\n",
+        "```\n",
+        "\n",
+        "Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.\n",
+        "\n",
+        "▶▶ **Print the vocabulary and then the vectors for a few terms, e.g. 'citron' and 'fruit'. Do they seem close?**"
+      ],
+      "metadata": {
+        "id": "ZAgXv40pHQXs"
+      },
+      "id": "ZAgXv40pHQXs"
+    },
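+    {
+      "cell_type": "markdown",
+      "id": "vocab-sketch-md",
+      "metadata": {
+        "id": "vocab-sketch-md"
+      },
+      "source": [
+        "A minimal sketch of the commands above, assuming the model trained in section 2.1 is still available as *model*:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "vocab-sketch-code",
+      "metadata": {
+        "id": "vocab-sketch-code"
+      },
+      "outputs": [],
+      "source": [
+        "# a minimal sketch, assuming the model trained above is available as `model`\n",
+        "vocabulary = model.wv.key_to_index\n",
+        "print('vocabulary size:', len(vocabulary))\n",
+        "\n",
+        "# individual word embeddings (one vector of dimension vector_size per word)\n",
+        "print(model.wv['citron'])\n",
+        "print(model.wv['fruit'])"
+      ]
+    },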
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "IKJG3nNpHQjY"
+      },
+      "id": "IKJG3nNpHQjY",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "BEbhGilgJJil"
+      },
+      "id": "BEbhGilgJJil",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "U2u5ltKhJN0j"
+      },
+      "id": "U2u5ltKhJN0j",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 3 - Compute word similarity\n",
+        "\n",
+        "You can now compute similarity measure between word using\n",
+        "```\n",
+        "model.wv.similarity('manger', goûter')\n",
+        "```\n",
+        "\n",
+        "You can also print the most similar words (which is measured by cosine similarity between the word vectors) by issuing the following command :\n",
+        "```\n",
+        "model.wv.most_similar(’citron’)\n",
+        "```\n",
+        "▶▶**Print the similarity between some terms, e.g. ('manger', 'boire'), ('manger', 'dormir') ... Do the results seem coherent?**\n",
+        "\n",
+        "▶▶**Print the words that are most similar to: 'citron', 'manger' and other words, e.g. not related to food. Do the results seem coherent?**"
+      ],
+      "metadata": {
+        "id": "eSm6IZ3IFTP9"
+      },
+      "id": "eSm6IZ3IFTP9"
+    },
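+    {
+      "cell_type": "markdown",
+      "id": "similarity-sketch-md",
+      "metadata": {
+        "id": "similarity-sketch-md"
+      },
+      "source": [
+        "A minimal sketch of the calls described above (the word pairs are only examples; a word below *min_count* is absent from the vocabulary and raises a KeyError):"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "similarity-sketch-code",
+      "metadata": {
+        "id": "similarity-sketch-code"
+      },
+      "outputs": [],
+      "source": [
+        "# cosine similarity between two word vectors\n",
+        "print(model.wv.similarity('manger', 'boire'))\n",
+        "print(model.wv.similarity('manger', 'dormir'))\n",
+        "\n",
+        "# 10 most similar words, by cosine similarity\n",
+        "print(model.wv.most_similar('citron'))"
+      ]
+    },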
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "84TOaPCkOm1z"
+      },
+      "id": "84TOaPCkOm1z",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "D-iA3eJsOm4l"
+      },
+      "id": "D-iA3eJsOm4l",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "Ux4_ce8hOm9B"
+      },
+      "id": "Ux4_ce8hOm9B",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "id": "bigger-drink",
+      "metadata": {
+        "id": "bigger-drink"
+      },
+      "source": [
+        "##  4 - Exercise: change the parameters values\n",
+        "\n",
+        "As a default, the *word2vec* module creates **word embeddings of size 100**, using a **cbow model** with a **window of 5 words**.\n",
+        "\n",
+        "▶▶**Train a model with different parameters:**\n",
+        "- using a different window size,\n",
+        "- using a different embedding size\n",
+        "- using *skipgram*,\n",
+        "\n",
+        "Inspect the results (similar words) qualitatively. Do the similarity computations change ? Are they better or worse ?\n",
+        "\n",
+        "See doc: https://radimrehurek.com/gensim_3.8.3/models/word2vec.html"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from gensim.models import Word2Vec\n",
+        "import gzip\n",
+        "\n",
+        "# we define a PlainTextCorpus class; this will provide us with an\n",
+        "# iterator over the corpus (so that we don't have to load the corpus\n",
+        "# into memory)\n",
+        "class PlainTextCorpus(object):\n",
+        "    def __init__(self, fileName):\n",
+        "        self.fileName = fileName\n",
+        "\n",
+        "    def __iter__(self):\n",
+        "        for line in gzip.open(self.fileName, 'rt', encoding='utf-8'):\n",
+        "            yield  line.split()\n",
+        "\n",
+        "# -- Instantiate the corpus class using corpus location\n",
+        "sentences = PlainTextCorpus('corpus_food.txt.gz')"
+      ],
+      "metadata": {
+        "id": "t29EU--0Txre"
+      },
+      "id": "t29EU--0Txre",
+      "execution_count": null,
+      "outputs": []
+    },
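+    {
+      "cell_type": "markdown",
+      "id": "params-sketch-md",
+      "metadata": {
+        "id": "params-sketch-md"
+      },
+      "source": [
+        "A hedged sketch of one possible configuration (the values below are only illustrative): skip-gram (*sg=1*), a window of 10 words and 200-dimensional vectors. See the gensim documentation linked above for the full list of parameters."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "params-sketch-code",
+      "metadata": {
+        "id": "params-sketch-code"
+      },
+      "outputs": [],
+      "source": [
+        "# one possible configuration (illustrative values only):\n",
+        "# skip-gram, window of 10 words, 200-dimensional vectors\n",
+        "model_sg = Word2Vec(sentences, min_count=50, epochs=1,\n",
+        "                    sg=1, window=10, vector_size=200)\n",
+        "\n",
+        "print(model_sg.wv.most_similar('citron'))"
+      ]
+    },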
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "GOTCCxFKO6pk"
+      },
+      "id": "GOTCCxFKO6pk",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "S6gLHxxgO6so"
+      },
+      "id": "S6gLHxxgO6so",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "7dm3n7p2O6vu"
+      },
+      "id": "7dm3n7p2O6vu",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "bJYQRcMZO6yl"
+      },
+      "id": "bJYQRcMZO6yl",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "id": "satellite-colombia",
+      "metadata": {
+        "id": "satellite-colombia"
+      },
+      "source": [
+        "#### According to Mikolov\n",
+        "\n",
+        "**Skip-gram**: works well with small amount of the training data, represents well even rare words or phrases.\n",
+        "\n",
+        "**CBOW**: several times faster to train than the skip-gram, slightly better accuracy for the frequent words"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "least-justice",
+      "metadata": {
+        "id": "least-justice"
+      },
+      "source": [
+        "## 5 - Analogical reasoning\n",
+        "\n",
+        "As  we  saw  in  class,  word  embeddings  allow  us  to  do  analogical  reasoning  using  vector  addiction and subtraction. *gensim* offers the possibility to do so.\n",
+        "\n",
+        "▶▶ **Try to perform analogical reasoning in the food  realm,  e.g.  fourchette - légume  +  soupe  = ?**\n",
+        "\n",
+        "Hint  :  the  function  *most_similar()*  takes  arguments positive and negative.\n",
+        "\n",
+        "▶▶ **Try the same using the function most_similar_cosmul()** (which performs a similar computation but uses multiplication and division instead), and see what works best\n",
+        "\n",
+        "See: https://tedboy.github.io/nlps/generated/generated/gensim.models.Word2Vec.most_similar.html\n",
+        "\n",
+        "https://tedboy.github.io/nlps/generated/generated/gensim.models.Word2Vec.most_similar_cosmul.html"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "removable-validity",
+      "metadata": {
+        "id": "removable-validity"
+      },
+      "outputs": [],
+      "source": [
+        "model.wv.most_similar(positive=[\"fourchette\", \"soupe\"], negative=[\"légume\"])"
+      ]
+    },
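+    {
+      "cell_type": "markdown",
+      "id": "cosmul-sketch-md",
+      "metadata": {
+        "id": "cosmul-sketch-md"
+      },
+      "source": [
+        "A sketch of the same analogy with *most_similar_cosmul()* (same arguments, multiplicative combination):"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cosmul-sketch-code",
+      "metadata": {
+        "id": "cosmul-sketch-code"
+      },
+      "outputs": [],
+      "source": [
+        "# same analogy, multiplicative combination (3CosMul)\n",
+        "model.wv.most_similar_cosmul(positive=[\"fourchette\", \"soupe\"], negative=[\"légume\"])"
+      ]
+    },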
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "D2MrhPfHGwla"
+      },
+      "id": "D2MrhPfHGwla",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 6 - Visualize word embeddings\n",
+        "\n",
+        "After you learn word embedding for your text data, it can be nice to explore it with visualization.\n",
+        "\n",
+        "You can use classical projection methods to reduce the high-dimensional word vectors to two-dimensional plots and plot them on a graph.\n",
+        "\n",
+        "The visualizations can provide a qualitative diagnostic for your learned model.\n",
+        "\n",
+        "We can retrieve all of the vectors from a trained model as follows:\n",
+        "```\n",
+        "X = model[model.wv.vocab]\n",
+        "```\n",
+        "\n",
+        "We can then train a projection method on the vectors, such as those methods offered in scikit-learn, then use matplotlib to plot the projection as a scatter plot.\n",
+        "\n",
+        "Let’s look at an example with Principal Component Analysis or PCA."
+      ],
+      "metadata": {
+        "id": "NrSg7OJhKP5_"
+      },
+      "id": "NrSg7OJhKP5_"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "X = model.wv.get_normed_vectors()[:1000]\n",
+        "print(X.shape)"
+      ],
+      "metadata": {
+        "id": "OGnQezWkGwnt"
+      },
+      "id": "OGnQezWkGwnt",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 6.1 - Using PCA\n",
+        "\n",
+        "We can create a 2-dimensional PCA model of the word vectors using the scikit-learn PCA class as follows."
+      ],
+      "metadata": {
+        "id": "V3GEpKVfKxMe"
+      },
+      "id": "V3GEpKVfKxMe"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from sklearn.decomposition import PCA\n",
+        "\n",
+        "pca = PCA(n_components=2)\n",
+        "result = pca.fit_transform(X)"
+      ],
+      "metadata": {
+        "id": "TDPHq7XVGwqb"
+      },
+      "id": "TDPHq7XVGwqb",
+      "execution_count": null,
+      "outputs": []
+    },
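+    {
+      "cell_type": "markdown",
+      "id": "pca-variance-sketch-md",
+      "metadata": {
+        "id": "pca-variance-sketch-md"
+      },
+      "source": [
+        "Optionally (our addition), a quick check of how much variance the two principal components retain:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "pca-variance-sketch-code",
+      "metadata": {
+        "id": "pca-variance-sketch-code"
+      },
+      "outputs": [],
+      "source": [
+        "# proportion of variance captured by the two principal components\n",
+        "print(pca.explained_variance_ratio_)"
+      ]
+    },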
+    {
+      "cell_type": "code",
+      "source": [
+        "from matplotlib import pyplot\n",
+        "\n",
+        "pyplot.scatter(result[:, 0], result[:, 1])"
+      ],
+      "metadata": {
+        "id": "TPxqh9N0Gwta"
+      },
+      "id": "TPxqh9N0Gwta",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "We can go one step further and annotate the points on the graph with the words themselves. A crude version without any nice offsets looks as follows."
+      ],
+      "metadata": {
+        "id": "MUW6T3b4LPE7"
+      },
+      "id": "MUW6T3b4LPE7"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "words = [ w for w in model.wv.key_to_index.keys() ][:1000]"
+      ],
+      "metadata": {
+        "id": "EpgU1Is9IMyg"
+      },
+      "id": "EpgU1Is9IMyg",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "for i, word in enumerate(words):\n",
+        "\tpyplot.annotate(word, xy=(result[i, 0], result[i, 1]))\n",
+        "#pyplot.show()\n",
+        "pyplot.savefig('plot_w2v.png')"
+      ],
+      "metadata": {
+        "id": "87nkrdIsGwuv"
+      },
+      "id": "87nkrdIsGwuv",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [],
+      "metadata": {
+        "id": "CvTlbB5eIuGM"
+      },
+      "id": "CvTlbB5eIuGM"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "pyplot.scatter(result[:, 0], result[:, 1])\n",
+        "\n",
+        "for i, word in enumerate(words):\n",
+        "\tpyplot.annotate(word, xy=(result[i, 0], result[i, 1]))\n",
+        "#pyplot.show()\n",
+        "pyplot.savefig('plot_w2v.png')"
+      ],
+      "metadata": {
+        "id": "E79KLz3ARN14"
+      },
+      "id": "E79KLz3ARN14",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "sATOlUC0TSH7"
+      },
+      "id": "sATOlUC0TSH7",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 6.2 - Using TensorFlow projector (at home)\n",
+        "\n",
+        "As we saw during the course, TensorFlow provides a tool to vizualize word embeddings. We need to provide:    \n",
+        "* A TSV file with the vectors\n",
+        "* Another TSV file with the words\n",
+        "\n",
+        "The following code allows to write this file from the model.\n",
+        "\n",
+        "It comes from the source code of the script: https://radimrehurek.com/gensim/scripts/word2vec2tensor.html\n",
+        "\n",
+        "▶▶ **Run the followng code and then load the files within the TensorFlow projector. Look e.g. for 'citron', 'manger', 'pain'..., check their neighbors (with PCA and/or T-SNE).**\n",
+        "\n",
+        "https://projector.tensorflow.org/"
+      ],
+      "metadata": {
+        "id": "fCr9WTKLVRLM"
+      },
+      "id": "fCr9WTKLVRLM"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#model = gensim.models.Word2Vec.load_word2vec_format(model_path, binary=True)\n",
+        "tensorsfp = \"model_word2vec_food_tensor.tsv\"\n",
+        "metadatafp = \"metadata_word2vec_food_tensor.tsv\"\n",
+        "with open( tensorsfp, 'w+') as tensors:\n",
+        "    with open( metadatafp, 'w+') as metadata:\n",
+        "         for word  in model.wv.index_to_key:\n",
+        "           metadata.write(word + '\\n')\n",
+        "           vector_row = '\\t'.join(map(str, model[word]))\n",
+        "           tensors.write(vector_row + '\\n')"
+      ],
+      "metadata": {
+        "id": "I40CgQgJTSKd"
+      },
+      "id": "I40CgQgJTSKd",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [],
+      "metadata": {
+        "id": "oG_qFvHJJvCU"
+      },
+      "id": "oG_qFvHJJvCU"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 7 - Embeddings sentences\n",
+        "\n",
+        "Now that we have vectors to represent words, how can we represent sentences / sequences of words?"
+      ],
+      "metadata": {
+        "id": "UW-h-NioJvTc"
+      },
+      "id": "UW-h-NioJvTc"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import numpy as np"
+      ],
+      "metadata": {
+        "id": "pbzTLFKpI9JO"
+      },
+      "execution_count": null,
+      "outputs": [],
+      "id": "pbzTLFKpI9JO"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Below is given a set of sentences, some have a similar meanings: they are paraphrases.\n",
+        "We want to compute a representation for each sentence where similar sentences have a similar vector."
+      ],
+      "metadata": {
+        "id": "9fui4z4aSPPS"
+      },
+      "id": "9fui4z4aSPPS"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "sentence1 = \"My disease was cured with a lot of medication\"\n",
+        "sentence2 = \"I was treated with medicaments\"\n",
+        "sentence3 = 'The sun is shining today'"
+      ],
+      "metadata": {
+        "id": "ZlVT2QGFJvTe"
+      },
+      "execution_count": null,
+      "outputs": [],
+      "id": "ZlVT2QGFJvTe"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "▶▶ **Write the code to encode a sentence as the average vector over the word vectors.**"
+      ],
+      "metadata": {
+        "id": "0zyDNnh-Shzz"
+      },
+      "id": "0zyDNnh-Shzz"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# **TODO** exercise\n",
+        "def encode(sentence,model):\n",
+        "  #...\n",
+        "  return vector"
+      ],
+      "metadata": {
+        "id": "9HqunfslIiIe"
+      },
+      "execution_count": null,
+      "outputs": [],
+      "id": "9HqunfslIiIe"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Run the code below to compute vectors for the sentences above and **check that you found the right shape for the sentence vectors**."
+      ],
+      "metadata": {
+        "id": "azVNk7-ZS9O0"
+      },
+      "id": "azVNk7-ZS9O0"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "v1 = encode(sentence1,model)\n",
+        "v2 = encode(sentence2,model)\n",
+        "v3 = encode(sentence3,model)\n",
+        "v1.shape"
+      ],
+      "metadata": {
+        "id": "xJJKlJkAI1r6"
+      },
+      "execution_count": null,
+      "outputs": [],
+      "id": "xJJKlJkAI1r6"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "The code below can be used to compute the cosine similarity between two sentence vectors:\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "-QSJXdi5JvTg"
+      },
+      "id": "-QSJXdi5JvTg"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def cos(v1,v2):\n",
+        "  return np.dot(v1,v2)/(np.linalg.norm(v1)*np.linalg.norm(v2))"
+      ],
+      "metadata": {
+        "id": "oTWuS_oBI9u0"
+      },
+      "execution_count": null,
+      "outputs": [],
+      "id": "oTWuS_oBI9u0"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "▶▶ **Compute the similarity between the sentences given above. Are the results coherent?**"
+      ],
+      "metadata": {
+        "id": "HwA9JhaxZW8d"
+      },
+      "id": "HwA9JhaxZW8d"
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "bXE1ZLIWPrau"
+      },
+      "id": "bXE1ZLIWPrau",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 8. Other algorithms\n",
+        "\n",
+        "Not enough time, but note that you can also build embeddings using FastText algorithm with Gensim. Doc2vec is also available.\n",
+        "\n",
+        "https://radimrehurek.com/gensim/apiref.html\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "mQDB2hnXxP4Z"
+      },
+      "id": "mQDB2hnXxP4Z"
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "#### Embeddings with multiword ngrams\n",
+        "\n",
+        "There is a *gensim.models.phrases* module which lets you automatically detect phrases longer than one word, using collocation statistics. Using phrases, you can learn a word2vec model where “words” are actually multiword expressions, such as new_york_times or financial_crisis:\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "wY8HkerUhbc7"
+      },
+      "id": "wY8HkerUhbc7"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from gensim.models import Phrases\n",
+        "\n",
+        "# Train a bigram detector.\n",
+        "bigram_transformer = Phrases(sentences)\n",
+        "\n",
+        "# Apply the trained MWE detector to a corpus, using the result to train a Word2vec model.\n",
+        "model = Word2Vec(bigram_transformer[sentences], min_count=10, iter=1)"
+      ],
+      "metadata": {
+        "id": "RaYOSCa-TSUy"
+      },
+      "id": "RaYOSCa-TSUy",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "print(list(model.wv.key_to_index))"
+      ],
+      "metadata": {
+        "id": "DOjkiWcUhwmH"
+      },
+      "id": "DOjkiWcUhwmH",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "If you’re finished training a model (i.e. no more updates, only querying), you can switch to the KeyedVectors instance:\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "gbdjeIUlhUvD"
+      },
+      "id": "gbdjeIUlhUvD"
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "word_vectors = model.wv\n",
+        "del model"
+      ],
+      "metadata": {
+        "id": "Z9hR0n7JPlem"
+      },
+      "id": "Z9hR0n7JPlem",
+      "execution_count": null,
+      "outputs": []
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.7.9"
+    },
+    "colab": {
+      "provenance": [],
+      "collapsed_sections": [
+        "satellite-colombia"
+      ]
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/projets_etudiants_2425/[NN-LiTL]Projects-NeuralMethodsNLP_2425.pdf b/projets_etudiants_2425/[NN-LiTL]Projects-NeuralMethodsNLP_2425.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..77f1d267287082fef7a1931a10c45834d7b81202
Binary files /dev/null and b/projets_etudiants_2425/[NN-LiTL]Projects-NeuralMethodsNLP_2425.pdf differ