diff --git a/pDB_graph.ipynb b/pDB_graph.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..912c5dcc9357cf5bc994110560397c39bd2f222c
--- /dev/null
+++ b/pDB_graph.ipynb
@@ -0,0 +1,669 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "06f86dac",
+   "metadata": {},
+   "source": [
+    "# PeeringDB capacity graph"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4b56b634",
+   "metadata": {},
+   "source": [
+    "**Author: Justin Loye** https://www.linkedin.com/in/justin-loye-66631a14a/  \n",
+    "**Date: 2022/11/02**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a9d51de4",
+   "metadata": {},
+   "source": [
+    "I show in this notebook how to build the PeeringDB capacity graph from [CAIDA dumps v2](https://www.caida.org/catalog/datasets/peeringdb/).\n",
+    "It consists in IXP metadata (table `ix`), ASes metadata (table `net`), a weighted and directed Graph (`DiGraph`) and a table containing the graph's nodes metadata (table `nodes`)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a86bec9a",
+   "metadata": {},
+   "source": [
+    "If this helps you in your research please give credit to CAIDA and cite\n",
+    "\n",
+    "@misc{https://doi.org/10.48550/arxiv.2206.05146,\n",
+    "  doi = {10.48550/ARXIV.2206.05146},  \n",
+    "  url = {https://arxiv.org/abs/2206.05146},  \n",
+    "  author = {Loye, Justin and Mouysset, Sandrine and Bruyère, Marc and Jaffrès-Runser, Katia},  \n",
+    "  keywords = {Networking and Internet Architecture (cs.NI),FOS: Computer and information sciences, FOS: Computer and information sciences},  \n",
+    "  title = {Global Internet public peering capacity of interconnection: a complex network analysis},  \n",
+    "  publisher = {arXiv},  \n",
+    "  year = {2022},  \n",
+    "  copyright = {arXiv.org perpetual, non-exclusive license}\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "0d75ef12",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import json\n",
+    "import pandas as pd\n",
+    "import networkx as nx"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d704614e",
+   "metadata": {},
+   "source": [
+    "# Preprocessing the data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0d13746f",
+   "metadata": {},
+   "source": [
+    "Notes on the preprocessing: \n",
+    "* All entries are uniquely defined with an index.\n",
+    " * ASes: the index is the AS number (asn)\n",
+    " * IXPs: a negative number that I attributed and named asn for simplicity\n",
+    "* The graph is first built from infos present in `netixlan_set` of the API. This makes a bipartite graph (AS-IXP) with links weighted by the router port size (`speed` in the API)\n",
+    "* We want to derive a directed graph: we rely on ASes `info_ratio` attribute, that can take the values `Not Disclosed`, `Heavy In(out)bound`, `Mostly In(out)bound`, `Balanced`.\n",
+    " * Inbound: a link is created with a weight=`speed` from IXP to AS. Another link of weight $(1-\\beta)$*`speed` is created in the other direction\n",
+    " * Outbound: a link is created with a weight=`speed` from AS to IXP. Another link of weight $(1-\\beta)$*`speed` is created in the other direction\n",
+    " * `Balanced` or `Not Disclosed`: A link in both direction with a weight=`speed`\n",
+    " * Heavy categories: $\\beta=\\beta_H=0.95$, Mostly categories: $\\beta=\\beta_M=0.75$\n",
+    "* I quantify discrepancies in PDB data (I don't find any from 2020_03_01 to the present date):\n",
+    "  * mismatch between ASN in networks metadata and netixlan\n",
+    "  * mismatch between IXPs index and netixlan index\n",
+    "* I remove ASes not present at IXPs and IXPs without members.  \n",
+    "But I don't check if the size of the ports (I delegate the filter on 0 port capacity for later)"
+   ]
+  },
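+  {
+   "cell_type": "markdown",
+   "id": "beta-weighting-sketch-md",
+   "metadata": {},
+   "source": [
+    "The next cell is an optional, standalone sketch of the weighting rule above; it is not used by the pipeline. `edge_weights` is a hypothetical helper whose defaults simply mirror the `BETA_H`/`BETA_M` constants defined below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "beta-weighting-sketch-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sketch of the weighting rule (hypothetical helper, not used by the pipeline below).\n",
+    "def edge_weights(speed, info_ratio, beta_h=0.95, beta_m=0.75):\n",
+    "    \"\"\"Return (weight AS->IXP, weight IXP->AS) for one netixlan entry.\"\"\"\n",
+    "    if info_ratio == \"Heavy Outbound\":\n",
+    "        return speed, (1.0 - beta_h) * speed\n",
+    "    if info_ratio == \"Mostly Outbound\":\n",
+    "        return speed, (1.0 - beta_m) * speed\n",
+    "    if info_ratio == \"Mostly Inbound\":\n",
+    "        return (1.0 - beta_m) * speed, speed\n",
+    "    if info_ratio == \"Heavy Inbound\":\n",
+    "        return (1.0 - beta_h) * speed, speed\n",
+    "    # \"Balanced\", \"Not Disclosed\" or an empty string: full speed in both directions\n",
+    "    return speed, speed\n",
+    "\n",
+    "edge_weights(10000, \"Mostly Inbound\")  # -> (2500.0, 10000)"
+   ]
+  },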
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "20dad6b1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "BETA_H = 0.95\n",
+    "BETA_M = 0.75\n",
+    "\n",
+    "snapshot = \"peeringdb_2_dump_2021_03_01.json\" #path to pdb dump"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "6d2cd264",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def matrix2ASNedgelist(M, nodes, filename):\n",
+    "    \"\"\"Given an adjacency matrix `M` and a `nodes` df with the same ordering, write to file the ASN edgelist\"\"\"\n",
+    "    outfile = open(filename, \"w\")\n",
+    "    print(\"#from\", \"to\", \"weight\", file=outfile, sep=\",\")\n",
+    "\n",
+    "    L,l = np.shape(M)\n",
+    "    for i in range(L):\n",
+    "        for j in range(l):\n",
+    "            if(M[i,j]>0):\n",
+    "                print(nodes.asn.iloc[j], nodes.asn.iloc[i] , M[i,j], file=outfile, sep=\",\")\n",
+    "    outfile.close()\n",
+    "\n",
+    "\n",
+    "def prepare_data(pdb_dump_filename):\n",
+    "    \"\"\"\n",
+    "    Main preprocessing pipeline: read pdb snapshot `pdb_dump_filename`, prepare the dataframes and generate the graph\n",
+    "    \n",
+    "    Parameters\n",
+    "    ----------\n",
+    "    pdb_dump_filename : string\n",
+    "        path to the pDB snapshot\n",
+    "        \n",
+    "    Returns\n",
+    "    -------\n",
+    "    DataFrame net\n",
+    "        ASes metadata\n",
+    "    DataFrame ix\n",
+    "        IXPs metadata\n",
+    "    DataFrame nodes\n",
+    "        PeeringDB graph nodes metadata\n",
+    "    narray A\n",
+    "        Weighted adjacency matrix \n",
+    "    \"\"\"\n",
+    "    def rename_net(network_name):\n",
+    "        \"\"\"Rename network with same name as an IXP (likely a route server)\"\"\"\n",
+    "        if len(ix.loc[ix.name == network_name])>0:\n",
+    "            return \"IXP-RS_\" + network_name\n",
+    "        else:\n",
+    "            return network_name\n",
+    "    \n",
+    "    def get_net_port_capacity(net_row):\n",
+    "        port_capacity = 0.0\n",
+    "        for netix in net_row.netixlan_set:\n",
+    "            port_capacity += netix['speed']\n",
+    "        return port_capacity\n",
+    "\n",
+    "    def get_ixp_port_capacity():\n",
+    "        ix_id2port_capa = {}\n",
+    "        for i in range(Nnet):\n",
+    "            netixlan_set=net.iloc[i,net.columns.get_loc(\"netixlan_set\")]\n",
+    "            for netix in netixlan_set:\n",
+    "                if netix[\"ix_id\"] not in ix_id2port_capa:\n",
+    "                    ix_id2port_capa[netix[\"ix_id\"]] = netix[\"speed\"]\n",
+    "                else:  \n",
+    "                    ix_id2port_capa[netix[\"ix_id\"]] += netix[\"speed\"]\n",
+    "        return ix_id2port_capa\n",
+    "\n",
+    "    def get_meta_region(row):\n",
+    "        if row[\"type\"] == \"AS\":\n",
+    "            return net.loc[net.asn==row.asn].info_scope.values[0]\n",
+    "        elif row[\"type\"] == \"IXP\":\n",
+    "            return ix.loc[ix.asn==row.asn].region_continent.values[0]\n",
+    "\n",
+    "\n",
+    "\n",
+    "    ## Loading files      \n",
+    "    file_pdb = open(pdb_dump_filename, \"r\")\n",
+    "    data_pdb = json.load(file_pdb)\n",
+    "\n",
+    "    net = pd.read_json(json.dumps(data_pdb[\"net\"][\"data\"]), orient=\"record\") #ASes metadata\n",
+    "    net = net.set_index(\"asn\")\n",
+    "    net['asn'] = net.index\n",
+    "    ix = pd.read_json(json.dumps(data_pdb[\"ix\"][\"data\"]), orient=\"record\") #IXPs metadata\n",
+    "    netixlan = pd.read_json(json.dumps(data_pdb[\"netixlan\"][\"data\"]), orient=\"record\") #AS membership to IXPs\n",
+    "    \n",
+    "    \n",
+    "    ## cross checking between ASN in in netixlan and ASN in net\n",
+    "    net['netixlan_set'] = np.empty((len(net), 0)).tolist()\n",
+    "    nwrong = 0\n",
+    "    for index, row in netixlan.iterrows():\n",
+    "        try:\n",
+    "            net.loc[row.asn, \"netixlan_set\"].append(dict(row))\n",
+    "        except:\n",
+    "            nwrong += 1\n",
+    "        \n",
+    "    print(nwrong, \"wrong ASN in netixlan\")\n",
+    "    \n",
+    "    print(len(net), \"ASes and\", len(ix), \"IXPs in original dataset\")\n",
+    "\n",
+    "    \n",
+    "    ## Filter AS not present at IXPs\n",
+    "    if \"ix_count\" not in net:\n",
+    "        net.insert(loc=len(net.columns), column='ix_count', value='')\n",
+    "    net.ix_count = net.netixlan_set.apply(lambda x: len(x))\n",
+    "    net = net.loc[net.ix_count>0]\n",
+    "    net = net.reset_index(drop=True)\n",
+    "\n",
+    "\n",
+    "    ## Filter IXPs without members\n",
+    "    if \"net_count\" not in ix:\n",
+    "        ix.insert(loc=len(ix.columns), column=\"net_set\", value='')\n",
+    "        \n",
+    "        def get_as_set(ix):\n",
+    "            as_set = []\n",
+    "            netixlan_set_index = net.columns.get_loc(\"netixlan_set\")\n",
+    "            for i in range(len(net)):\n",
+    "                netixlan_set = net.iloc[i, netixlan_set_index]\n",
+    "                ix_presence = []\n",
+    "                for netix in netixlan_set:\n",
+    "                    ix_presence.append(netix[\"ix_id\"])\n",
+    "                if ix[\"id\"] in ix_presence:\n",
+    "                    as_set.append(net.iloc[i,0])\n",
+    "            return as_set\n",
+    "        \n",
+    "        ix.net_set = ix.apply(get_as_set, axis=1)\n",
+    "        ix[\"net_count\"] = ix.net_set.apply(lambda x: len(x))\n",
+    "        \n",
+    "    ix = ix.loc[ix.net_count>0]\n",
+    "    ix = ix.reset_index(drop=True)\n",
+    "\n",
+    "    Nnet = len(net)\n",
+    "    Nix = len(ix)\n",
+    "    \n",
+    "    print(Nnet, \"ASes and\", Nix, \"IXPs after net_count>0 and ix_count>0 filter\")\n",
+    "\n",
+    "\n",
+    "    ##Preparing graph nodes metadata\n",
+    "    net.name = net.name.apply(rename_net)\n",
+    "    \n",
+    "    ix[\"asn\"] =  [-i for i in range(1,Nix+1)]\n",
+    "\n",
+    "    nodes = pd.DataFrame(data={\"asn\": net.asn.to_list() + ix.asn.to_list(),\n",
+    "                               \"name\": net.name.to_list() + ix.name.to_list(),\n",
+    "                               \"type\": [\"AS\"]*Nnet + [\"IXP\"]*Nix,\n",
+    "                               \"prev_id\": net.id.to_list() + ix.id.to_list()})\n",
+    "\n",
+    "    nodes[\"AStype\"] = nodes.asn.map(net.set_index(\"asn\").info_type)\n",
+    "\n",
+    "    nodes[\"region\"] = nodes.apply(get_meta_region, axis=1)\n",
+    "\n",
+    "    N = len(nodes)\n",
+    "\n",
+    "    AS_pdbindex2AS_dfindex = {nodes.prev_id[i]: i for i in range(Nnet)}\n",
+    "    IXP_pdbindex2IXP_dfindex = {nodes.prev_id[i]: i for i in range(Nnet, N)}\n",
+    "\n",
+    "    net.insert(loc=len(net.columns), column='port_capacity', value='')\n",
+    "        \n",
+    "    net.port_capacity = net.apply(get_net_port_capacity, axis=1)\n",
+    "    \n",
+    "    ix.insert(loc=len(ix.columns), column=\"port_capacity\", value='')\n",
+    "\n",
+    "    ix[\"port_capacity\"]= ix[\"id\"].map(get_ixp_port_capacity())\n",
+    "    \n",
+    "    ix = ix.set_index(\"asn\")\n",
+    "    ix['asn'] = ix.index\n",
+    "    \n",
+    "    nodes = nodes.set_index(\"asn\")\n",
+    "    nodes['asn'] = nodes.index\n",
+    "    \n",
+    "    net = net.set_index(\"asn\")\n",
+    "    net['asn'] = net.index\n",
+    "    \n",
+    "    nodes.insert(loc=len(nodes.columns), column=\"port_capacity\", value=0.0)\n",
+    "    nodes.update(pd.Series({**dict(ix[\"port_capacity\"]), **dict(net[\"port_capacity\"])}, name=\"port_capacity\"))\n",
+    "    \n",
+    "\n",
+    "    ## Building the adjacency matrix\n",
+    "    A = np.zeros((len(nodes), len(nodes))) # adjacency matrix\n",
+    "    nlinks = 0\n",
+    "    nwrong = 0\n",
+    "    for i in range(len(net)):\n",
+    "        AS = net.iloc[i,:]\n",
+    "        traffic_ratio = AS.info_ratio\n",
+    "\n",
+    "        for IXP in AS.netixlan_set:\n",
+    "            try :\n",
+    "                ix_index = IXP_pdbindex2IXP_dfindex[IXP[\"ix_id\"]]\n",
+    "                speed = int(IXP[\"speed\"])\n",
+    "\n",
+    "                if traffic_ratio == \"Heavy Outbound\":\n",
+    "                    if BETA_H == 1.0:\n",
+    "                        A[ix_index, i] += speed\n",
+    "                        nlinks+=1\n",
+    "                    else:\n",
+    "                        A[ix_index, i] += speed\n",
+    "                        A[i, ix_index] += (1.0-BETA_H)*speed\n",
+    "                        nlinks+=2\n",
+    "\n",
+    "                elif traffic_ratio == \"Mostly Outbound\":\n",
+    "                    A[ix_index, i] += speed\n",
+    "                    A[i, ix_index] += (1.0-BETA_M)*speed\n",
+    "                    nlinks+=2\n",
+    "\n",
+    "                elif traffic_ratio == \"Balanced\":\n",
+    "                    A[ix_index, i] += speed\n",
+    "                    A[i, ix_index] += speed\n",
+    "                    nlinks+=2\n",
+    "\n",
+    "                elif traffic_ratio == \"Mostly Inbound\":\n",
+    "                    A[ix_index, i] += (1.0-BETA_M)*speed\n",
+    "                    A[i, ix_index] += speed\n",
+    "                    nlinks+=2\n",
+    "\n",
+    "                elif traffic_ratio == \"Heavy Inbound\":\n",
+    "                    if BETA_H == 1.0:\n",
+    "                        A[i, ix_index] += speed\n",
+    "                        nlinks+=1\n",
+    "                    else:\n",
+    "                        A[i, ix_index] += speed\n",
+    "                        A[ix_index, i] += (1.0-BETA_H)*speed\n",
+    "                        nlinks+=2\n",
+    "\n",
+    "                elif traffic_ratio == \"\" or traffic_ratio == \"Not Disclosed\":\n",
+    "                    A[i, ix_index] += speed\n",
+    "                    A[ix_index, i] += speed\n",
+    "                    nlinks+=2\n",
+    "                else:\n",
+    "                    print(\"DATA MISREAD\", traffic_ratio)\n",
+    "            except:\n",
+    "                nwrong = nwrong+1\n",
+    "            \n",
+    "    print(nwrong, \"wrong links\")\n",
+    "    return net, ix, nodes, A"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "a35e5983",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "0 wrong ASN in netixlan\n",
+      "21388 ASes and 869 IXPs in original dataset\n",
+      "11765 ASes and 813 IXPs after net_count>0 and ix_count>0 filter\n",
+      "0 wrong links\n",
+      "(AS+IXP) number before and after port_capacity filter 12578 12282\n",
+      "12282 nodes in the graph\n"
+     ]
+    }
+   ],
+   "source": [
+    "net, ix, nodes, A = prepare_data(snapshot)\n",
+    "    \n",
+    "print(\"(AS+IXP) number before and after port_capacity filter\", len(nodes) ,len(nodes.loc[nodes[\"port_capacity\"]>0]))\n",
+    "## This difference in length is due to the fact that few ASes report a port_size of 0 at IXPs.\n",
+    "\n",
+    "filename = \"_\".join([\"graph\", format(BETA_H, '.4f'), format(BETA_M, '.4f'), snapshot.rstrip(\"json\")+\"txt\"])\n",
+    "matrix2ASNedgelist(A, nodes, filename)    \n",
+    "edgelist = open(filename, \"r\")\n",
+    "DiGraph = nx.parse_edgelist(edgelist, nodetype=int, data=(('weight',float),), create_using = nx.DiGraph, delimiter=\",\")\n",
+    "edgelist.close()\n",
+    "\n",
+    "print(DiGraph.number_of_nodes(), \"nodes in the graph\")"
+   ]
+  },
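+  {
+   "cell_type": "markdown",
+   "id": "heaviest-links-check-md",
+   "metadata": {},
+   "source": [
+    "The cell below is an optional inspection sketch, not part of the original pipeline: it lists the five heaviest directed links by capacity, resolving node names through the `nodes` table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "heaviest-links-check-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional inspection sketch: the five heaviest directed links by capacity.\n",
+    "heaviest = sorted(DiGraph.edges(data=\"weight\"), key=lambda e: e[2], reverse=True)[:5]\n",
+    "for u, v, w in heaviest:\n",
+    "    print(nodes.loc[u, \"name\"], \"->\", nodes.loc[v, \"name\"], w)"
+   ]
+  },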
+  {
+   "cell_type": "markdown",
+   "id": "d0dd4791",
+   "metadata": {},
+   "source": [
+    "# Postprocessing the data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "51f4b2e9",
+   "metadata": {},
+   "source": [
+    "## Port capacity filter and checking consistency between graph and metadata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "b6776a64",
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "nodes table summary\n",
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "Int64Index: 12282 entries, 20940 to -813\n",
+      "Data columns (total 7 columns):\n",
+      " #   Column         Non-Null Count  Dtype  \n",
+      "---  ------         --------------  -----  \n",
+      " 0   name           12282 non-null  object \n",
+      " 1   type           12282 non-null  object \n",
+      " 2   prev_id        12282 non-null  int64  \n",
+      " 3   AStype         11472 non-null  object \n",
+      " 4   region         12282 non-null  object \n",
+      " 5   asn            12282 non-null  int64  \n",
+      " 6   port_capacity  12282 non-null  float64\n",
+      "dtypes: float64(1), int64(2), object(4)\n",
+      "memory usage: 767.6+ KB\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "None"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "ix table summary\n",
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "Int64Index: 810 entries, -1 to -813\n",
+      "Data columns (total 26 columns):\n",
+      " #   Column            Non-Null Count  Dtype \n",
+      "---  ------            --------------  ----- \n",
+      " 0   proto_ipv6        810 non-null    bool  \n",
+      " 1   status            810 non-null    object\n",
+      " 2   url_stats         810 non-null    object\n",
+      " 3   id                810 non-null    int64 \n",
+      " 4   tech_email        810 non-null    object\n",
+      " 5   city              810 non-null    object\n",
+      " 6   policy_email      810 non-null    object\n",
+      " 7   tech_phone        810 non-null    object\n",
+      " 8   media             810 non-null    object\n",
+      " 9   proto_multicast   810 non-null    bool  \n",
+      " 10  ixf_last_import   127 non-null    object\n",
+      " 11  website           810 non-null    object\n",
+      " 12  updated           810 non-null    object\n",
+      " 13  net_count         810 non-null    int64 \n",
+      " 14  policy_phone      810 non-null    object\n",
+      " 15  proto_unicast     810 non-null    bool  \n",
+      " 16  region_continent  810 non-null    object\n",
+      " 17  name              810 non-null    object\n",
+      " 18  created           810 non-null    object\n",
+      " 19  country           810 non-null    object\n",
+      " 20  notes             810 non-null    object\n",
+      " 21  org_id            810 non-null    int64 \n",
+      " 22  ixf_net_count     810 non-null    int64 \n",
+      " 23  name_long         810 non-null    object\n",
+      " 24  port_capacity     810 non-null    int64 \n",
+      " 25  asn               810 non-null    int64 \n",
+      "dtypes: bool(3), int64(6), object(17)\n",
+      "memory usage: 154.2+ KB\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "None"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "net table summary\n",
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "Int64Index: 11472 entries, 20940 to 61437\n",
+      "Data columns (total 35 columns):\n",
+      " #   Column                        Non-Null Count  Dtype  \n",
+      "---  ------                        --------------  -----  \n",
+      " 0   status                        11472 non-null  object \n",
+      " 1   looking_glass                 11472 non-null  object \n",
+      " 2   route_server                  11472 non-null  object \n",
+      " 3   netixlan_updated              11472 non-null  object \n",
+      " 4   info_ratio                    11472 non-null  object \n",
+      " 5   id                            11472 non-null  int64  \n",
+      " 6   policy_ratio                  11472 non-null  bool   \n",
+      " 7   info_unicast                  11472 non-null  bool   \n",
+      " 8   policy_general                11472 non-null  object \n",
+      " 9   website                       11472 non-null  object \n",
+      " 10  allow_ixp_update              11472 non-null  bool   \n",
+      " 11  updated                       11472 non-null  object \n",
+      " 12  netfac_updated                7121 non-null   object \n",
+      " 13  info_traffic                  11472 non-null  object \n",
+      " 14  info_multicast                11472 non-null  bool   \n",
+      " 15  policy_locations              11472 non-null  object \n",
+      " 16  name                          11472 non-null  object \n",
+      " 17  info_scope                    11472 non-null  object \n",
+      " 18  notes                         11472 non-null  object \n",
+      " 19  created                       11472 non-null  object \n",
+      " 20  org_id                        11472 non-null  int64  \n",
+      " 21  policy_url                    11472 non-null  object \n",
+      " 22  info_never_via_route_servers  11472 non-null  bool   \n",
+      " 23  poc_updated                   10524 non-null  object \n",
+      " 24  info_type                     11472 non-null  object \n",
+      " 25  policy_contracts              11472 non-null  object \n",
+      " 26  info_prefixes6                11472 non-null  int64  \n",
+      " 27  aka                           11472 non-null  object \n",
+      " 28  info_prefixes4                11472 non-null  int64  \n",
+      " 29  info_ipv6                     11472 non-null  bool   \n",
+      " 30  irr_as_set                    11472 non-null  object \n",
+      " 31  netixlan_set                  11472 non-null  object \n",
+      " 32  ix_count                      11472 non-null  int64  \n",
+      " 33  port_capacity                 11472 non-null  float64\n",
+      " 34  asn                           11472 non-null  int64  \n",
+      "dtypes: bool(6), float64(1), int64(6), object(22)\n",
+      "memory usage: 2.7+ MB\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "None"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Total number of nodes: 12282\n",
+      "Total number of IXPs: 810\n",
+      "Total number of ASes:  11472\n",
+      "Total number of edges:  63914\n"
+     ]
+    }
+   ],
+   "source": [
+    "## Port capacity filter\n",
+    "nodes = nodes.loc[nodes[\"port_capacity\"]>0] ##port capacity = sum of all ports\n",
+    "print(\"nodes table summary\")\n",
+    "display(nodes.info())\n",
+    "\n",
+    "ix = ix.loc[ix[\"port_capacity\"]>0]\n",
+    "print(\"ix table summary\")\n",
+    "display(ix.info())\n",
+    "\n",
+    "net = net.loc[net[\"port_capacity\"]>0]\n",
+    "print(\"net table summary\")\n",
+    "display(net.info())\n",
+    "\n",
+    "\n",
+    "assert(len(nodes) == len(ix) + len(net))\n",
+    "assert(len(nodes) == len(DiGraph))\n",
+    "\n",
+    "\n",
+    "print(\"Total number of nodes:\", len(nodes))\n",
+    "print(\"Total number of IXPs:\", len(ix))\n",
+    "print(\"Total number of ASes: \", len(net))\n",
+    "print(\"Total number of edges: \", len(DiGraph.edges()))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "adf98659",
+   "metadata": {},
+   "source": [
+    "## Selecting the main connected component\n",
+    "Most graph algorithms behave best when the graph has a single connected component  \n",
+    "Networkx recquires the Graph to be undirected"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "29b8276d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Number of connected components 28\n",
+      "Percentage of nodes in the graph main connected component 99.22651034033545\n"
+     ]
+    }
+   ],
+   "source": [
+    "##I work only with the main connected component\n",
+    "##Some entries of nodes, ix and net must be removed\n",
+    "##Main connected component.\n",
+    "##Watch out casting DiGraph to Graph is not correct (it deletes double edges). For our use here it will be fine.\n",
+    "##If later you need to work with an undirected version I recommend playing with the adjacency matrix.\n",
+    "components = sorted(nx.connected_components(nx.Graph(DiGraph)), key=len, reverse=True) \n",
+    "print(\"Number of connected components\", len(components))\n",
+    "print(\"Percentage of nodes in the graph main connected component\", 100.0*len(components[0])/DiGraph.number_of_nodes())\n",
+    "DiGraph = DiGraph.subgraph(components[0])\n",
+    "\n",
+    "##Removing entries.\n",
+    "for i in range(1,len(components)):\n",
+    "    component = components[i]\n",
+    "    for node in component:\n",
+    "        #if node is an AS\n",
+    "        if node >= 0:\n",
+    "            net.drop(index=node, inplace=True)\n",
+    "            nodes.drop(index=node, inplace=True)\n",
+    "        #if node is an IXP\n",
+    "        if node < 0:\n",
+    "            ix.drop(index=node, inplace=True)\n",
+    "            nodes.drop(index=node, inplace=True)\n",
+    "            \n",
+    "assert(len(nodes) == len(ix) + len(net))\n",
+    "assert(len(nodes) == len(DiGraph))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a493dde9",
+   "metadata": {},
+   "source": [
+    "# Your code here\n",
+    "![SNOWFALL](illustration.jpg)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.11"
+  },
+  "toc": {
+   "base_numbering": 1,
+   "nav_menu": {},
+   "number_sections": true,
+   "sideBar": true,
+   "skip_h1_title": false,
+   "title_cell": "Table of Contents",
+   "title_sidebar": "Contents",
+   "toc_cell": false,
+   "toc_position": {},
+   "toc_section_display": true,
+   "toc_window_display": false
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "b356519c7051df12bc6b40a6cad22842333960301254b3fc02df73c86b114eb2"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}