DeepGrail Linker

This repository contains a Python implementation of a Neural Proof Net using TLGbank data.

This code was designed to work with the DeepGrail Tagger. In this repository we only use the word embeddings from the tagger and the tags from the dataset; the next step is to use the tagger's predictions for the linking step.

Usage

Installation

Python 3.9.10 is required (warning: do not use Python 3.10+). Clone the project locally.

Libraries installation

Run the init.sh script, or install the Tagger project under the name SuperTagger. Then place tagger.pt in the 'models' directory. (You may need to modify 'model_tagger' in train.py.)

Structure

The structure should look like this:

.
├── Configuration                    # Configuration
│   ├── Configuration.py             # Contains the function to execute for config
│   └── config.ini                   # contains parameters
├── find_config.py                   # auto-configures dataset parameters (max sentence length, etc.) according to the given dataset
├── requirements.txt                 # libraries needed
├── Datasets                         # TLGbank data with links
├── SuperTagger                      # The Supertagger directory (that you need to install)
│    ├── Datasets                    # TLGbank data
│    ├── SuperTagger                 # Implementation of BertForTokenClassification
│    │   ├── SuperTagger.py          # Main class
│    │   └── Tagging_bert_model.py   # Bert model
│    ├── predict.py                  # Example of prediction for supertagger
│    └── train.py                    # Example of training for the supertagger
├── Linker                           # The Linker directory
│    ├── ...
│    └── Linker.py                   # Linker class containing the neural network
├── models                           
│    └── supertagger.pt              # the .pt file containing the pretrained supertagger (you need to install it)
├── Output                           # Directory where your linker models will be saved if checkpoint=True in train
├── TensorBoard                      # Directory where the stats will be saved if tensorboard=True in train
└── train.py                         # Example of training the linker

Dataset format

The sentences should be in a column "X", the links (atoms with an '_x' suffix) in a column "Y", and the categories in a column "Z". For the links, each atom_x is paired with the one and only other atom_x in the sentence.
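
For illustration, here is a minimal sketch of loading such a file and extracting the three columns. The file name is a placeholder, and plain pandas is used here for simplicity; the repository itself provides read_csv_pgbar (see the prediction example below).

import pandas as pd

# Placeholder dataset path; the column names follow the description above.
df = pd.read_csv("Datasets/my_links_dataset.csv")
sentences  = df["X"].tolist()   # raw sentences
links      = df["Y"].tolist()   # atoms carrying the '_x' suffix, paired within each sentence
categories = df["Z"].tolist()   # categories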

Training

Launch train.py. If you look inside it, you can supply another dataset file and another tagging model.

In train.py, if you use checkpoint=True, the model is automatically saved after each epoch in a folder Output/Training_XX-XX_XX-XX. Use tensorboard=True to save logs in the TensorBoard folder (run tensorboard --logdir=logs to view them).
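
As a rough illustration of what a training run could look like, here is a hedged sketch: the import paths, the dataset file name, and the training method name (train_linker) are assumptions, so check train.py and Linker.py for the actual names, arguments, and defaults.

from Linker import Linker          # assumed import path
from utils import read_csv_pgbar   # assumed import path

df = read_csv_pgbar("Datasets/my_links_dataset.csv", 100)   # placeholder dataset and row count

linker = Linker("models/supertagger.pt")   # supertagger checkpoint from the Structure section

# Hypothetical training call; checkpoint and tensorboard mirror the options described above.
linker.train_linker(df, epochs=10, batch_size=16, checkpoint=True, tensorboard=True)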

Predicting

To predict on your own data you need to load a linker model (one saved with this code).

# The import paths below are assumptions; adjust them to match where
# Linker and read_csv_pgbar live in this repository.
from Linker import Linker
from utils import read_csv_pgbar

df = read_csv_pgbar(file_path, 20)      # load the first 20 rows of the dataset at file_path
texts = df['X'].tolist()                # sentences
categories = df['Z'].tolist()           # categories

linker = Linker(tagging_model)          # tagging_model: the supertagger (e.g. models/supertagger.pt)
linker.load_weights("your/linker/path") # path to the linker weights saved during training

links = linker.predict_with_categories(texts[7], categories[7])
print(links)

The file postprocessing.py allows you to draw the predicted links. (Keep the sentence length limited, otherwise the drawing becomes confusing.)

You can also use the function predict_without_categories which only needs the sentence.
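
For example, a minimal sketch reusing the linker loaded above:

# predict_without_categories only needs the raw sentence.
links = linker.predict_without_categories(texts[7])
print(links)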

Authors

de Pourtales Caroline, Rabault Julien