Project DisCut22 (still WIP): Discourse Annotator Tool
A tool for discourse annotation. Successor to ToNy and DisCut, the segmenters built for DISRPT 2019 and 2021. This version aims to be easy to use, with or without IT knowledge.
2021
Multi-lingual Discourse Segmentation and Connective Identification: MELODI at Disrpt2021
Code: https://gitlab.irit.fr/melodi/andiamo/discoursesegmentation/discut
2019
ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
Code: https://gitlab.inria.fr/andiamo/tony
Usage
Use cases
- Discourse Segmentation: takes a raw text as input and uses a loaded model to make predictions. Outputs the same text with EDU segmentation. --> config_1
- Segmentation Evaluation: takes a gold EDU-segmented text as input and uses a loaded model to make predictions. Outputs the scores of the model predictions against gold, along with the discrepancies. --> config_2
- Custom Model Creation: fine-tunes (over one or two levels) a pretrained language model on a specific dataset or combination of datasets, then makes predictions and evaluates them. --> config_3
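To make the Segmentation Evaluation use case concrete, here is a minimal sketch of boundary-level scoring. It is not the project's evaluation code: it assumes gold and predicted segmentations are given as one 0/1 label per token (1 marking the first token of an EDU), and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def boundary_positions(labels: List[int]) -> Set[int]:
    """Return the token indices labelled 1 (assumed to mark an EDU start)."""
    return {i for i, label in enumerate(labels) if label == 1}


def segmentation_scores(gold: List[int], pred: List[int]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted EDU boundaries against gold."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    gold_b, pred_b = boundary_positions(gold), boundary_positions(pred)
    true_pos = len(gold_b & pred_b)
    precision = true_pos / len(pred_b) if pred_b else 0.0
    recall = true_pos / len(gold_b) if gold_b else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def discrepancies(gold: List[int], pred: List[int]) -> List[Tuple[int, int, int]]:
    """List (token index, gold label, predicted label) wherever the two differ."""
    return [(i, g, p) for i, (g, p) in enumerate(zip(gold, pred)) if g != p]


# Toy example: 7 tokens, 3 gold boundaries, 3 predicted boundaries.
gold = [1, 0, 0, 1, 0, 1, 0]
pred = [1, 0, 1, 1, 0, 0, 0]
scores = segmentation_scores(gold, pred)
print(scores)
print(discrepancies(gold, pred))
```

In this toy example two of the three predicted boundaries match gold, so precision, recall and F1 are all 2/3, and the discrepancy list points at tokens 2 and 5.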
Content description
[TBD: explain the directories created automatically when the scripts run]
- data/my.cool.dataset/
  Contains input data, in raw and/or pre-processed format(s).
  - results.{stamp}/
    Contains output data, scores and post-processed data (also the AllenNLP logs).
- code/
  Contains the main scripts.
  - discut22_1.py
    One Python script to run them all.
  - config_XX.json
    A file to be completed for your specific project (or a directory offering a choice between simple use-case configs and a template for a custom config). See **
  - utils/
    Contains helper scripts called by the main script.
- model/
  Contains the model to be loaded or created.
  - config_training.jsonnet
    A file to be completed. (TBD: automatically saved with the model when done)
- global_config_file_guideline.md
  Contains detailed documentation on building a well-formed config file.
Set up the environment
- Conda environment for Python 3.7 (TBD?)
- Install all required libraries with the following command:
pip install -r requirements.txt
- Install pytorch:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
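After installation, a quick sanity check can confirm that the interpreter and the PyTorch packages are visible. The sketch below is not part of the repository: the helper name `missing_packages` is hypothetical, and it only tests whether the packages can be found, not which version or CUDA build is installed.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the running Python version (the setup notes above mention Python 3.7).
print("Python %d.%d" % sys.version_info[:2])

# Packages installed by the pip command above.
missing = missing_packages(["torch", "torchvision", "torchaudio"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All PyTorch packages found.")
```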
Configuration file: to choose or to complete
- code/config_global_X.json
  See global_config_file_guideline.md.
Run use case 1
Go to the code directory and run this command:
python discut22.py --config config_XX.json
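For readers unfamiliar with this calling convention, the sketch below shows how a script can accept a `--config` option and load a JSON file. It illustrates the pattern only and is not the actual code of discut22.py: the function names are hypothetical, and no assumption is made about which keys the config contains (see global_config_file_guideline.md for those).

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Run a use case from a JSON config.")
    parser.add_argument("--config", required=True, help="path to a JSON config file")
    return parser.parse_args(argv)


def load_config(path):
    """Read a JSON config file and return it as a dictionary."""
    with Path(path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError("the config file must contain a JSON object at top level")
    return config


def main(argv=None):
    """Entry point: load the config named on the command line and return it."""
    args = parse_args(argv)
    return load_config(args.config)
```

Calling `main(["--config", "config_XX.json"])` mirrors the command above; a malformed file raises `json.JSONDecodeError` before any processing starts.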
Support
Authors and acknowledgment
Morteza Ezzabady
Laura Rivière
Amir Zeldes