Project DisCut22 (work in progress)
A tool for discourse segmentation. Successor to ToNy and DisCut, the segmenters built for DISRPT 2019 and 2021. The goal of this version is to be easy to use, with or without IT knowledge.
2021
Multi-lingual Discourse Segmentation and Connective Identification: MELODI at Disrpt2021
Code: https://gitlab.irit.fr/melodi/andiamo/discoursesegmentation/discut
2019
ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
Code: https://gitlab.inria.fr/andiamo/tony
Usage
Content description
[TBD: explain the directories created automatically when the scripts run]
- data/: Contains input data, in raw and/or pre-processed format(s).
- data/results/: Contains output data, scores and post-processed data (also the allennlp logs).
- code/: Contains the main scripts.
- code/utils: Contains helper scripts called by the main scripts.
- model/: Contains the loaded or created model.
- doc.pdf: Contains detailed documentation (TBD?).
- code/config.json: A file to be completed (or a directory offering a choice between simple use-case configs and a template for a custom config).
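As a quick sanity check of the layout described above, here is a minimal sketch that reports which of the listed directories are absent from a checkout. The directory names come from the list above; the helper `missing_dirs` is hypothetical and not part of the project, and some directories may only appear after the scripts have run.

```python
from pathlib import Path

# Directory names taken from the content description above.
# The helper below is illustrative only, not part of DisCut22.
EXPECTED_DIRS = ["data", "data/results", "code", "code/utils", "model"]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Report anything missing relative to the current directory.
    for d in missing_dirs():
        print(f"missing: {d}")
```

Run from the repository root, it prints one line per missing directory and nothing if the layout is complete.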
Set up the environment
- Conda setup for Python 3.7 (TBD?)
- Install all required libraries with the following command:
pip install -r <dir?>requirements.txt
Configuration file: to choose or to complete
- code/config_1.json: Config for usecase_1: takes sentence-split text, applies ToNy, and outputs the same text with EDU brackets.
- [TBD: model training config and all sorts of cool options]
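To illustrate what "the same text with EDU brackets" could look like, here is a toy sketch. It assumes the segmenter yields, for each token, a flag saying whether that token begins a new EDU, and it assumes square brackets as delimiters; the actual output format of DisCut22 may differ, and `bracket_edus` is a hypothetical helper, not a project function.

```python
def bracket_edus(tokens, starts):
    """Wrap each EDU in square brackets (assumed output format).

    tokens: list of str for one sentence.
    starts: list of bool, True where a token begins a new EDU.
    """
    if len(tokens) != len(starts):
        raise ValueError("tokens and starts must have the same length")
    edus, current = [], []
    for token, is_start in zip(tokens, starts):
        # Close the running EDU when a new one begins.
        if is_start and current:
            edus.append(current)
            current = []
        current.append(token)
    if current:
        edus.append(current)
    return " ".join("[ " + " ".join(edu) + " ]" for edu in edus)


tokens = "She stayed home because it was raining".split()
starts = [True, False, False, True, False, False, False]
print(bracket_edus(tokens, starts))
# prints: [ She stayed home ] [ because it was raining ]
```

The boundary flags are hand-written here; in the real pipeline they would come from the ToNy model's predictions.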
Run usecase 1
(go to the code/ directory)
Run this command:
python discut22.py --config config_1.json
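For readers unfamiliar with this calling pattern, here is a minimal sketch of how a script can accept a `--config` option and read the JSON file it points to. This is not the actual content of discut22.py; `parse_args` and `load_config` are hypothetical names, and no assumption is made about the keys inside config_1.json.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse a --config option, as in the command above."""
    parser = argparse.ArgumentParser(description="Illustrative --config handling")
    parser.add_argument("--config", required=True, help="path to a JSON config file")
    return parser.parse_args(argv)


def load_config(path):
    """Read a JSON configuration file and return its parsed content."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def main(argv=None):
    args = parse_args(argv)
    config = load_config(args.config)
    # What happens next depends on the config content, which is not assumed here.
    return config
```

Calling `main(["--config", "config_1.json"])` from the code/ directory would load the same file as the command above.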