laura.riviere authored

Project DisCut22 (work in progress)

A tool for discourse segmentation. DisCut22 is the successor to ToNy and DisCut, the segmenters built for DISRPT 2019 and 2021. This version aims to be easy to use with or without IT knowledge.

2021
Multi-lingual Discourse Segmentation and Connective Identification: MELODI at Disrpt2021
Code: https://gitlab.irit.fr/melodi/andiamo/discoursesegmentation/discut

2019
ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
Code: https://gitlab.inria.fr/andiamo/tony

Usage

Content description

[TBD: explain the directories created automatically when the scripts run]

  • data/ Contains input data, raw and/or pre-processed format(s).
  • data/results/ Contains output data, scores and post-processed data, as well as the AllenNLP logs.
  • code/ Contains main scripts.
  • code/utils Contains useful scripts to be called.
  • model/ Contains the loaded or created model.
  • doc.pdf Contains detailed documentation (TBD?)
  • code/config.json A file to be completed (or a directory offering a choice between simple use-case configs and a template for a custom config)
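
As a quick sanity check of the layout listed above, the following sketch reports which of those directories are absent from a checkout. It is only an illustration: the helper name `missing_dirs` is hypothetical, and since some directories are created automatically by the scripts, a missing entry is not necessarily an error.

```python
from pathlib import Path

# Directory names copied from the list above. Which of them exist before
# the first run is an assumption: some are created by the scripts.
EXPECTED = ["data", "data/results", "code", "code/utils", "model"]


def missing_dirs(root, expected=EXPECTED):
    """Return the expected directories that are absent under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the project root to list what is not there yet.
    for name in missing_dirs("."):
        print(f"missing: {name}")
```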

Set up the environment

  • Conda setup for Python 3.7 (TBD?)
  • Install all required libraries with the following command:
pip install -r <dir?>requirements.txt

Configuration file: choose one or complete your own

  • code/config_1.json Config for usecase_1: take a sentence-split text, apply ToNy, and output the same text with EDU brackets.
  • [TBD: configs for training models and other options]
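
To make "the same text with EDU brackets" concrete, here is a minimal sketch of how predicted segment boundaries could be rendered as brackets. It is not DisCut22's actual code: the function name `bracket_edus`, the boolean encoding of predictions, and the exact bracket layout are all assumptions made for illustration.

```python
def bracket_edus(tokens, begins):
    """Wrap each EDU of one sentence in square brackets.

    tokens: list of word strings.
    begins: list of booleans, True where a token starts a new EDU
            (a hypothetical encoding of the segmenter's predictions).
    """
    if len(tokens) != len(begins):
        raise ValueError("tokens and begins must have the same length")
    edus, current = [], []
    for token, is_begin in zip(tokens, begins):
        # A new EDU starts here: close the one collected so far.
        if is_begin and current:
            edus.append(current)
            current = []
        current.append(token)
    if current:
        edus.append(current)
    return " ".join("[ " + " ".join(edu) + " ]" for edu in edus)


tokens = "She left because it was late .".split()
begins = [True, False, True, False, False, False, False]
print(bracket_edus(tokens, begins))
# [ She left ] [ because it was late . ]
```

The words are left untouched and only the brackets are added, which matches the description of use case 1.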

Run usecase 1

From the code directory, run this command:

python discut22.py --config config_1.json
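
For readers unfamiliar with this calling style, the sketch below shows one common way a script can accept a `--config` option and read a JSON file. How `discut22.py` handles this internally is not documented here, so the function names `parse_args` and `load_config` are hypothetical, and nothing is assumed about the keys inside the config.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the command line; only the --config option is assumed."""
    parser = argparse.ArgumentParser(description="Config loading sketch")
    parser.add_argument("--config", required=True,
                        help="path to a JSON configuration file")
    return parser.parse_args(argv)


def load_config(path):
    """Read a JSON configuration file and return its parsed content."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Passing the arguments explicitly mirrors the command shown above.
args = parse_args(["--config", "config_1.json"])
print(args.config)
# config_1.json
```

Calling `load_config(args.config)` would then return the parsed content of the file, provided the path is valid relative to the current directory.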