MELODI
AnDiAMO
discourseSegmentation
disCut21

Repository



disCut
Discourse segmenter for DISRPT 2021
Useful Links:

Data for DISRPT 2021: https://github.com/disrpt/sharedtask2021

Website DISRPT 2021: https://sites.google.com/georgetown.edu/disrpt2021

Code for DISRTP 2019: https://gitlab.inria.fr/andiamo/tony


Requirements:

python 3.7
requirements.txt: pip install -r requirements.txt

pytorch: pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html


Before start:

Please put the data at data directory

For shared-task testers:

bash code/main.sh preprocess
bashe code/main.sh conllu
bashe code/main.sh tok
python code/score_collector.py

General usage:

split all datasets for the first time: bash code/main.sh preprocess

generate the best results: bashe code/main.sh conllu

train: bash code/contextual_embeddings/expes.sh eng.rst.rstdt conllu bert train [--s 200] *note: --s is optional and it is for split the long sentences
test: bash code/contextual_embeddings/expes.sh eng.rst.rstdt conllu bert test

fine-tune with other model: bash code/contextual_embeddings/expes.sh eng.rst.rstdt conllu bert train eng

test on other model: bash code/contextual_embeddings/expes.sh eng.rst.rstdt conllu bert test eng

test fine-tuned model: bash code/contextual_embeddings/expes.sh eng.rst.rstdt conllu bert testft eng

merge two datasets: bash code/contextual_embeddings/merger.sh eng.rst.rstdt eng.rst.gum eng

split with stanza: python code/ssplit/parse_corpus.py eng.rst.rstdt --parser stanza

collect best scores: python code/score_collector.py [--dir '.'] *note: --dir is optional and its default is main directory