diff --git a/README.md b/README.md
index 37d15e25c8a67a49af33a6ae1fa05724ebd49221..80b201e463471f779c23a58a28ceda2f94da61cc 100644
--- a/README.md
+++ b/README.md
@@ -2,38 +2,20 @@
 
 Discourse segmenter for DISRPT 2021
 
-Data for DISRPT 2021: https://github.com/disrpt/sharedtask2021
-Website DISRPT 2021: https://sites.google.com/georgetown.edu/disrpt2021
-Code for DISRTP 2019: https://gitlab.inria.fr/andiamo/tony
-
-## Meeting 04.06.2021
-
-TODO:
-x- install allennlp 0.9 + tony19
-x- train a model (on english for instance)
-x- test it with tony script
-x- begin reading the tutorial on allennlp
-x- continue with general reading NLP
-
-Next steps:
-x- change to allennlp 1.xx
-- switch to xlm multi lingual
-- play with some hyper-parameters
-- go back on the architecture: add the CRF layer on top of BERT/LSTM
-- grouping the corpora during training
-- the sentence problem ? how do we address it
-
-## Meeting 21.05.2021
-
-TODO:
-- install allennlp 0.9 + tony19
-- train a model (on english for instance)
-- test it with tony script
-- begin reading the tutorial on allennlp
-- continue with general reading NLP
-
-Next steps:
-- change to allennlp 1.xx
-- switch to xlm multi lingual
-- grouping the corpora during training
-- the sentence problem ? how do we address it
+Useful Links:
+- Data for DISRPT 2021: https://github.com/disrpt/sharedtask2021
+- Website DISRPT 2021: https://sites.google.com/georgetown.edu/disrpt2021
+- Code for DISRTP 2019: https://gitlab.inria.fr/andiamo/tony
+
+Requirements:
+- python 3.7
+- requirements.txt: `pip install -r requirements.txt`
+- pytorch: `pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html`
+
+Usage:
+- train: `bash expes.sh eng.rst.rstdt conllu bert train`
+- test: `bash expes.sh eng.rst.rstdt conllu bert test`
+- fine-tune with other model: `bash expes.sh eng.rst.rstdt conllu bert train eng`
+- test on other model: `bash expes.sh eng.rst.rstdt conllu bert test eng`
+- merge two datasets: `bash merger.sh eng.rst.rstdt eng.rst.gum eng`
+- split with stanza: `python parse_corpus.py eng.rst.rstdt --parser stanza`