diff --git a/README.md b/README.md
index 0417a5a0ab0364cab050a9f3a85606db1e33023c..c6021e3e8a7d0663afa3a4714af043a0089739c5 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Project DisCut22 *still WIP* : Discourse Annotator Tool
+# Project DisCut22 : Discourse Annotator Tool
 A tool for Discourse Annotation. Inheritor of ToNy and DisCut, segmentors for DISRPT 2019 and 2021.
 The goal of this version is to be easy to use with or without IT knowledge.
 
@@ -12,29 +12,39 @@ Code: https://gitlab.inria.fr/andiamo/tony
-# Usage
-## Usecases
-- **Discourse Segmentation:** Take a raw text as input, use a loaded model to make predictions. Output the same text but with EDU segmentation. --> config_1
-
-- **Segmentation Evaluation:** Take an EDU gold segmented text as input, use a loaded model to make predictions. Output scores of model predictions against gold, and output discrepancies. --> config_2
-
-- **Custom Model Creation:** Fine-tuning (over one or two level) a pretrained Language Model with a specific dataset or combination of datasets. Then make predictions and evaluation. --> config_3
+## Usecases
+- **Discourse Segmentation: "annotation"** Take a raw text as input, use a loaded model to make predictions. Output the same text but with EDU segmentation.
+→ `config_global_1.1.json` and `config_global_1.2.json`.
+- **Segmentation Evaluation: "test"** Take an EDU gold segmented text as input, use a loaded model to make predictions. Output scores of model predictions against gold, and output discrepancies.
+→ `config_global_2.json`.
+- **Custom Model Creation: "train"** Train a new model using a pretrained Language Model (BERT, etc.) and a specific dataset or combination of datasets. Then make predictions and evaluation.
+→ `config_global_3.json`.
+- **Custom Model Fine-tuning: "fine_tune"** Fine-tune an existing model using a pretrained Language Model (BERT, etc.) and a specific dataset or combination of datasets. Then make predictions and evaluation.
+→ `config_global_4.json`.
+
+## Content description
+
+- `README.md` Description of the project.
+- `global_config_file_guideline.md` Contains detailed documentation on how to build a well-formed config_global file.
+- `data/my_cool_dataset/` Contains the raw data, which must have the same name as the directory.
+- `code/` Contains main scripts.
+  - `config_global_XX.json` A file to be completed for your specific project.
+  - `utils/` Contains useful scripts to be called.
+  - `discut22_1.py` One python script to run them all.
+- `model/` Contains the model to be loaded.
+  - `config_training.jsonnet` A file to be completed for usecases 3 and 4.
+- `projects/` This directory will be created automatically.
+  - `my_cool_exp_v1/` Name of your run. This directory will be created automatically. (see Usage)
+    - `logs_global.json` Logs of all processes, data and results. This file will be created automatically.
+    - `data_converted/` Contains pre-processed data if needed. This directory will be created automatically.
+    - `results/` Contains output files, logs and metrics, if any. This directory will be created automatically.
+      - `train/` Contains specific output related to the "train" usecase (like the model created), if any.
+      - `fine_tune/` Contains specific output related to the "fine_tune" usecase (like the model created), if any.
 
-## Content description
-[TBD : xplain directories automatically created during scripts run]
-- `data/my.cool.dataset/` Contains input data, raw and/or pre-processed format(s).
-  - `results.{stamp}/` Contains output data, scores and post-processed data. (Also logs of allennlp)
-- `code/` Contains main scripts.
-  - `discut22_1.py` One python script to run them all.
-  - `config_XX.json` A file to be completed for your specific project (or a dir with choise between simple use_case configs and a template for a custom config). See **
-  - `utils/` Contains useful scripts to be called.
-- `model/` Contains model to be loaded or created.
-  - `config_training.jsonnet` A file to be completed. (TBD automatically saved with model when done)
-- `global_config_file_guideline.md` Contains detailed documentation to build well formed config file.
 
 ## Set up environnement
-- Conda stuff pour python 3.7 (TBD ?)
+- DisCut22 runs on Python 3.7. Advice: create a dedicated virtual environment (e.g. with Miniconda).
 - Install all librairies required with the following command:
 ```
 pip install -r requirements.txt
@@ -46,15 +56,19 @@ pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f h
 
 ## Configuration file: to chose or to complete
-- `code/config_global_X.json` See global_config_file_guideline.md.
+- `code/config_global_X.json` → See `global_config_file_guideline.md`.
 
-## Run usecase 1
-(go to `code` directory)
-Run this command:
+# Usage
+(go to `code` directory)
+Run the command:
 ```
-python discut22.py --config config_XX.json
+python discut22_2.py --config config_XX.json [--name my_run_name] [-o]
 ```
+--config <> Your config file. (Mandatory)
+--name <> A name for your run. (Optional)
+-o, --overwrite Allow overwriting of `data_converted/` and `results/`. (Optional)
+
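+For example, a segmentation run (usecase 1) could look like the following; the config file and run name here are only illustrative values taken from the sections above, to be replaced by your own:
+```
+python discut22_2.py --config config_global_1.1.json --name my_cool_exp_v1 -o
+```
+The outputs of such a run should then appear under `projects/my_cool_exp_v1/` (see Content description).
+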
 ## Support
 laura.riviere@irit.fr
 
@@ -63,7 +77,12 @@ laura.riviere@irit.fr
 
 ## Authors and acknowledgment
 Morteza Ezzabady
 Laura Rivière
-Amir Zeldes
+Amir Zeldes
+
+## License
+Copyright 2023 IRIT-MELODI
 
 <!---
 ## Test and Deploy