Commit 732c34ef authored by laura.riviere

add new readme

parent d44adc0d
# Project DisCut22 *still WIP* : Discourse Annotator Tool
# Project DisCut22 : Discourse Annotator Tool
A tool for discourse annotation. Successor to ToNy and DisCut, the segmenters for DISRPT 2019 and 2021. This version aims to be easy to use, with or without IT knowledge.
......@@ -12,29 +12,39 @@ Code: https://gitlab.inria.fr/andiamo/tony
# Usage
## Use cases
- **Discourse Segmentation:** Take raw text as input and use a loaded model to make predictions. Output the same text with EDU segmentation. --> config_1
- **Segmentation Evaluation:** Take a gold EDU-segmented text as input and use a loaded model to make predictions. Output the scores of the model predictions against the gold segmentation, along with the discrepancies. --> config_2
- **Custom Model Creation:** Fine-tune (over one or two levels) a pretrained language model with a specific dataset or combination of datasets. Then make predictions and evaluate them. --> config_3
## Use cases
- **Discourse Segmentation: "annotation"** Take raw text as input and use a loaded model to make predictions. Output the same text with EDU segmentation.
→ `config_global_1.1.json` and `config_global_1.2.json`.
- **Segmentation Evaluation: "test"** Take a gold EDU-segmented text as input and use a loaded model to make predictions. Output the scores of the model predictions against the gold segmentation, along with the discrepancies.
→ `config_global_2.json`.
- **Custom Model Creation: "train"** Train a new model using a pretrained language model (BERT, etc.) and a specific dataset or combination of datasets. Then make predictions and evaluate them.
→ `config_global_3.json`.
- **Custom Model Fine-Tuning: "fine_tune"** Fine-tune an existing model using a pretrained language model (BERT, etc.) and a specific dataset or combination of datasets. Then make predictions and evaluate them.
→ `config_global_4.json`.
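The mapping between use cases and example configuration files listed above can be summarized as a small lookup table. This is only an illustrative sketch: the use-case keys and file names come from the list, while the `USE_CASE_CONFIGS` dictionary and the `configs_for` helper are hypothetical and not part of DisCut22.

```python
# Hypothetical lookup table; not part of the DisCut22 code base.
# Keys and file names are taken from the use-case list in this README.
USE_CASE_CONFIGS = {
    "annotation": ["config_global_1.1.json", "config_global_1.2.json"],
    "test": ["config_global_2.json"],
    "train": ["config_global_3.json"],
    "fine_tune": ["config_global_4.json"],
}


def configs_for(use_case):
    """Return the example config file names listed for a use case."""
    try:
        return USE_CASE_CONFIGS[use_case]
    except KeyError:
        known = ", ".join(sorted(USE_CASE_CONFIGS))
        raise ValueError(
            f"Unknown use case {use_case!r}; expected one of: {known}"
        ) from None


# Print the config file(s) associated with each use case.
for name in USE_CASE_CONFIGS:
    print(name, "->", ", ".join(configs_for(name)))
```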
## Content description
- `README.md` Description of the project.
- `global_config_file_guideline.md` Contains detailed documentation for building a well-formed `config_global` file.
- `data/my_cool_dataset/` Contains raw data; the data files must have the same name as the directory.
- `code/` Contains the main scripts.
- `config_global_XX.json` A file to be completed for your specific project.
- `utils/` Contains useful scripts to be called.
- `discut22_1.py` One Python script to run them all.
- `model/` Contains the model to be loaded.
- `config_training.jsonnet` A file to be completed for use cases 3 and 4.
- `projects/` This directory will be created automatically.
- `my_cool_exp_v1/` Name of your run. This directory will be created automatically. (See Usage.)
- `logs_global.json` Logs of all processes, data and results. This file will be created automatically.
- `data_converted/` Contains pre-processed data, if needed. This directory will be created automatically.
- `results/` Contains output files, logs and metrics, if any. This directory will be created automatically.
- `train/` Contains output specific to training (such as the model created), if any.
- `fine_tune/` Contains output specific to fine-tuning (such as the model created), if any.
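To make the automatically created directories easier to picture, the sketch below builds the paths that a run named `my_cool_exp_v1` might produce. It assumes, since the list does not state the nesting explicitly, that `logs_global.json`, `data_converted/` and `results/` sit directly inside the run directory under `projects/`. The `run_layout` helper is hypothetical; it only computes paths and creates nothing on disk.

```python
from pathlib import Path


def run_layout(run_name, root="projects"):
    """Return the paths expected for one run.

    The nesting is an assumption made for illustration, not a
    documented guarantee of DisCut22.
    """
    run_dir = Path(root) / run_name
    return {
        "run_dir": run_dir,
        "logs": run_dir / "logs_global.json",
        "data_converted": run_dir / "data_converted",
        "results": run_dir / "results",
    }


# Show the assumed layout for the example run name used in this README.
for label, path in run_layout("my_cool_exp_v1").items():
    print(f"{label}: {path.as_posix()}")
```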
## Content description
[TBD: explain the directories automatically created during script runs]
- `data/my.cool.dataset/` Contains input data, in raw and/or pre-processed format(s).
- `results.{stamp}/` Contains output data, scores and post-processed data. (Also contains the AllenNLP logs.)
- `code/` Contains the main scripts.
- `discut22_1.py` One Python script to run them all.
- `config_XX.json` A file to be completed for your specific project (or a directory offering a choice between simple use-case configs and a template for a custom config). See **
- `utils/` Contains useful scripts to be called.
- `model/` Contains the model to be loaded or created.
- `config_training.jsonnet` A file to be completed. (TBD: automatically saved with the model when done.)
- `global_config_file_guideline.md` Contains detailed documentation for building a well-formed config file.
## Set up environment
- Conda setup for Python 3.7 (TBD?)
- DisCut22 runs on Python 3.7. Recommendation: create a dedicated virtual environment (Miniconda, etc.).
- Install all required libraries with the following command:
```
pip install -r requirements.txt
```
......@@ -46,15 +56,19 @@ pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f h
## Configuration file: to choose or to complete
- `code/config_global_X.json` See global_config_file_guideline.md.
- `code/config_global_X.json` → See `global_config_file_guideline.md`.
## Run use case 1
(Go to the `code` directory.)
Run this command:
# Usage
(Go to the `code` directory.)
Run the command:
```
python discut22.py --config config_XX.json
python discut22_2.py --config config_XX.json [--name my_run_name] [-o]
```
- `--config <>` Your config file. (Mandatory)
- `--name <>` A name for your run. (Optional)
- `-o`, `--overwrite` Allow overwriting of `data_converted/` and `results/`. (Optional)
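The options above could be parsed with Python's standard `argparse` module roughly as follows. This is a minimal sketch based only on the option descriptions; the actual argument handling in the DisCut22 script may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Run a DisCut22 use case.")
    # Mandatory path to the global config file.
    parser.add_argument("--config", required=True,
                        help="Your config file (mandatory).")
    # Optional run name; None when not provided.
    parser.add_argument("--name", default=None,
                        help="A name for your run (optional).")
    # Boolean flag, False unless -o/--overwrite is passed.
    parser.add_argument("-o", "--overwrite", action="store_true",
                        help="Allow overwriting of data_converted/ and results/.")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["--config", "config_XX.json", "--name", "my_run_name", "-o"]
)
print(args.config, args.name, args.overwrite)
```

Using `action="store_true"` keeps overwriting opt-in, which matches the description of `-o` as an explicit permission.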
## Support
laura.riviere@irit.fr
......@@ -63,7 +77,12 @@ laura.riviere@irit.fr
## Authors and acknowledgment
Morteza Ezzabady
Laura Rivière
Amir Zeldes
## License
Copyright 2023 IRIT-MELODI
<!---
## Test and Deploy
......