Corpora
We provide an integrated tool to obtain and preprocess 35 NER corpora. The script performs the following steps:
- Download of the corpora
- Conversion of the corpora
- Sentence splitting, tokenization and POS tagging (if not given)
- Deterministic split into training, development and test sets with a 60:10:30 ratio
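
To illustrate what a deterministic 60:10:30 split can look like, here is a minimal Python sketch. It is not the tool's actual implementation: the function name `deterministic_split`, the fixed `seed`, and the seeded-shuffle approach are all assumptions made for this example. The point is only that a fixed seed yields the same training, development and test sets on every run.

```python
import random


def deterministic_split(sentences, ratios=(60, 10, 30), seed=42):
    """Split sentences into (train, dev, test) reproducibly.

    Hypothetical helper for illustration; the seed value and the
    shuffle-then-cut strategy are assumptions, not the tool's method.
    """
    total = sum(ratios)
    items = list(sentences)  # copy so the caller's list is not modified
    # A private Random instance with a fixed seed always produces the
    # same permutation for the same input order.
    random.Random(seed).shuffle(items)
    n = len(items)
    train_end = n * ratios[0] // total
    dev_end = n * (ratios[0] + ratios[1]) // total
    return items[:train_end], items[train_end:dev_end], items[dev_end:]


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(100)]
    train, dev, test = deterministic_split(sentences)
    print(len(train), len(dev), len(test))  # 60 10 30
```

Because the split depends on the input order as well as the seed, the sentences should be read in a stable order (for example, the order in the corpus file) before splitting.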
If you use the tool in your research, please cite us.