Bookmarks for 2024-05-28

Two bookmarks were saved on this day.

20.

NUS NLP datasets

www.comp.nus.edu.sg/~nlp/corpora.html

19.

cLang-8 Dataset

github.com/google-research-datasets/clang8

cLang-8 (“cleaned Lang-8”) is a dataset for grammatical error correction (GEC). The source sentences originate from the popular NAIST Lang-8 Learner Corpora, while the target sentences are generated by our state-of-the-art GEC method called gT5. The method is described in our ACL-IJCNLP 2021 paper.