cLang-8 (“cleaned Lang-8”) is a dataset for grammatical error correction (GEC). The source sentences come from the popular NAIST Lang-8 Learner Corpora, while the target sentences are generated by gT5, our state-of-the-art GEC method, which is described in our ACL-IJCNLP 2021 paper.