Resources for Text, Speech and Language Processing

Data collections

Collections maintained at this site

  1. The WordSimilarity-353 Test Collection
  2. TechTC - Technion Repository of Text Categorization Datasets

Collections maintained elsewhere

  1. Tagged datasets for named entity recognition tasks
  2. Test collections at AOL Research
  3. Linguistic Data Consortium (LDC)
  4. UCI Machine Learning Repository