RCV1: A New Benchmark Collection for Text Categorization Research.

Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of the real-world constraints under which the data was produced. Drawing on interviews with Reuters personnel and access to Reuters documentation, we describe the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data. We refer to the original data as RCV1-v1, and the corrected data as RCV1-v2. We benchmark several widely used supervised learning methods on RCV1-v2, illustrating the collection’s properties, suggesting new directions for research, and providing baseline results for future studies. We make available detailed, per-category experimental results, as well as corrected versions of the category assignments and taxonomy structures, via online appendices.

This software has also been peer reviewed by the journal TOMS.

References in zbMATH (referenced in 123 articles)

Showing results 1 to 20 of 123.
Sorted by year (citations)

  1. Jalilzadeh, Afrooz; Nedić, Angelia; Shanbhag, Uday V.; Yousefian, Farzad: A variable sample-size stochastic quasi-Newton method for smooth and nonsmooth stochastic convex optimization (2022)
  2. Matonoha, Ctirad; Moskovka, Alexej; Valdman, Jan: Minimization of p-Laplacian via the finite element method in MATLAB (2022)
  3. Galvan, Giulio; Lapucci, Matteo; Lin, Chih-Jen; Sciandrone, Marco: A two-level decomposition framework exploiting first and second order information for SVM training problems (2021)
  4. Giunchiglia, Eleonora; Lukasiewicz, Thomas: Multi-label classification neural networks with hard logical constraints (2021)
  5. Metel, Michael R.; Takeda, Akiko: Stochastic proximal methods for non-smooth non-convex constrained sparse optimization (2021)
  6. Rong, Wentao; Zhuo, Enhong; Peng, Hong; Chen, Jiazhou; Wang, Haiyan; Han, Chu; Cai, Hongmin: Learning a consensus affinity matrix for multi-view clustering via subspaces merging on Grassmann manifold (2021)
  7. Yang, Yiyang; Deng, Sucheng; Lu, Juan; Li, Yuhong; Gong, Zhiguo; U, Leong Hou; Hao, Zhifeng: GraphLSHC: towards large scale spectral hypergraph clustering (2021)
  8. Zhang, Hongjing; Zhan, Tianyang; Basu, Sugato; Davidson, Ian: A framework for deep constrained clustering (2021)
  9. Jaffe, Ariel; Kluger, Yuval; Linderman, George C.; Mishne, Gal; Steinerberger, Stefan: Randomized near-neighbor graphs, giant components and applications in data science (2020)
  10. Jung, Jinhong; Sael, Lee: Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach (2020)
  11. Loor, Marcelo; De Tré, Guy: Handling subjective information through augmented (fuzzy) computation (2020)
  12. Nakano, Felipe Kenji; Cerri, Ricardo; Vens, Celine: Active learning for hierarchical multi-label classification (2020)
  13. Yang, Tianbao; Zhang, Lijun; Lin, Qihang; Zhu, Shenghuo; Jin, Rong: High-dimensional model recovery from random sketched data by exploring intrinsic sparsity (2020)
  14. Yousefian, Farzad; Nedić, Angelia; Shanbhag, Uday V.: On stochastic and deterministic quasi-Newton methods for nonstrongly convex optimization: asymptotic convergence and rate analysis (2020)
  15. Yuan, Xiao-Tong; Li, Ping: On convergence of distributed approximate Newton methods: globalization, sharper bounds and beyond (2020)
  16. Yuan, Xiao-Tong; Liu, Bo; Wang, Lezi; Liu, Qingshan; Metaxas, Dimitris N.: Dual iterative hard thresholding (2020)
  17. Duchi, John; Namkoong, Hongseok: Variance-based regularization with convex objectives (2019)
  18. Fercoq, Olivier; Bianchi, Pascal: A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions (2019)
  19. Karakus, Can; Sun, Yifan; Diggavi, Suhas; Yin, Wotao: Redundancy techniques for straggler mitigation in distributed optimization and learning (2019)
  20. Krishnamurthy, Akshay; Agarwal, Alekh; Huang, Tzu-Kuo; Daumé, Hal III; Langford, John: Active learning for cost-sensitive classification (2019)