A sequential dual method for the structured ramp loss minimization
Abstract
speech tagging, and also presents results on artificial data when the method is exposed to outliers. A comparison with a primal sub-gradient method using the structured ramp and hinge losses is also presented.
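The robustness to outliers mentioned above comes from the shape of the ramp loss itself: unlike the hinge loss, it is bounded, so a single badly misclassified point cannot dominate the objective. A minimal sketch of this idea for the binary case (the paper treats the structured version; the parameter name `s` and this clipped-hinge formulation are illustrative assumptions, not the paper's exact notation):

```python
def hinge_loss(margin):
    """Standard hinge loss: grows without bound as the margin decreases."""
    return max(0.0, 1.0 - margin)

def ramp_loss(margin, s=-1.0):
    """Ramp loss: the hinge loss clipped at the constant 1 - s.

    For s = -1 the penalty is capped at 2.0, so outliers with very
    negative margins contribute only a bounded amount to the objective.
    """
    return min(1.0 - s, hinge_loss(margin))

# An outlier with a large negative margin dominates the hinge loss
# but contributes only the capped amount to the ramp loss.
print(hinge_loss(-10.0))  # 11.0
print(ramp_loss(-10.0))   # 2.0
```

The cap is what makes the loss non-convex, which is why methods in the references below resort to tools such as the concave-convex procedure rather than standard convex solvers.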
Full Text: PDF

References
Balamurugan, P., Shevade, S., Sundararajan, S., and Keerthi, S. S. A Sequential Dual Method for Structural SVMs. In SDM 2011 - Proceedings of the Eleventh SIAM International Conference on Data Mining (2011).
Bordes, A., Ertekin, S., Weston, J., and Bottou, L. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research 6 (Dec. 2005), 1579–1619.
Brooks, J. P. Support vector machines with the ramp loss and the hard margin loss. Operations research 59, 2 (2011), 467–479.
Carrizosa, E., Nogales-Gómez, A., and Romero Morales, D. Heuristic approaches for support vector machines with the ramp loss. Optimization Letters 8, 3 (2014), 1125–1135.
Collins, M. Discriminative training methods for Hidden Markov Models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (2002), vol. 10, Association for Computational Linguistics, pp. 1–8.
Collobert, R., Sinz, F., Weston, J., and Bottou, L. Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning (New York, NY, USA, 2006), ICML ’06, ACM, pp. 201–208.
Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. Online passive-aggressive algorithms. The Journal of Machine Learning Research 7 (2006), 551–585.
Do, C. B., Le, Q. V., Teo, C. H., Chapelle, O., and Smola, A. J. Tighter bounds for structured estimation. In Advances in neural information processing systems (2008), pp. 281–288.
Ertekin, S., Bottou, L., and Giles, C. L. Nonconvex online support vector machines. Pattern Analysis and Machine Intelligence, IEEE Transactions on 33, 2 (2011), 368–381.
Gimpel, K. Discriminative Feature-Rich Modeling for Syntax-Based Machine Translation. PhD thesis, Carnegie Mellon University, 2012.
Gimpel, K., and Smith, N. A. Structured ramp loss minimization for machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Stroudsburg, PA, USA, 2012), NAACL HLT ’12, Association for Computational Linguistics, pp. 221–231.
Kuhn, H. W., and Tucker, A. W. Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (Berkeley, Calif., 1951), University of California Press, pp. 481–492.
Platt, J. C. Advances in kernel methods. MIT Press, Cambridge, MA, USA, 1999, ch. Fast training of support vector machines using sequential minimal optimization, pp. 185–208.
Shalev-Shwartz, S., Singer, Y., Srebro, N., and Cotter, A. Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming 127, 1 (2011), 3–30.
Taskar, B. Learning Structured Prediction Models: A Large Margin Approach. PhD thesis, Stanford University, CA, 2004.
Tjong Kim Sang, E. F., and Buchholz, S. Introduction to the CoNLL-2000 shared task: Chunking. In Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning - Volume 7 (2000), Association for Computational Linguistics, pp. 127–132.
Tsochantaridis, I., Hofmann, T., Joachims, T., and Altun, Y. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the Twenty-first International Conference on Machine Learning (New York, NY, USA, 2004), R. Greiner and D. Schuurmans, Eds., ICML ’04, ACM.
Vapnik, V. Statistical learning theory. Wiley, 1998.
Wang, L., Jia, H., and Li, J. Letters: Training robust support vector machine with smooth ramp loss in the primal space. Neurocomputing 71, 13-15 (Aug. 2008), 3020–3025.
Wang, Z., and Vucetic, S. Fast online training of ramp loss support vector machines. In Proceedings - IEEE International Conference on Data Mining, ICDM (2009), pp. 569–577.
Xu, W. Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490 (2011).
Yuille, A. L., and Rangarajan, A. The concave-convex procedure. Neural Computation 15, 4 (Apr. 2003), 915–936.
ISSN 0352-9665 (Print)