A sequential dual method for the structured ramp loss minimization

Dejan Mancev


The paper presents a sequential dual method for non-convex structured ramp loss minimization. The method uses the concave-convex procedure, which iteratively transforms the non-convex problem into a series of convex ones. Sequential minimal optimization handles each convex subproblem by traversing the data and optimizing the parameters associated with an incrementally built set of active structures within each training example. The paper reports results on two sequence labeling problems, shallow parsing and part-of-speech tagging, and also presents results on artificial data in which the method is exposed to outliers. A comparison with a primal sub-gradient method using the structured ramp and hinge losses is also included.
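To illustrate the idea behind the abstract, the sketch below shows the ramp loss written as a difference of two hinge functions and a concave-convex procedure (CCCP) loop for a plain binary linear SVM, not the paper's structured variant. All names (`ramp_loss`, `cccp_ramp_svm`), the subgradient inner solver, and the parameter choices are illustrative assumptions; the paper itself uses a sequential dual (SMO-style) inner solver over structured outputs.

```python
import numpy as np

def ramp_loss(z, s=-1.0):
    """Ramp loss R_s(z) = H_1(z) - H_s(z), with H_a(z) = max(0, a - z).
    A convex hinge minus a shifted hinge; bounded above by 1 - s,
    which is what limits the influence of outliers."""
    return np.maximum(0.0, 1.0 - z) - np.maximum(0.0, s - z)

def cccp_ramp_svm(X, y, C=1.0, s=-1.0, outer=10, inner=200, lr=0.01):
    """Toy CCCP for a linear binary SVM with ramp loss (illustrative only).
    Each outer step freezes a supergradient of the concave part (-H_s) at
    the current margins, then solves the resulting convex hinge-like
    problem by plain subgradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(outer):
        z = y * (X @ w)
        delta = (z < s).astype(float)   # supergradient of -H_s at current z
        for _ in range(inner):
            z = y * (X @ w)
            active = (z < 1.0).astype(float)   # where H_1 has slope -1
            # gradient of 0.5*||w||^2 + C * sum_i (H_1(z_i) + delta_i * z_i)
            g = w + C * ((delta - active) * y) @ X
            w -= lr * g
    return w
```

On linearly separable toy data the outer loop converges in one or two CCCP steps, since no margin falls below `s`; the decomposition only changes the solution when badly misclassified points (potential outliers) exist and get their loss clipped.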


structured classification, support vector machines, sequence labeling







© University of Niš | Created in November 2013
ISSN 0352-9665 (Print)
ISSN 2406-047X (Online)