Maybe it's bitterness from grad school <g>, but I've long held the belief that gradient-descent-style algorithms are often the most efficient approach for parameter estimation.
The topic of maximum entropy parameter estimation just came up on the scikit-learn mailing list to which I had just subscribed after noticing a relevant cross-post on the scipy mailing list. Scikit-learn provides a variety of machine learning algorithm implementations based on scipy and numpy. List members were debating whether maximum entropy estimation code should be preserved. I noted that maximum entropy yields the same models as maximum likelihood estimated exponential models.
Later someone asked why different techniques were used for maximum entropy versus logistic regression. My opinion is that good gradient-type solvers are difficult to implement and it's easier to get published if a new approach comes with its own, new estimation method. I cited Optimization with EM and Expectation-Conjugate-Gradient as an example. Another list member offered a more relevant example, A comparison of algorithms for maximum entropy parameter estimation, where IIS and GIS are shown to be less efficient than gradient-based methods.