Gradient Descent Optimization

Question

Can anyone familiar with gradient descent optimization (G.D.O.) techniques help me with this?
1. Adam
2. AdaMax
3. Adagrad
4. Nesterov momentum
5. RMSprop
6. L1 and L2 regularization
These are some of the techniques applied in G.D.O.
1. Which of these algorithms can be used together?
2. Please help with links to resources for further reading.
3. Can I apply them to linear regression? The scikit-learn documentation for its implementation does not seem to mention them.

Thanks in advance

Accepted Answer

Some links:
https://ruder.io/optimizing-gradient-descent/

https://arxiv.org/pdf/1609.04747.pdf

https://www.kdnuggets.com/2019/06/gradient-descent-algorithms-cheat-sheet.html

https://github.com/harshraj11584/Paper-Implementation-Overview-Gradient-Descent-Optimization-Sebastian-Ruder
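
To make questions 1 and 3 concrete, here is a minimal sketch (not taken from any of the links above) of plain linear regression trained with Adam updates plus an L2 penalty. As a rule of thumb, Adam, AdaMax, Adagrad, RMSprop and Nesterov momentum are alternative update rules, so you normally pick one of them, while L1/L2 regularization is a change to the loss and can be paired with whichever rule you pick. The function name `fit_linear_adam`, the parameter name `l2` and the toy data are assumptions made up for illustration.

```python
def fit_linear_adam(xs, ys, lr=0.05, beta1=0.9, beta2=0.999,
                    eps=1e-8, l2=0.0, steps=3000):
    """Fit y ~ w*x + b with full-batch gradients and Adam updates.

    `l2` (hypothetical name) is the strength of an L2 penalty l2 * w**2,
    applied to the weight only and not to the bias.
    """
    params = [0.0, 0.0]   # [w, b]
    m = [0.0, 0.0]        # first-moment (mean) estimates of the gradients
    v = [0.0, 0.0]        # second-moment (uncentered variance) estimates
    n = len(xs)
    for t in range(1, steps + 1):
        w, b = params
        # Gradient of mean squared error, plus the L2 term for w.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_w += 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        for i, g in enumerate((grad_w, grad_b)):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected moments
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (v_hat ** 0.5 + eps)
    return params[0], params[1]


# Toy, noise-free data generated from y = 3x + 1 (made up for this example).
xs = [i / 10 - 1 for i in range(21)]
ys = [3 * x + 1 for x in xs]

w_plain, b_plain = fit_linear_adam(xs, ys)        # Adam only
w_reg, b_reg = fit_linear_adam(xs, ys, l2=1.0)    # Adam + L2 penalty
```

With the penalty switched on, the fitted weight is pulled towards zero compared with the unpenalized fit, which is the usual effect of L2 regularization. Swapping in a different optimizer would only change the inner update loop; the gradient of the regularized loss stays the same.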

Answer

José Ailton Batista da Silva, thanks for the links.
