+ 2

Python Data Analysis Social Sciences

Hope I'm not duplicating anyone's post. I'd love to know more about Python's applicability for data analysis in the social sciences/political science. I'm especially interested in what the advantages are compared to what I can do with R. R is a really powerful tool, yet I've heard Python is excellent for plotting more complex stuff and also nice for web scraping, analysis of big data/social media content, etc. Is anyone using Python in that area, and do you have advice on how to get started, programs to use with it, etc.? :)

16th Dec 2017, 11:35 AM
Fabian Habersack
5 Answers
+ 7
If you want an experience similar to the one you have in RStudio (I assume), you might want to check out the Anaconda distribution for Python: https://www.anaconda.com/download/ It gives you a nice scientific IDE which you can use much like RStudio, and it comes with Jupyter Notebook, which supports Python, R, Julia and Scala, too. As for data management, you have the numpy library, which makes life easier for R users, as "everything is a vector" for numpy :) Then, for data slicing, there is the renowned pandas library, which introduces DataFrames. It is great for data cleaning, tidying and extracting, and comes with built-in file readers (txt, csv, Excel and more). Finally, for scientific computing you have scipy, and if you want to use machine learning algorithms, scikit-learn is for that. Data visualization is also covered, with the staple matplotlib (roughly the analogue of ggplot) and its extension seaborn (more appealing visualizations). You might also go interactive with bokeh (great-looking, customizable data presentations). Web scraping? Scrapy (for running crawling web spiders) or beautifulsoup4, which is great for parsing web pages, but you have to handle the crawling on your own.
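To make the pandas part a bit more concrete, here is a minimal sketch of the read, clean and summarise workflow described above. The survey data, the column names (`country`, `age`, `trust_parliament`) and the helper functions are all made up for illustration; only `pd.read_csv`, `dropna` and `groupby` are real pandas calls.

```python
import io

import pandas as pd

# Hypothetical survey data, inlined as CSV text so the example is self-contained.
# Respondent 5 has no trust score (a missing value).
CSV_TEXT = """respondent,country,age,trust_parliament
1,AT,34,6
2,AT,51,4
3,DE,29,7
4,DE,62,3
5,DE,45,
6,AT,23,8
"""


def load_survey(text=CSV_TEXT):
    """Read CSV text into a DataFrame (pd.read_csv also accepts a file path)."""
    return pd.read_csv(io.StringIO(text))


def mean_trust_by_country(df):
    """Drop rows with a missing trust score, then average the score per country."""
    cleaned = df.dropna(subset=["trust_parliament"])
    return cleaned.groupby("country")["trust_parliament"].mean()


survey = load_survey()

# Vectorised arithmetic, R-style: rescale the 0-10 item to 0-1 for all rows at once.
survey["trust_scaled"] = survey["trust_parliament"] / 10

result = mean_trust_by_country(survey)
print(result)
```

With a real data set you would pass a file path to `pd.read_csv` instead of the inlined text; the rest stays the same.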
16th Dec 2017, 12:23 PM
Kuba Siekierzyński
+ 5
I am using Python. I have a basic knowledge of R, but haven't been doing much with it. Python seems more versatile to me, as I can deploy my work as an online application with it. Plus, I am not sure whether R has a rich neural network library?
16th Dec 2017, 1:23 PM
Kuba Siekierzyński
+ 4
R is mainly used when the data analysis task requires standalone computing or analysis on individual servers. It’s great for exploratory work, and it's handy for almost any type of data analysis because of the huge number of packages and readily usable tests that often provide you with the necessary tools to get up and running quickly. R can even be part of a big data solution. When getting started with R, a good first step is to install the amazing RStudio IDE. Once this is done, we recommend having a look at the following popular packages: dplyr, plyr and data.table to easily manipulate data, stringr to manipulate strings, zoo to work with regular and irregular time series, ggvis, lattice, and ggplot2 to visualize data, and caret for machine learning.

When and how to use Python? You can use Python when your data analysis tasks need to be integrated with web apps or if statistics code needs to be incorporated into a production database. Being a fully fledged programming language, it’s a great tool to implement algorithms for production use. While the immaturity of Python packages for data analysis was an issue in the past, this has improved significantly over the years. Make sure to install NumPy/SciPy (scientific computing) and pandas (data manipulation) to make Python usable for data analysis. Also have a look at matplotlib to make graphics, and scikit-learn for machine learning. Unlike R, Python has no clear “winning” IDE. We recommend having a look at Spyder, IPython Notebook and Rodeo to see which one best fits your needs.
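As a rough idea of what NumPy-based scientific computing looks like in practice, here is a small sketch that fits a straight line and computes a correlation. The two variables (`news_sources`, `political_interest`), their values and the `fit_line` helper are invented for illustration and are not real data; `np.polyfit` and `np.corrcoef` are standard NumPy functions.

```python
import numpy as np

# Made-up example values: number of news sources used and a
# self-reported political interest score for four respondents.
news_sources = np.array([0, 1, 2, 3], dtype=float)
political_interest = np.array([1, 3, 2, 4], dtype=float)


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    # np.polyfit returns coefficients from the highest power down.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


slope, intercept = fit_line(news_sources, political_interest)

# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(news_sources, political_interest)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

For proper regression output with standard errors and p-values you would normally reach for a dedicated statistics package rather than plain NumPy; this only shows the basic array-based style.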
16th Dec 2017, 12:20 PM
Bits!
+ 2
R: Pros and Cons
Pro: A picture says more than a thousand words
Pro: R ecosystem
Pro: R is the lingua franca of data science
Pro/Con: R is slow
Con: R has a steep learning curve. R’s learning curve is non-trivial, especially if you come from a GUI for your statistical analysis. Even finding packages can be time-consuming if you’re not familiar with it.

Python: Pros and Cons
Pro: IPython Notebook
Pro: A general-purpose language
Pro: A multi-purpose language
Pro/Con: Visualizations
Con: Python is a challenger
16th Dec 2017, 12:24 PM
Bits!
+ 1
Cool. Thanks for all the detailed answers. Are you personally using R or Python for data analysis, too? I'll definitely look into all of this and try to apply the basic Python syntax knowledge I got from SoloLearn... :)
16th Dec 2017, 1:21 PM
Fabian Habersack