Among developers working on big data projects, a minor controversy has been brewing over which programming language best meets their needs. Many point to R as the ideal language for data science, but others prefer Python or use a mix of both.
Tom Rampley, a data scientist at Dish Network, says, “I use R extensively for the statistical functionality that comes with the various packages. I also use it for data manipulation with small data sets. However, for text parsing, large data set manipulation, and coding my own algorithms I much prefer Python in combination with the Numpy, Scipy, and Pandas packages.”
In a similar vein, Matt Asay writes, “R remains popular with the PhDs of data science, but as data moves mainstream, Python is taking over.”
Others say that the choice of language isn’t all that important. They point to Linus Torvalds’ maxim: “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”