Saturday, December 1, 2012

Review of Python for Data Analysis by Wes McKinney

I received this book via the O'Reilly Blogger Review Program. I would say that I'm both the right person to review this book and entirely the wrong one. I'm a particle physicist who does a lot of data analysis in C++. I have not used Python, but I encountered it in a course organized by CSC (the Finnish IT Center for Science, which provides high-performance computing support for universities and others).

As I said, I have basically no Python background at all, so I was happy to see that the book had an appendix of Python essentials. It wasn't much, but with some previous programming skills it was enough to carry me through the rest of the book.


The back cover mentions using the IPython interactive shell as your primary development environment. It forgets to mention that the book gives basically no option other than the interactive shell...
At the start of the book the standard O'Reilly typographical conventions for code are given, but they are not followed in the examples. Text is in the standard font, but all of the code and output is in a fixed-width font without any bold parts. As I mentioned, the book is stuck in the interactive shell, and inputs and outputs can be told apart only by the In [xxx]: at the start of input lines and the corresponding Out[xxx]: on some of the output lines. To make things even worse, examples are routinely cut in the middle of multiline outputs. This would be understandable in long examples but not with lots of short ones. I think many books would benefit from more examples, but this book suffers from having too many: the ratio of example lines to text lines is close to 1:1.

As I said, I'm a particle physicist. So for me data analysis means huge binary datasets, confidence limits, sigma values, integrals, derivatives, fitting functions to data, parallelisation... The book was more about counting how many babies have been named James, or fluctuations of stock values over time. That is not data analysis in my world. To be honest, the book didn't actually promise what I wanted; I was just hopeful that those topics would be covered. It did promise advanced NumPy (Numerical Python) features, though.

The book did deliver some NumPy, pandas (a library for handling large data sets), and matplotlib (a plotting library). I'm quite sure that these will be useful to me in the future if I am to use Python in my work (as I mentioned, Python is recommended by CSC). One can learn some basic data visualization and some not-so-serious data handling from the book, so it could be useful for other kinds of data analysis. I'll give the book 3/5, my main grievance being the typography. The book as a whole also fell short on the data analysis side, so for my own purposes it was 2/5.
