Starting with statistics first.

Rômulo Peixoto
May 4, 2021

So for my first text here on Medium, I’d like to start with an idea I had a few weeks ago and tell you about what I’ve been learning from it. I gave away what I’m up to in the title, but some background is necessary.

Some Background

Last year I worked at an NGO and moved into a manager role. My job was basically to make sure sales were on the right path. I did the best I could, but with a clear feeling that selling was not my thing. Soon I had to run some really basic data analysis to know where we were heading, and it was the best part of the job for me. Even though I was never the top math student, I really liked math, and telling a story with numbers is a very meaningful way of telling compelling stories.

But I had never had to use numbers to show how a business was running, and the material provided for the job was lacking a lot in my opinion, with many questions left unanswered. So there I was, setting out to learn how to extract the most I could from data. For better or worse, the pandemic was when data science had its big break, and everybody was talking about it. I had already heard about it but had never really gotten into it; I was learning Python basics back then and had bumped into it before.

The part of the job I was most insecure about was figuring out how to get from where we were to our goal. To me, this is really data science at its peak. I picked up some quick concepts to extract better information and tell a clearer story about what had happened and what should happen. From there I got really interested in going for the jackpot: working as a data scientist. That is where my journey started.

From zero to where?

For some months I took free short courses to see if I could understand something and progress somehow, and both things actually happened. Soon I was knocking on the door of my first portfolio project, which is where the problem started. I’m not the kind of guy who does things blindly. There are a whole lot of texts here on Medium and across the internet that cover everything you need to know to complete your first project, but some key steps are just taken as a given, and other things that showed up in the code left me asking, “Why is this like this and not like that?” Being a college graduate with not much money to spend, I had to find my own way to find out.

The idea was fairly simple. Data science is essentially statistics, and your models can only be as good as your EDA (exploratory data analysis) and the pipeline you build. So the first step would be EDA, and there had to be a class at my university that teaches it. Data science is a very old job, just not under this name. And if there is a class where it is taught, there is a book that guides the contents of that class, and those books usually have a lot of exercises and problems for you to solve.

Not only was I right, I also found a way to learn the fundamentals and practice Python along the way. Yes, my weapon of choice is Python. If I can solve a problem with a 10-line dataframe, it is easy to adjust the solution for a 20,000-line one (actually, a dataframe is a dataframe, small or big; they work the same, so if you can do it for one, you can do it for the other).
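To make the dataframe point concrete, here is a minimal sketch, assuming pandas is installed. The column names (`group`, `value`), the helper `summarize`, and the data are made up for illustration; they are not from any real exercise. The same function runs unchanged on a 10-row and a 20,000-row dataframe.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.Series:
    """Return the mean of the 'value' column for each 'group'."""
    return df.groupby("group")["value"].mean()


# A tiny, textbook-sized dataframe (10 rows), with made-up values.
small = pd.DataFrame({"group": ["a", "b"] * 5, "value": range(10)})

# A much bigger one (20,000 rows) with the same columns.
large = pd.DataFrame({"group": ["a", "b"] * 10_000, "value": range(20_000)})

# The exact same code handles both sizes.
small_summary = summarize(small)
large_summary = summarize(large)

print(small_summary)
print(large_summary)
```

Working out the logic on the small dataframe first makes it easy to check the answer by hand before pointing the same function at the big one.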

And here is where my journey begins for real. I’m going to be posting what I learn, along with code and ideas, here, on GitHub, and on Kaggle, so you can check it all out chapter by chapter. I’m open to any constructive feedback and any help you can give me, and I hope there will be discussions about the questions and problems I post here too.

All in all, have a good one, everybody. Cheers.
