Tom Marsh

Physics | Machine Learning | Engineering


A Roadmap for ML Systems & Data Science

This document collects resources I have found genuinely useful over my career as an ML engineer. It is not exhaustive and it is highly opinionated, so please take it with a grain of salt. To be blunt straight out of the gate: I am not a big fan of corporate certificates or generalised courses. I think they are fundamentally limited by their aspiration to be generalisable, and your time could be better spent building actual projects. That being said, do not worry if you have already done them; if you learnt something from them, that is awesome. That is just my opinion. The resources below are ones I think will actually move the needle if you engage with them seriously.

This field is deep, so do not be afraid of mistakes or of being wrong; that is the first step to learning something. If you are not looking back at your work from 6 to 12 months ago and thinking “wow, I would have done X differently now that I know about Y, I’ll do that moving forward”, then you are probably not growing or challenging yourself enough.


1. How Careers Actually Progress

The titles vary by company, but the responsibilities roughly follow this pattern. Do not stress if you spend a long time in junior or intermediate roles. It depends entirely on company structure and what projects come your way.

Soft skills become dramatically more important the further up you go. You can be a brilliant developer, but if you are difficult to work with, promotions will pass you by. Humans are uncannily good at working in groups, so leverage that.


2. Foundational ML & Data Science

Books (History & Intuition)

Textbooks & References

Collections & References


3. Production ML & MLOps

This is where most people hit a wall. Prototyping a model in a notebook is a mere fraction of the actual work. The rest is everything around it: shoring up the data pipelines, productionising your prototype, shimmying the cleaned-up model into a live codebase, monitoring it when it inevitably drifts, and convincing your team the pipeline is not held together with duct tape. I spent years learning this the hard way; these resources will get you there faster and hopefully save you some of the trial and error I went through.
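To make “monitoring it when it inevitably drifts” a bit more concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI). It compares how a single numeric feature was distributed in the reference (training) data against what the model is seeing live. It uses only the standard library; the function name, the equal-width binning over the reference range, the default of 10 bins and the `eps` floor for empty bins are all illustrative assumptions, not a standard you must follow. A commonly quoted rule of thumb treats PSI above roughly 0.2 as a significant shift, but that threshold is a starting point to tune per feature.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (expected) and a live sample (actual) of one feature.

    Hypothetical helper: bins are equal-width over the reference range (an assumption).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # a constant reference feature would give width 0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Clamp so live values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        total = len(values)
        # Floor each proportion at eps so an empty bin never produces log(0).
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    # Each term (a - e) * ln(a / e) is non-negative, so PSI is 0 only when the bins match.
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


if __name__ == "__main__":
    reference = [i / 100 for i in range(1000)]  # toy training-time feature values
    shifted = [x + 3.0 for x in reference]      # the same feature after a shift in production
    print(population_stability_index(reference, reference))  # 0.0
    print(population_stability_index(reference, shifted))
```

In practice you would run something like this per feature on a schedule and alert on sustained high values rather than on a single noisy window.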

Essential Reading

Further Reading


4. Practical Tooling Advice

Pick up Docker early. I cannot stress this enough. Getting comfortable with containerisation and deploying code will pay dividends throughout your career and save you from a lot of painful conversations with DevOps.

Also, go talk to people. I spent a considerable amount of time talking with Data Engineers and Software Engineers just to understand how my work fit into the bigger picture, and it was some of the most valuable learning I did. That convoluted, expensive feature that adds +0.01% to your target metric probably won’t see the light of day if the SWE or MLE thinks it will tank their P99 response times.
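You can get a rough feel for the P99 question before that conversation happens. Below is a minimal sketch, standard library only, that times sequential calls to a function and reports the 99th percentile latency (the value 99% of calls come in under). `predict_baseline` and `predict_with_expensive_feature` are hypothetical stand-ins, with `time.sleep` faking the cost of the new feature. A sequential loop on your own machine is not a load test, so treat the numbers as a rough signal and ask your SWE or MLE how they measure latency in production.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies_ms(fn: Callable[[], object], n_calls: int = 200) -> List[float]:
    """Time n_calls sequential calls to fn; return wall-clock latencies in milliseconds."""
    latencies = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def p99(latencies_ms: List[float]) -> float:
    """99th percentile: the last of the 99 cut points returned by statistics.quantiles."""
    return statistics.quantiles(latencies_ms, n=100)[-1]


# Hypothetical stand-ins for a model's predict path with and without the new feature.
def predict_baseline() -> int:
    return sum(range(1_000))


def predict_with_expensive_feature() -> int:
    time.sleep(0.002)  # pretend the new feature costs ~2 ms per request
    return sum(range(1_000))


if __name__ == "__main__":
    base = p99(measure_latencies_ms(predict_baseline))
    new = p99(measure_latencies_ms(predict_with_expensive_feature))
    print(f"P99 baseline: {base:.3f} ms, with feature: {new:.3f} ms")
```

Looking at the tail rather than the mean is the point here: a feature that is cheap on average but occasionally slow can still push P99 past a service's latency budget.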


5. Cheesy Advice That Needs Saying