4 min read • from Machine Learning

Beyond Jupyter Notebooks: The real work behind Production ML systems [D]

Our take

In production Machine Learning, the role of an ML Platform Engineer goes well beyond model training. Many people focus on training models, but much of the work in production ML lies in building a robust system that keeps those models reliable and efficient: data pipelines, feature stores, and deployment workflows. By collaborating closely with Data Scientists and Product Managers, ML Platform Engineers build solutions that handle the complexities of real-world applications.

One day, someone asked me about my day-to-day work and what ML Platform Engineering entails. They wanted to know how I excelled in this field despite coming from a Software and Data Engineering background. How did I manage to break into ML Platform Engineering and lead an ML platform initiative at my current company, one of the top tech-savvy startups in India?

This question made me pause and reflect for a few moments 😄

I realised there is no single answer; it involves a lot of context and background. Firstly, I have never limited myself to the roles of a Data or Software Engineer in my various positions. Instead, I have focused on creating products that meet the needs of the moment, handling everything from start to finish. Interestingly, I've enjoyed DevOps more than traditional coding!

In my daily work, I have consistently worked with tools and technologies such as Kubernetes, Docker, CI/CD, open table formats, compute and query engines, and message queues. Often, I have been responsible for both the high-level and low-level system designs for the problem statements I encountered. These designs have been instrumental in helping me understand products in depth.

One key factor that has contributed to my success is my close collaboration with Data Scientists and Product Managers. They have always been my stakeholders, and I feel fortunate to have worked with many exceptional individuals. I have a habit of asking questions until I fully understand the requirements.

While it’s true that I have never worked as a pure Data Scientist and have never trained an ML model, I believe that Production ML is not solely about the model itself. In fact, ML model training and development are just small parts of the entire ML lifecycle. Let me elaborate on what an ML Platform Engineer actually does.

When people start learning Machine Learning, most of the attention goes to the model.

  1. Which algorithm should we use?
  2. XGBoost or Neural Network?
  3. How do we improve accuracy?
  4. Can we tune hyperparameters better?

All of that matters.

But once you move from notebooks to production, you quickly realise something:

The model is only one part of the system.

A production ML system has many more questions:

  1. Where is the training data coming from?
  2. Who owns the feature pipelines?
  3. Are the same features available during real-time inference?
  4. How do we deploy the model safely?
  5. What happens if the model starts drifting?
  6. Who gets alerted when predictions become wrong, but the API is still returning 200 OK?
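Question 3 is where many production issues hide. The usual name for the problem is training/serving skew. Here is a minimal sketch that assumes the training-time schema is available as a simple mapping from feature name to Python type. Under that assumption, a check can compare the schema with the features an online request actually carries. The function, class, and feature names are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SkewReport:
    missing_online: set[str]       # trained on, but absent at inference time
    unexpected_online: set[str]    # sent at inference, but never seen in training
    type_mismatches: dict[str, tuple[str, str]]  # feature -> (training type, serving type)

    @property
    def ok(self) -> bool:
        # Extra features are tolerated; missing or mistyped ones are not.
        return not (self.missing_online or self.type_mismatches)


def check_feature_parity(training_schema: dict[str, type], online_row: dict[str, object]) -> SkewReport:
    """Compare a hypothetical training-time schema with one online feature row."""
    missing = set(training_schema) - set(online_row)
    unexpected = set(online_row) - set(training_schema)
    mismatches = {
        name: (expected.__name__, type(online_row[name]).__name__)
        for name, expected in training_schema.items()
        if name in online_row and not isinstance(online_row[name], expected)
    }
    return SkewReport(missing, unexpected, mismatches)


# Hypothetical example: the online store serves a stringified float and lacks "city".
schema = {"user_age": int, "avg_order_value_7d": float, "city": str}
row = {"user_age": 31, "avg_order_value_7d": "412.5", "device": "android"}
report = check_feature_parity(schema, row)
print(report.ok)  # False
```

Running a check like this against sampled online rows turns a silent mismatch into an explicit failure. It does not depend on whether the inference API itself succeeded.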

This is where ML becomes platform engineering.

In real production systems, a model needs:

  1. A reliable data pipeline.
  2. A feature store.
  3. A training pipeline.
  4. A model registry.
  5. A deployment workflow.
  6. A low-latency inference path.
  7. Monitoring.
  8. Drift detection.
  9. Retraining.
  10. Rollback strategy.
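Several items on this list fit together naturally: the model registry, the deployment workflow, and the rollback strategy. The following is a hypothetical in-memory sketch, not the API of any particular registry product. It works in three steps:

  1. Register each trained model with its offline metrics.
  2. Promote a model to production only if a chosen metric clears a threshold.
  3. Roll back by reverting to the previously promoted version.

The artifact paths and the `auc` threshold are made-up examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str          # hypothetical location of the trained artifact
    metrics: dict[str, float]  # offline evaluation metrics recorded at training time


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; a real one would persist this state."""
    versions: dict[int, ModelVersion] = field(default_factory=dict)
    promoted: list[int] = field(default_factory=list)  # promotion history, newest last

    def register(self, artifact_uri: str, metrics: dict[str, float]) -> ModelVersion:
        mv = ModelVersion(len(self.versions) + 1, artifact_uri, metrics)
        self.versions[mv.version] = mv
        return mv

    def promote(self, version: int, metric: str, min_value: float) -> bool:
        # Deployment gate: refuse versions whose offline metric is below the bar.
        if self.versions[version].metrics.get(metric, float("-inf")) < min_value:
            return False
        self.promoted.append(version)
        return True

    @property
    def production(self) -> ModelVersion | None:
        return self.versions[self.promoted[-1]] if self.promoted else None

    def rollback(self) -> ModelVersion | None:
        # Revert to the previously promoted version; never roll back past the first.
        if len(self.promoted) > 1:
            self.promoted.pop()
        return self.production


registry = ModelRegistry()
v1 = registry.register("models/churn/v1", {"auc": 0.81})
v2 = registry.register("models/churn/v2", {"auc": 0.84})
v3 = registry.register("models/churn/v3", {"auc": 0.76})
registry.promote(v1.version, "auc", 0.80)
registry.promote(v2.version, "auc", 0.80)
rejected = not registry.promote(v3.version, "auc", 0.80)  # fails the gate
print(registry.production.version)  # 2
print(registry.rollback().version)  # 1
```

Keeping the promotion history as a stack means rollback needs no retraining: it only points production back at an artifact that already exists. A real registry would also record who promoted which version and when.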

Without these, even a great model can fail silently.
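One common way to catch a silent failure is to monitor the model's input distributions instead of its HTTP status codes. The sketch below is minimal and rests on two assumptions: it watches a single numeric feature, and it uses the Population Stability Index (PSI) as the drift metric. PSI is one of several options, not necessarily what any given team uses. The check compares live traffic against a training-time baseline. The alert threshold of 0.25 is a commonly cited rule of thumb and is also an assumption here.

```python
import math
from bisect import bisect_right


def bin_proportions(values: list[float], edges: list[float]) -> list[float]:
    """Share of values falling in each bin; `edges` are the interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1  # out-of-range values land in the end bins
    return [c / len(values) for c in counts]


def population_stability_index(baseline: list[float], live: list[float],
                               n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum((actual - expected) * ln(actual / expected)) over equal-width bins."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins
    edges = [lo + width * i for i in range(1, n_bins)]
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(live, edges)
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi


# Assumption: 0.25 is a commonly cited rule-of-thumb threshold, not a standard.
DRIFT_THRESHOLD = 0.25

baseline = [i / 100 for i in range(1000)]   # training-time feature values
shifted = [v + 3.0 for v in baseline]       # live traffic after an upstream change
print(population_stability_index(baseline, baseline))  # 0.0
psi_live = population_stability_index(baseline, shifted)
if psi_live > DRIFT_THRESHOLD:
    # The check looks only at inputs, so it fires even if every request returned 200 OK.
    print(f"ALERT: feature drift, PSI={psi_live:.2f}")
```

This check never looks at the API's responses, so it can flag a problem while every request still succeeds.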

This is one of the biggest mindset shifts in ML Platform Engineering:

A model is not a product.

A model becomes useful only when it is wrapped inside a reliable system that can train it, serve it, monitor it, and improve it continuously.

That is why production ML is not just about Data Science.

It is also about Data Engineering, Distributed Systems, APIs, Infrastructure, Observability, DevOps, and Software Engineering.

The best ML teams are not always the ones with the most complex models.

They are the ones who can reliably ship models to production and keep them working when data, users, and business conditions change.

That is the real work behind Production ML.

submitted by /u/thebigdatashow-ankur