MLOps
- A buzzword I don't really like, but sadly applies to my work.
- MLOps: Overview, Definition & Architecture (paper)
- "Examine how ML processes can be automated & operationalized"
- Methodology
- Literature survey
- Interviews
- Principles = Best Practices
- CI/CD automation -- fast feedback for build, test, delivery & deploy
- Workflow orchestration
- Reproducibility -- same inputs, same results
- Versioning -- data, model & code, for reproduction and tracing (hash-based sketch after this list)
- Collaboration -- on data, model and code
- Continuous ML training & evaluation:
- monitoring
- feedback loop
- automated ML workflow pipeline
- eval run to check for changes in model quality (promotion-gate sketch after this list)
- ML Metadata tracking/logging -- full traceability
- Continuous Monitoring -- periodic assessment of data, model, code, infra, model perf
- Feedback loops -- eval -> engineering, monitoring -> scheduler, etc.
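The versioning principle above is often implemented by content-addressing: fingerprint the exact data, code, and model behind each training run and log the triple. A minimal stdlib-only sketch; the file paths, field names & JSONL log are my illustrative choices, not from the paper:

```python
# Content-addressed versioning: fingerprint the exact data, code, and
# model artifacts behind a training run so it can be reproduced & traced.
import hashlib
import json
import time

def file_digest(path: str) -> str:
    """SHA-256 of a file's bytes, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_run(data_path: str, code_path: str, model_path: str,
               log_path: str = "runs.jsonl") -> dict:
    """Append one traceability record per training run.
    Field names are illustrative, not from the paper."""
    record = {
        "timestamp": time.time(),
        "data_sha256": file_digest(data_path),
        "code_sha256": file_digest(code_path),
        "model_sha256": file_digest(model_path),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```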
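The eval-run principle is essentially a promotion gate: a retrained model only replaces the production one if it doesn't regress on a fixed evaluation set. A minimal sketch, assuming accuracy as the metric and toy threshold classifiers standing in for models pulled from a registry:

```python
# Minimal promotion gate: only deploy a retrained model if it does not
# regress against the production baseline on a fixed evaluation set.
from typing import Callable, Sequence, Tuple

def accuracy(predict: Callable[[list], list],
             examples: Sequence[Tuple[list, int]]) -> float:
    """Fraction of eval examples the model labels correctly."""
    features = [x for x, _ in examples]
    labels = [y for _, y in examples]
    preds = predict(features)
    return sum(p == y for p, y in zip(preds, labels)) / len(examples)

def should_promote(candidate_acc: float, baseline_acc: float,
                   max_regression: float = 0.005) -> bool:
    """Allow tiny noise-level dips; block real quality regressions."""
    return candidate_acc >= baseline_acc - max_regression

if __name__ == "__main__":
    # Hypothetical eval set and single-feature threshold classifiers.
    eval_set = [([0.1], 0), ([0.9], 1), ([0.4], 0), ([0.8], 1)]
    baseline = lambda xs: [int(x[0] > 0.5) for x in xs]
    candidate = lambda xs: [int(x[0] > 0.45) for x in xs]
    base_acc = accuracy(baseline, eval_set)
    cand_acc = accuracy(candidate, eval_set)
    print(f"baseline={base_acc:.2f} candidate={cand_acc:.2f} "
          f"promote={should_promote(cand_acc, base_acc)}")
```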
- Components
- CI/CD automation
- Source code repository -- stores & versions the code
- Workflow Orchestration -- DAGs (toy runner sketch after this list)
- Feature Store -- offline, online
- Model Training Infrastructure
- Model registry -- trained models + metadata (in-memory sketch after this list)
- ML Metadata store
- Model serving component
- Monitoring component -- includes TensorBoard
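Workflow orchestration boils down to declaring pipeline steps as a DAG and running them in dependency order; Airflow/Kubeflow-style tools add scheduling, retries & distribution on top. A toy runner using the stdlib's graphlib, with hypothetical step names:

```python
# Toy workflow orchestrator: declare pipeline steps as a DAG and run
# them in dependency order, the core idea behind Airflow-style tools.
from graphlib import TopologicalSorter

def run_pipeline(tasks: dict, deps: dict) -> None:
    """tasks: name -> callable; deps: name -> set of upstream names."""
    for name in TopologicalSorter(deps).static_order():
        print(f"running {name}")
        tasks[name]()

if __name__ == "__main__":
    # Hypothetical four-step training pipeline.
    tasks = {
        "extract":  lambda: None,   # pull raw data
        "features": lambda: None,   # build features (feature store write)
        "train":    lambda: None,   # fit the model
        "evaluate": lambda: None,   # eval gate before registry push
    }
    deps = {
        "features": {"extract"},
        "train": {"features"},
        "evaluate": {"train"},
    }
    run_pipeline(tasks, deps)
```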
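The model registry + metadata store pair is, at heart, versioned model artifacts with enough metadata attached to trace and roll back deployments. An in-memory sketch; real registries persist to a database/object store, and the fields are my guess at a minimal schema:

```python
# Minimal in-memory model registry: versioned models plus the metadata
# needed to trace and roll back deployments.
import time
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    version: int
    artifact_uri: str   # where the serialized model lives
    metrics: dict       # e.g. {"accuracy": 0.91}
    registered_at: float = field(default_factory=time.time)

class ModelRegistry:
    def __init__(self):
        self._entries: dict[str, list[ModelEntry]] = {}

    def register(self, name: str, artifact_uri: str,
                 metrics: dict) -> ModelEntry:
        versions = self._entries.setdefault(name, [])
        entry = ModelEntry(len(versions) + 1, artifact_uri, metrics)
        versions.append(entry)
        return entry

    def latest(self, name: str) -> ModelEntry:
        return self._entries[name][-1]

    def rollback(self, name: str) -> ModelEntry:
        """Drop the newest version, e.g. after a failed eval gate.
        Assumes an older version exists to fall back to."""
        self._entries[name].pop()
        return self._entries[name][-1]
```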
- People -- roles don't separate cleanly
- Business stakeholder
- Solution "architect"
- Data scientist / ML Engineer
- Data Engineer (Feature engineer)
- Software Engineer
- DevOps
- ML Engineer / ML Ops engineer
- Can have the monitoring system forward drift detection to the primary system (see the drift sketch below)
- Intersection of ML, SWE, DevOps, Data Engineering
- Challenges: organizational, ML changes, operational headaches
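That monitoring-to-primary-system handoff is one concrete feedback loop: compare live feature distributions against the training-time reference and, on drift, notify whatever schedules retraining. A crude mean-shift sketch; the z-score threshold and the notify callback are illustrative assumptions, not from the paper:

```python
# Sketch of the monitoring -> scheduler feedback loop: flag drift when
# a live feature's distribution shifts away from its training reference,
# then forward the alert to whatever triggers retraining.
import statistics
from typing import Callable, Sequence

def drifted(reference: Sequence[float], live: Sequence[float],
            threshold: float = 3.0) -> bool:
    """Crude drift test: has the live mean moved more than `threshold`
    reference standard deviations?"""
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference) or 1e-9
    return abs(statistics.fmean(live) - mu) / sigma > threshold

def monitor(reference, live, notify: Callable[[str], None]) -> None:
    if drifted(reference, live):
        notify("feature drift detected; consider triggering retraining")

if __name__ == "__main__":
    reference = [0.0, 0.1, -0.1, 0.05, -0.05]
    live = [2.0, 2.1, 1.9, 2.05]
    monitor(reference, live, notify=print)
```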
- Conclusion:
- "In the real world, we observe data scientists still managing ML workflows manually to a great extent. The paradigm of Machine Learning Operations (MLOps) addresses these challenges."
- Follow ups
- Contrast this paper against existing solutions and different systems
- Point S., E. to this paper for interview questions
— Kunal