For one of our clients we are looking for a
Data Scientist
Skills:
- Python programming (pandas, NumPy, seaborn)
- Hands-on experience with cloud deployment of ML models and MLOps, preferably on AWS SageMaker
- Experience building machine learning and deep learning models with related Python packages such as scikit-learn, PyTorch, Keras and TensorFlow
- Experience building end-to-end ML pipelines: data pre-processing, model fitting, hyperparameter tuning, model validation and model deployment
- Hands-on experience working with standard data mining and ML modeling processes (e.g. CRISP-DM) in the context of model life cycle management
Nice to have skills:
- Experience with the RDKit Python SDK
- Experience with machine-learning-based drug discovery, e.g. molecular property prediction and de novo molecule generation
Tasks:
- Build an understanding of existing ML models
- Create a gap assessment of the models
- Reproduce scoring/unit testing of existing models
- Add FAIR metadata/formats to models and curated datasets
- Augment the scoring script for each model
- Register models on SageMaker
- Deploy models on SageMaker in the test environment
- Validate the deployed models in the test environment
- Expose models as an API using SageMaker in the production environment
- Build SageMaker ML pipelines (random forest, decision trees, logistic regression, etc.) for pre-processing, model training/validation, performance metrics reporting and deployment
- Establish an understanding of the required similarity measures
- Define an algorithm for similarity measures against the required catalogues
- Implement similarity search algorithms using RDKit
- Build pipelines for dimensionality reduction algorithms
Start: ASAP
Location: Remote
Duration: 12+ months