Data Scientist - July 2022

remote, remote
For one of our clients, we are looking for a freelance Data Scientist.

Description:

Our project is embarking on two separate data science topics, Deviations Recurrence Check and Deviations Trending. Both seek to employ Natural Language Processing (NLP) and clustering algorithms to group similar records in our global Quality Management System (QMS).
The contractor will receive the entire data set from the QMS to fulfill the services.

The Deviations Recurrence Check seeks to replace the existing manual process whereby users determine whether or not a problem is recurrent. Today, for each new deviation, the Deviations workflow user must determine whether it is related to existing deviations in order to ascertain whether a problem is ongoing. From the QMS perspective, once the user has filled out the basic, required fields for a new deviation, they would click a button that calls a web service, which uses the results of the clustering algorithm to return a list of similar deviations for the user to review.
The goal of this Deviations Recurrence Check project is to give our users a function that associates a new deviation with existing deviations in order to determine whether a problem continues to occur. For example, for a deviation about broken tablets, we want the function to find all records related to the problem (or trend) "broken tablets" in order to determine whether this is a new problem or a continuing one. By extension, we could use this information to determine whether a CAPA was effective.
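To make the intended behaviour concrete, the lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain TF-IDF with cosine similarity as a simple stand-in for the NLP and clustering approach the contractor will actually propose, it does not handle multilingual content, and the function names, record IDs, sample texts, and the `min_score` threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_deviations(new_text, existing, top_n=3, min_score=0.1):
    """Rank existing deviations (id -> description) by similarity to new_text.

    Hypothetical helper: min_score is an arbitrary cut-off for illustration.
    """
    ids = list(existing)
    vectors = tfidf_vectors([existing[i] for i in ids] + [new_text])
    query = vectors[-1]
    scored = [(i, cosine(query, v)) for i, v in zip(ids, vectors[:-1])]
    scored = [(i, s) for i, s in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up example records, not real QMS data.
existing = {
    "DEV-001": "Broken tablets found in blister pack on line 3",
    "DEV-002": "Temperature excursion in cold storage room",
    "DEV-003": "Tablets broken after compression step",
}
matches = similar_deviations("Broken tablets detected during packaging", existing)
print(matches)
```

In this sketch, the two "broken tablets" records are returned for review and the unrelated temperature record is filtered out. A production solution would likely replace the TF-IDF step with multilingual semantic representations.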

The contractor needs to employ one or more clustering algorithms
• by checking different algorithm methods based on the contractor’s experience and best practices,
• by considering that the QMS contains content in multiple languages (for example in long-text fields) for semantic processing,
• by proposing drafts, which need to be presented to the project team for review and a usability check,
• by adapting the draft based on the feedback delivered by the project team,
• by handing over the final version for final approval by the project team.


The Deviations Trending project seeks to find problems in the sea of data that is the QMS. We envision a model, retrained weekly or nightly, that analyzes the complete set of quality data (Deviations, Complaints, Audits, Investigations, OOX, Supplier Qualifications, Events, Non-Conformities) in order to identify problem trends (e.g., broken tablets caused by a specific model of machine). We want to obtain primary cluster groups that we can drill down into secondary or tertiary cluster groups. To do so, a clustering algorithm is run on the dataset to divide the records into groups of problems/trends. Our plan is to apply HDBSCAN down to three levels in order to get very granular trends. The contractor will receive an example of the clustering levels. The overarching goal is to automatically analyze our QMS data and find abnormalities that cannot be easily identified with the simple search functionality in the QMS.

The contractor needs to employ one or more clustering algorithms
• by checking algorithm methods based on the contractor’s experience and best practices,
• by clustering the records in scope of the algorithm using previously defined keywords over a period of time (e.g. we found 50 “broken tablet” records in the last 12 months) and counting the clustered records per period (e.g. 5 records in January, 7 in February, 8 in March, 10 in April, etc.),
• by proposing drafts, which need to be presented to the project team for review and a usability check,
• by adapting the draft based on the feedback delivered by the project team,
• by handing over the final version for final approval by the project team.
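As a rough illustration of the per-period counting mentioned in the list above, the following sketch counts keyword-matching records per month. It assumes records arrive as simple (creation date, text) pairs and that matching is a plain case-insensitive substring check. The function name, the record layout, and the sample data are hypothetical, and the real solution would count records per cluster rather than per keyword match.

```python
from collections import Counter
from datetime import date


def monthly_keyword_counts(records, keywords):
    """Count records per (year, month) whose text contains every keyword.

    records: iterable of (created_on: date, text: str) pairs (assumed layout).
    keywords: previously defined keywords, matched case-insensitively.
    """
    wanted = [k.lower() for k in keywords]
    counts = Counter()
    for created_on, text in records:
        lowered = text.lower()
        if all(k in lowered for k in wanted):
            counts[(created_on.year, created_on.month)] += 1
    # Sort chronologically so the trend can be read top to bottom.
    return dict(sorted(counts.items()))


# Made-up example records, not real QMS data.
sample_records = [
    (date(2022, 1, 5), "Broken tablet in bottle"),
    (date(2022, 1, 20), "broken tablets on line 2"),
    (date(2022, 2, 3), "Label misprint"),
    (date(2022, 2, 14), "Broken tablet after coating"),
]
trend = monthly_keyword_counts(sample_records, ["broken tablet"])
print(trend)
```

A rising count from month to month for the same keyword or cluster is the kind of trend the project wants to surface automatically.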


Skillset:
• Data-oriented personality
• Proficiency in using query languages
• Excellent understanding of machine learning techniques and algorithms (e.g. HDBSCAN, KNN, K-means)
• Excellent understanding of natural language processing and semantic analysis
• Experience with common data science toolkits
• Great communication skills

Project-/Budget-/Team responsibility:
• No
• Only dedicated to the data science tasks

Language:
• English (must), German (not mandatory)


Start: ASAP
Duration: 7+ months
Location: remote
Capacity: 25 hours/week
