For one of our clients we are looking for an Azure DevOps Specialist
Background to the assignment:
- The project delivers the design and implementation of a new concept for data ingestion into Azure Data Lake Gen2. The concept must be in line with the existing concept for ingesting data into the client’s data warehouse. In particular, the curation of the data (naming conventions, key definitions) should follow the curation concept used in NEW BI.
- The project is delivered as a SCRUM project using the same methodology and structure as other NEW BI projects. The client uses a lightweight SCRUM framework in which teams and organizations generate value through adaptive solutions for complex problems.
- The aim of the project is to deliver data in a consistent, well-documented way to various use cases within the client’s Data Foundation.
Due to capacity constraints, the client does not have employees of its own with sufficient expertise in the following areas needed for this project:
- Azure DevOps, especially experience with CI/CD pipelines
- Azure Synapse Analytics (Spark clusters)
- Python (PySpark, pandas and Delta Lake)
- Common document formats (JSON, XML)
- .NET Framework (especially C#)
- The services shall be provided within the framework of an agile development method. The concrete activities required in each case to implement the services commissioned shall be agreed iteratively between the parties within the framework of sprint meetings and implemented by the Contractor within the respective sprints following the sprint meetings. Prior to each sprint meeting, the contractor shall independently check, on the basis of its professional expertise, which individual services are reasonable and feasible within the scope of the assignment in the respective sprint.
- The delivery of the project has to be organized by the external consultants, including planning the work packages (backlog), organizing meetings, planning deployments, and testing and documenting the developed code. The external consultants will form an independent SCRUM team. A dedicated SCRUM Master is not foreseen (the SCRUM team can pick one team member to take over this role in part).
- A Product Owner orders the work for a complex problem into a Product Backlog, which is maintained in Azure DevOps. The product backlog corresponds to the project plan, the roadmap for what the team plans to deliver. The backlog is created by adding user stories, backlog items, or requirements.
- One sprint lasts two weeks, with a daily standup meeting. By the end of a sprint planning meeting, the team will have two items: a sprint goal (a summary of the plan for the next sprint) and a sprint backlog (the list of backlog items the team will work on during the sprint). During these meetings, the team reviews its backlog and decides which items to prioritize for the next sprint. Within this framework, the contractor independently performs the following tasks:
- Analyzing the current ingestion concept of the client’s Data Warehouse (DWH) as a first step before designing a new concept for ingesting data into the data lake. Data ingestion extracts data from the source where it was created or originally stored and loads it into a destination or staging area.
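The extract-and-load step described above can be sketched in a few lines; this is an illustrative example only, not the client's actual pipeline, and all paths, file names, and metadata columns are hypothetical:

```python
# Minimal ingestion sketch (assumed setup): extract a CSV from its source
# and load it into a staging area, adding basic ingestion metadata that
# later supports curation and lineage tracking.
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd


def ingest_to_staging(source_path: str, staging_dir: str, source_name: str) -> Path:
    """Extract data from a source file and load it into the staging area."""
    df = pd.read_csv(source_path)

    # Hypothetical metadata columns: load timestamp and source identifier.
    df["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    df["_source"] = source_name

    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    target = staging / f"{source_name}.csv"
    df.to_csv(target, index=False)
    return target
```

In a real Data Lake setup the same pattern would typically target ADLS Gen2 paths and a columnar format rather than local CSV files.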
- Reviewing the results of the review of the client’s Data Foundation (the client’s data lake and data warehouse implementation), available as a Word document and a PowerPoint presentation, in order to design the new concepts.
- Independently organizing workshops with client employees to discuss the results of the review is part of the project. Workshops can be conducted remotely or on-site (taking the applicable infection-control rules into account). Each workshop will include up to five client employees and the SCRUM team.
- The current ingestion concepts are described in the wiki documentation of the client’s NEW BI project. The project-related concept has to be independently reviewed and refined where necessary. Access to DevOps (including wiki pages), Git repositories, and development environments is granted by the client in advance.
- Creating a concept for the ingestion and curation of data into the client’s Data Lake based on Azure Data Lake technology, taking modern DataOps concepts into account. The concept is to be included in the wiki documentation, which is subject to approval by the client. The aim is that the concept and its implementation are fully documented in the wiki pages.
- The proposed design and any changes to it need to be aligned with the client’s Data Engineering CoE (the Head of the Data Engineering CoE also acts as Product Owner for this specific project).
- Independently integrating the newly developed ingestion concept into the existing data landscape of the client’s Data Foundation, specifically considering the ingestion concept of the client’s DWH.
- Implementing the newly developed concept for the ingestion and curation of data, leveraging Azure Synapse Analytics (Spark). Ingestion has aspects of both development and operations:
- From a development perspective, ingest pipelines (logical connections between a source and multiple destinations) need to be created. The resulting Curated Zone then contains curated data stored in a data model that combines like data from a variety of sources.
- The SCRUM team does not take over any operational tasks. The developed code and pipelines are handed over to the maintenance team; corresponding hand-over sessions are organized as part of the sprint review.
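The curation step outlined above, combining like data from several source systems into one data model under a common naming convention and key definition, can be sketched as follows. The source systems, column mappings, and key definition here are purely hypothetical illustrations:

```python
# Illustrative curation sketch (assumed schema): harmonize customer
# records from two hypothetical source systems into one curated model.
import pandas as pd

# Per-source mappings from source column names to the curated naming
# convention (snake_case is assumed here for illustration).
COLUMN_MAPPINGS = {
    "crm": {"CustID": "customer_id", "Name": "customer_name"},
    "erp": {"CUSTOMER_NO": "customer_id", "CUSTOMER_NAME": "customer_name"},
}


def curate(frames: dict) -> pd.DataFrame:
    """Combine like data from several sources into one curated table."""
    curated = []
    for source, df in frames.items():
        renamed = df.rename(columns=COLUMN_MAPPINGS[source])
        renamed["source_system"] = source
        curated.append(renamed[["customer_id", "customer_name", "source_system"]])
    result = pd.concat(curated, ignore_index=True)
    # Example key definition: a record is identified by source system
    # plus the source's customer identifier.
    result["business_key"] = (
        result["source_system"] + ":" + result["customer_id"].astype(str)
    )
    return result
```

On the target platform the same harmonization would run as PySpark transformations writing Delta Lake tables, but the naming-convention and key-definition logic is the same.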
- Independently performing unit testing of CI/CD (Continuous Integration / Continuous Delivery) pipelines during implementation. During development, the test criteria need to be coded so that known-good results verify the unit’s correctness. During test-case execution, the framework logs tests that fail any criterion, and these failures need to be reported in a summary. The summary has to be reported as a result of the CI/CD pipeline.
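The test-and-summarize step described in the last point can be sketched as follows; the test case and the known-good value are hypothetical, and a real pipeline would publish the summary through its own reporting mechanism:

```python
# Sketch of the unit-test stage of a CI/CD pipeline run (assumed
# structure): execute a test suite against known-good expectations and
# produce a summary the pipeline can report.
import unittest


def expected_row_count() -> int:
    # Hypothetical unit under test: returns the row count of an
    # ingested dataset, verified against a known-good value below.
    return 2


class IngestionTests(unittest.TestCase):
    def test_row_count_matches_known_good(self):
        self.assertEqual(expected_row_count(), 2)


def run_and_summarize(suite: unittest.TestSuite) -> dict:
    """Run the suite and return a summary suitable for a pipeline report."""
    result = unittest.TestResult()
    suite.run(result)
    failed = len(result.failures) + len(result.errors)
    return {
        "total": result.testsRun,
        "failed": failed,
        "passed": result.testsRun - failed,
    }
```

A pipeline step would typically fail the build when `summary["failed"]` is nonzero and attach the summary to the run as its result.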