Senior Data Engineer
We are looking for a skilled Data Engineer with extensive Databricks experience, specializing in building efficient, idempotent, high-quality data pipelines. The ideal candidate is proficient in ingesting raw API data using the Medallion Architecture and can deliver results within quick 2-4 hour turnarounds.
The project is dedicated to tracking the flow of US government funds, from budgets to contracts, contractors, subcontractors, owners, and investors. Its primary goal is to uncover vulnerabilities, detect corruption, identify conflicts of interest, and expose potential foreign influence or espionage within government contracts.
Experience / Skills required:
Must have:
- 4+ years of experience as a Data Engineer
- Databricks: Strong expertise in building and maintaining data pipelines in the Databricks environment, using the Medallion Architecture for data ingestion and processing
- Python: Proficient in Python, with experience building data APIs using frameworks such as FastAPI or Django
- Docker: Experience with Docker for containerization and deployment
- CI/CD: Hands-on experience with GitHub Actions for Continuous Integration and Continuous Deployment
- Test-Driven Development (TDD): Strong knowledge of TDD, ensuring high-quality and reliable code
- REST APIs: Proficient in designing and consuming RESTful APIs for data exposure
- Authentication: Familiar with secure authentication methods for APIs
- Upper-Intermediate level of English or better
Good to have:
- Experience with graph data structures and tools like Neo4j or NetworkX
- Knowledge of government contracting data (e.g., SAM.gov, USAspending.gov)
- Experience in entity resolution, record linkage, ID stitching, or data matching
- Advanced web scraping skills, including proxy rotation techniques
- Interest or experience in using Large Language Models (LLMs) for data enrichment, semantic structuring, redaction, and Named Entity Recognition (NER)
Responsibilities:
- Design and implement robust data pipelines that ingest raw data from APIs and enforce data quality
- Develop data APIs using Python frameworks such as FastAPI or Django
- Deploy and maintain applications using Docker and manage CI/CD workflows with GitHub Actions
- Integrate new data sources into the system quickly to test for new insights
- Implement entity resolution and data matching processes
- Utilize advanced web scraping techniques to gather additional data sources
- Explore and apply LLMs in data pipelines for enrichment and analysis
We offer:
- Competitive salary with regular reviews
- Medical insurance after the 3-month probation period (can be used in Ukraine)
- Vacation (up to 20 working days)
- Paid sick leave (10 working days)
- National holidays as paid time off (11 days)
- Online English courses
- Accountant assistance and legal support
- Flexible working schedule: remote, office-based, or hybrid format
- Fully equipped office space in the city center (ready for work during blackouts)
- Direct cooperation with the customer
- Dynamic environment with low level of bureaucracy and great team spirit
- Challenging projects in diverse business domains and a variety of tech stacks
- Communication with top/senior-level specialists to strengthen your hard skills
- Online and offline team-building events
- Volunteering culture development and support