Senior Data Engineer
We are looking for a skilled Data Engineer with extensive Databricks experience, specializing in building efficient, idempotent, high-quality data pipelines. The ideal candidate is proficient in ingesting raw API data using the Medallion Architecture and can deliver results within quick 2-4 hour turnarounds.
The project is dedicated to tracking the flow of US government funds, from budgets to contracts, contractors, subcontractors, owners, and investors. Its primary goal is to uncover vulnerabilities, detect corruption, identify conflicts of interest, and expose potential foreign influence or espionage within government contracts.
Experience / Skills required:
Must have:
- 4+ years of experience as a Data Engineer
- Databricks: Strong expertise in building and maintaining data pipelines in the Databricks environment, using the Medallion Architecture for data ingestion and processing
- Python: Proficient in Python, with experience building data APIs using frameworks such as FastAPI or Django
- Docker: Experience with Docker for containerization and deployment
- CI/CD: Hands-on experience with GitHub Actions for Continuous Integration and Continuous Deployment
- Test-Driven Development (TDD): Strong knowledge of TDD, ensuring high-quality and reliable code
- REST APIs: Proficient in designing and consuming RESTful APIs for data exposure
- Authentication: Familiar with secure authentication methods for APIs
- Upper-Intermediate level of English or better
Good to have:
- Experience with graph data structures and tools like Neo4j or NetworkX
- Knowledge of government contracting data (e.g., SAM.gov, USAspending.gov)
- Experience in entity resolution, record linkage, ID stitching, or data matching
- Advanced web scraping skills, including proxy rotation techniques
- Interest or experience in using Large Language Models (LLMs) for data enrichment, semantic structuring, redaction, and Named Entity Recognition (NER)
Responsibilities:
- Design and implement robust data pipelines that ingest raw data from APIs and enforce data quality
- Develop data APIs using Python frameworks such as FastAPI or Django
- Deploy and maintain applications using Docker and manage CI/CD workflows with GitHub Actions
- Integrate new data sources into the system quickly to test for new insights
- Implement entity resolution and data matching processes
- Utilize advanced web scraping techniques to gather additional data sources
- Explore and apply LLMs in data pipelines for enrichment and analysis
We offer:
- Competitive salary with regular reviews
- Medical insurance after the 3-month probation period (can be used in Ukraine)
- Vacation (up to 20 working days)
- Paid sick leave (10 working days)
- National holidays as paid time off (11 days)
- Online English courses
- Accountant assistance and legal support
- Flexible working schedule: remote, office-based, or hybrid format
- Fully equipped office space in the city center (ready for work during blackouts)
- Direct cooperation with the customer
- Dynamic environment with low level of bureaucracy and great team spirit
- Challenging projects in diverse business domains and a variety of tech stacks
- Communication with top/senior-level specialists to strengthen your hard skills
- Online and offline team-building events
- Volunteering culture development and support