Senior AI Data Engineer
🇽🇰 Pristina, Kosovo
Engineering · Python · Senior
We are seeking an AI Data Engineer to architect and scale the intelligent data systems that power a personalized venue and event recommendation platform. In this role, you will build advanced ingestion and transformation pipelines supporting multi-agent AI, large language model (LLM) enrichment, vector-based search, and real-time behavioral learning. You’ll work across structured and unstructured data domains, including music, events, locations, and user preferences.
Must-have
- Bachelor’s or Master’s in Data Engineering, Computer Science, or related field.
- 3+ years of experience building production-grade data infrastructure for AI/ML systems.
- Deep proficiency in Python, SQL, and ETL frameworks (e.g., Airflow, dbt, Luigi).
- Experience with web scraping (Scrapy, Selenium, BeautifulSoup) and content normalization at scale.
- Strong familiarity with cloud-native data stacks (GCP BigQuery/Dataflow, AWS Redshift/Kinesis, etc.).
- Comfort working with LLMs (OpenAI, Hugging Face) for zero/few-shot data enrichment.
- Experience with real-time streaming systems such as Apache Kafka, Apache Flink, or Spark Streaming.
- Proficiency in integrating RESTful and GraphQL APIs for third-party data aggregation.
Nice to have
- Prior experience building datasets for recommendation systems or user profiling.
- Familiarity with geospatial data formats and vector databases (e.g., PostGIS, Pinecone, Weaviate).
- Understanding of data governance, PII management, and regulatory compliance (e.g., GDPR).
- Exposure to agentic or multi-modal data ingestion pipelines (e.g., audio + image + text).
What you will do
- Develop web scraping systems to collect and parse structured and unstructured point-of-interest (POI) data.
- Integrate and unify third-party APIs and open-source geospatial data.
- Design and orchestrate ingestion pipelines applying venue type inference, tagging, and image parsing.
- Build a data hydration layer using LLM inference.
- Create scalable data pipelines and maintain real-time streams.
- Implement privacy-preserving ingestion.
- Adopt event-driven architectures.
- Collaborate with ML, MLOps, and Backend teams.