TeamPlus
Key Responsibilities:
• Design and architect robust data pipelines for structured, semi-structured, and unstructured data.
• Develop, manage, and optimize databases, including RDBMS (MySQL, PostgreSQL), NoSQL (MongoDB), and data lakes (S3).
• Implement efficient ETL processes using tools like PySpark and Hadoop to transform and prepare data for analytics and AI use cases.
• Optimize database performance, including query tuning, indexing, and caching strategies using tools like Redis and AWS caching services.
• Build and maintain CI/CD pipelines, manage YAML files, and use GitHub for version control and collaboration.
• Leverage Docker for containerized deployment, with hands-on experience running Docker commands for database and pipeline management.
• Ensure solutions adhere to best practices in system design, focusing on trade-offs, security, performance, and efficiency.
• Monitor, maintain, and troubleshoot database infrastructure to ensure high
availability and performance.
• Collaborate with engineering teams to design scalable solutions for large-scale data processing.
• Stay updated on the latest database technologies and implement best practices for database design and management.
Skills:
• 4+ years of experience in database architecture and optimization.
• Expertise in RDBMS, NoSQL, and semi-structured data stores (MySQL, PostgreSQL, MongoDB).
• Proficiency in programming languages for database integration and optimization (Python preferred).
• Strong knowledge of distributed data processing tools like PySpark and Hadoop.
• Hands-on experience with AWS services for data storage and processing, including S3.
• Strong familiarity with Redis for caching and query optimization.
• Proven experience with Docker for containerized deployments and writing CI/CD pipelines using YAML files.