● Design, train, and deploy ML models for OCR, NLP, and structured data
extraction from clinical source documents (PDFs, scanned forms, EHR
exports).
● Implement medical terminology recognition (SNOMED CT, MedDRA, LOINC).
● Develop algorithms for automated discrepancy detection between source data
and EDC records.
● Build context-aware query generation modules for clinical data management.
● Collaborate with data engineers and backend developers to integrate ML pipelines
into production workflows.
Role Responsibilities
● Master’s/PhD in Computer Science, Data Science, or a related field.
● Strong experience with LLMs, prompt engineering, and hyperparameter tuning.
● 3+ years of proven experience in ML/NLP, preferably in healthcare or clinical
data.
● Expertise in PyTorch/TensorFlow, Hugging Face Transformers, and OCR
tools (Tesseract, Amazon Textract).
● Knowledge of Amazon Bedrock and SageMaker (experience with AgentCore is a
strong plus).
● Familiarity with HIPAA/GDPR and privacy-preserving ML practices.