Training an Embedding Encoder for Medical Data

Fine-tune an encoder model (MPNet) on a small medical dataset.

Code: GitHub

1️⃣ Why Train a Medical Encoder?

In Retrieval-Augmented Generation (RAG) or semantic-search systems, an embedding encoder converts text into numerical vectors.
Texts with similar meaning should produce close vectors in this semantic space.
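To make "close vectors" concrete, here is a minimal sketch of cosine similarity, the usual closeness measure for sentence embeddings. The three-dimensional vectors are made-up values chosen for illustration, not real encoder output (all-mpnet-base-v2 produces 768-dimensional vectors), and the function name is our own.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings" (hypothetical values, for illustration only).
hypertension = [0.9, 0.1, 0.2]
high_blood_pressure = [0.8, 0.2, 0.25]
fractured_wrist = [0.1, 0.9, 0.3]

sim_synonyms = cosine_similarity(hypertension, high_blood_pressure)
sim_unrelated = cosine_similarity(hypertension, fractured_wrist)
print(f"synonyms:  {sim_synonyms:.3f}")
print(f"unrelated: {sim_unrelated:.3f}")
```

A well-aligned encoder should score the synonym pair noticeably higher than the unrelated pair, which is what these toy vectors are built to show.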

In the medical domain, this alignment is especially important:

  • Medical terms have many synonyms or abbreviations (e.g., “Hypertension” ≈ “High blood pressure”)

  • General encoders often miss clinical context

  • High precision is crucial for QA or document retrieval

We will fine-tune sentence-transformers/all-mpnet-base-v2 into a domain-aware encoder — MPNet-medical — using a small medical dataset.

2️⃣ Experiment Setup

| Item | Description |
| --- | --- |
| Base model | sentence-transformers/all-mpnet-base-v2 |
| Framework | Sentence-Transformers (PyTorch) |
| Dataset | Small medical dataset (sentence-transformers/stsb) |
| Hardware | Single GPU (16 GB VRAM sufficient) (https://www.inference.ai/) |

Training Phases:

  1. Sentence similarity learning → CosineSimilarityLoss

  2. QA retrieval learning → MultipleNegativesRankingLoss

  3. (Optional) Hard negative discrimination → TripletLoss
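The phases above could be chained roughly as follows. This is a sketch, not the post's actual training script: it assumes the legacy `SentenceTransformer.fit` interface, and the `PHASES` structure, the `train_phases` helper, and the epoch, batch-size, and warmup values are our own illustrative choices.

```python
# Phase plan as plain data; the loss names are Sentence-Transformers loss classes.
PHASES = [
    {"name": "sentence similarity", "loss": "CosineSimilarityLoss", "optional": False},
    {"name": "QA retrieval", "loss": "MultipleNegativesRankingLoss", "optional": False},
    {"name": "hard negatives", "loss": "TripletLoss", "optional": True},
]

def train_phases(examples_by_loss,
                 base_model="sentence-transformers/all-mpnet-base-v2",
                 epochs=1, batch_size=16):
    """Run each phase in order on the same model (hypothetical helper).

    examples_by_loss maps a loss name to a list of InputExample objects
    in the format that loss expects.
    """
    # Imported lazily so the plan above can be inspected without the library installed.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, losses

    model = SentenceTransformer(base_model)
    for phase in PHASES:
        examples = examples_by_loss.get(phase["loss"])
        if not examples:
            if phase["optional"]:
                continue  # skip the optional triplet phase when no triplets exist
            raise ValueError(f"no training data for required phase: {phase['name']}")
        loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
        loss = getattr(losses, phase["loss"])(model)
        model.fit(
            train_objectives=[(loader, loss)],
            epochs=epochs,
            warmup_steps=int(0.1 * len(loader) * epochs),  # assumed 10% warmup
        )
    return model
```

Training the phases sequentially on one model object means each phase starts from the weights the previous phase produced.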

3️⃣ Data Formats

| Format | Example | Loss | Use Case | Advantages | Drawbacks |
| --- | --- | --- | --- | --- | --- |
| (sent1, sent2, score) | “Patient has high blood pressure.” vs “The patient suffers from hypertension.”, score = 0.95 | CosineSimilarityLoss | Sentence similarity | Continuous supervision | Requires labeled scores |
| (query, pos) | “What are the symptoms of diabetes?” → “Common symptoms include polyuria, thirst, and weight loss.” | MultipleNegativesRankingLoss | QA or retrieval | Automatic negatives (batch-wise) | No explicit hard negatives |
| (query, pos, neg) | “Signs of myocardial infarction?” → pos: “Chest pain and sweating.”, neg: “Angina pain is brief.” | TripletLoss | Hard-negative discrimination | Improves fine-grained accuracy | Requires curated negatives |
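As a rough illustration, the three record formats can be written as small Python types. The class names and the `to_texts_and_label` helper are hypothetical. In Sentence-Transformers the resulting `texts` and `label` would typically be passed to `InputExample(texts=..., label=...)`, which this sketch leaves out to stay dependency-free.

```python
from typing import NamedTuple

class ScoredPair(NamedTuple):
    sent1: str
    sent2: str
    score: float  # human similarity label, here scaled to [0, 1]

class QueryPositive(NamedTuple):
    query: str
    pos: str

class Triplet(NamedTuple):
    query: str
    pos: str
    neg: str

def to_texts_and_label(example):
    """Split a record into (texts, label); label is None for unscored formats."""
    if isinstance(example, ScoredPair):
        return [example.sent1, example.sent2], example.score
    return list(example), None

scored = ScoredPair("Patient has high blood pressure.",
                    "The patient suffers from hypertension.", 0.95)
qa = QueryPositive("What are the symptoms of diabetes?",
                   "Common symptoms include polyuria, thirst, and weight loss.")
triplet = Triplet("Signs of myocardial infarction?",
                  "Chest pain and sweating.", "Angina pain is brief.")

for record in (scored, qa, triplet):
    texts, label = to_texts_and_label(record)
    print(type(record).__name__, len(texts), label)
```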

4️⃣ Loss Function Overview

| Loss | Objective | Intuition | When to Use |
| --- | --- | --- | --- |
| CosineSimilarityLoss | Minimize (pred_cos − label)² | Align predicted similarity with human labels | When sentence-level similarity scores exist |
| MultipleNegativesRankingLoss | Rank true pairs higher than in-batch negatives | Learn retrieval relationships | QA or semantic search tasks |
| TripletLoss | max(0, margin + d(q,pos) − d(q,neg)) | Keep positives closer than negatives | For hard-negative fine-tuning |
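The three objectives can be sketched in plain Python on precomputed embeddings. This is only an illustration of the formulas, not the library's implementation: the function names are ours, the `scale=20.0` value mirrors what we understand to be the library default for MultipleNegativesRankingLoss, and `margin=1.0` is an arbitrary illustrative value.

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity_loss(pairs, labels):
    """Mean of (pred_cos - label)^2 over (u, v) embedding pairs."""
    errors = [(cos_sim(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

def multiple_negatives_ranking_loss(queries, positives, scale=20.0):
    """Cross-entropy over the batch: positives[i] is the target for queries[i],
    and every other positives[j] serves as an in-batch negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cos_sim(q, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

def triplet_loss(q, pos, neg, margin=1.0):
    """max(0, margin + d(q, pos) - d(q, neg)) with Euclidean distance d."""
    return max(0.0, margin + euclidean(q, pos) - euclidean(q, neg))

# Toy 2-D embeddings (hypothetical values).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other one's positive

print(multiple_negatives_ranking_loss(queries, aligned))
print(multiple_negatives_ranking_loss(queries, swapped))
print(triplet_loss([0.0, 0.0], [0.0, 3.0], [0.0, 1.0]))
```

The triplet loss is zero once the negative is at least `margin` farther from the query than the positive, so only triplets that still violate the margin contribute gradient.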

5️⃣ Results and Visualization

After fine-tuning, we compare Base (blue) vs Fine-tuned (orange) encoders on two tasks:

  • Sentence-Pair Spearman: Correlation between predicted and labeled similarity scores.

  • Retrieval Metrics (Recall@5 / nDCG@10): Ability to rank relevant passages correctly.
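For reference, the metrics named above can be computed as in the sketch below. It assumes binary relevance for nDCG and defines recall as hits divided by the total number of relevant passages; the post's own evaluation code may use different conventions, and the function names are ours.

```python
import math

def _average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = _average_ranks(xs), _average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant passages that appear in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant_ids)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc in enumerate(ranked_ids[:k]) if doc in relevant)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0

# Hypothetical predicted vs labeled similarity scores and one retrieval result.
print(spearman([0.2, 0.9, 0.5], [0.1, 0.8, 0.7]))
print(recall_at_k(["d3", "d1", "d7"], ["d1", "d9"], k=5))
print(ndcg_at_k(["d3", "d1", "d7"], ["d1", "d9"], k=10))
```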

  1. [Base] Sentence-pair Spearman: 0.9038

  2. [Fine-tuned] Sentence-pair Spearman: 0.9431

  3. [FT+Triplet] Sentence-pair Spearman: 0.9431