November 1, 2025 — Dr. Kijakarn Junda
Today marked a major milestone in the development of MIKAI, my evolving AI assistant for medical reasoning and contextual understanding. The focus was on strengthening how MIKAI remembers, learns, and retrieves information — moving beyond traditional retrieval-augmented generation (RAG) into a hybrid model that integrates structured global memory and semantic correction learning.
This update transforms MIKAI from a purely retrieval-based chatbot into an assistant capable of recalling, refining, and applying knowledge in real time.
1. From RAG to Memory-Augmented Intelligence
A standard RAG system works by embedding queries and documents into a vector space and retrieving the most relevant context before generating a response. While effective for static databases, it lacks the ability to grow its understanding through interactions.
Today’s work extended this architecture with two complementary components:
- Global Memory – A persistent database that stores knowledge and corrections learned from users.
- Session Memory – A short-term recall buffer that remembers ongoing conversation context.
Each memory entry includes:
- The original text or correction
- Metadata (scope, timestamp, source)
- A 768-dimensional embedding created using NeuML/pubmedbert-base-embeddings
These embeddings allow MIKAI to semantically retrieve facts it has “learned” during previous sessions.
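The entry structure above can be sketched as a small Python dataclass. The field names and the stand-in encoder below are illustrative, not MIKAI's actual schema; the real system produces its 768-dimensional vectors with PubMedBERT, which a deterministic toy function stands in for here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib

EMBED_DIM = 768  # dimension of NeuML/pubmedbert-base-embeddings vectors

@dataclass
class MemoryEntry:
    """One unit of global or session memory (hypothetical shape)."""
    text: str        # the original text or correction
    scope: str       # "global" or "session"
    source: str      # who supplied the entry
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    embedding: list = field(default_factory=list)

def toy_embed(text: str, dim: int = EMBED_DIM) -> list:
    """Deterministic stand-in for the PubMedBERT encoder, for illustration only."""
    seed = hashlib.sha256(text.encode()).digest()
    return [seed[i % len(seed)] / 255.0 for i in range(dim)]

entry = MemoryEntry(
    text="PyDxAI means Python programming for diagnosis assisted by AI.",
    scope="global",
    source="Dr.K",
)
entry.embedding = toy_embed(entry.text)
print(len(entry.embedding))  # 768
```

Because the toy encoder is deterministic, the same text always maps to the same vector, which is the property that makes stored corrections retrievable later.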
2. Engineering the Memory Manager
At the heart of the system lies the new MemoryManager, connected to both PostgreSQL (for structured memory logging) and Qdrant (for semantic vector search).
All corrections and knowledge updates are encoded, vectorized, and upserted into Qdrant as part of a live cognitive map. The log output shows the chain clearly:
✅ Dr.K correction saved (session=None, scope=global)
✅ Correction upserted to Qdrant
Each entry also receives a summary embedding, enabling fast similarity matching. When a user later asks a related question, the retriever combines RAG documents from medical handbooks with stored global memories, giving the model both textbook context and personalized understanding.
3. Integrating PubMedBERT as MIKAI’s Semantic Backbone
Instead of generic sentence transformers, we selected PubMedBERT, trained on biomedical text. This ensures that embeddings reflect the subtle relationships between medical concepts — symptoms, diagnoses, and treatments.
For example, when saving the correction:
“PyDxAI means Python programming for diagnosis assisted by AI.”
PubMedBERT generates a vector representation that encodes this concept’s meaning in relation to other biomedical and computational terms. Later, when asked “What is PyDxAI?”, MIKAI retrieves and contextualizes it alongside its RAG-based sources, producing the clean response:
“Pydxai refers to Python programming used in medical diagnosis, assisted by artificial intelligence (AI).”
This proves the memory pipeline is functioning semantically, not just lexically.
4. Merging Global Memory with RAG Context
Retrieval logs demonstrate the dual-source blending:
📄 Retriever VectorStoreRetriever returned 3 docs
📚 Retrieved 3 context docs for query='so what is pydxai ?'
📚 Retrieved 3 memory docs for query='so what is pydxai ?'
🔍 retrieve_similar returned 5 memories
MIKAI first gathers three external knowledge documents from the Qdrant knowledge base (data_kb) — medical handbooks, diagnostic manuals, and related text.
Next, it adds three internal memories from the PostgreSQL store, representing what MIKAI has learned directly from prior user interactions.
The two sources merge to form a hybrid cognitive context, giving MIKAI both the authoritative voice of structured literature and the adaptability of human-like recall.
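One plausible shape for this blending step, assuming both sources return (text, score) pairs, is a rank-and-deduplicate merge. The function and sample scores below are hypothetical, not MIKAI's actual fusion logic.

```python
def fuse_context(rag_docs, memory_docs, limit=5):
    """Merge two scored source lists into one ranked context window."""
    seen, merged = set(), []
    # Sort all candidates by score, highest first, and drop duplicate texts
    for text, score in sorted(rag_docs + memory_docs, key=lambda d: d[1], reverse=True):
        if text not in seen:
            seen.add(text)
            merged.append((text, score))
    return merged[:limit]

rag = [("Diagnostic handbook: AI-assisted triage...", 0.71)]
mem = [
    ("pydxai means Python programming for diagnosis assisted by AI", 0.88),
    ("Diagnostic handbook: AI-assisted triage...", 0.69),
]
print(fuse_context(rag, mem))
```

Score-ordered merging lets a strong personal memory outrank a weakly matching handbook passage, which is exactly the behavior seen in the PyDxAI example above.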
5. Debugging and Refining the Cognitive Flow
During the process, several challenges surfaced:
- Missing attributes (retrieve_context) in the memory manager caused fallback warnings.
- A mismatch in vector lengths during early embedding tests (375k vs 400k dims) revealed the importance of consistent tokenizer and model versions.
- Minor Pydantic validation issues highlighted how Qdrant expects clean, typed inputs for each PointStruct.
Each issue was systematically addressed, leading to the final stable state: smooth upserts, accurate retrievals, and synchronized 768-dimensional embeddings between query and memory.
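The dimension and typing problems above suggest a small guard before every upsert. This helper is a hypothetical sketch, not MIKAI's code: it coerces array-like inputs (e.g. NumPy arrays) into the plain list of floats that Qdrant's Pydantic models accept, and rejects vectors of the wrong length.

```python
def validate_vector(vec, expected_dim=768):
    """Coerce a vector to a typed list of floats and enforce its dimension."""
    vec = [float(x) for x in vec]  # numpy arrays, ints, etc. -> plain floats
    if len(vec) != expected_dim:
        raise ValueError(f"expected {expected_dim} dims, got {len(vec)}")
    return vec
```

Running every embedding through a single checkpoint like this keeps query and memory vectors synchronized at 768 dimensions by construction.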
The logs now show a clean cognitive loop:
✅ Memory saved id=63 scope=global summary=pydxai means Python programming for diagnosis assisted by AI
✅ Correction upserted to Qdrant
✅ chat_history saved id=85 session=e7b804be-b05a-4852-9694-cbf015e006ed
6. Understanding the Cognitive Architecture
The current MIKAI stack can be summarized as follows:
User Query → FrontLLM (Magistral-Small ONNX)
→ Query Embedding (PubMedBERT)
→ RAG Retriever (Qdrant)
→ Memory Retriever (Postgres + Qdrant)
→ Context Fusion
→ Response Generator
→ Correction Feedback → Global Memory
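The flow above can be expressed as a pure-Python pipeline. Every stage name mirrors the diagram, but the bodies are stubs invented for illustration; only the wiring between stages is the point.

```python
def embed_query(query):
    return [0.0] * 768           # stand-in for PubMedBERT query embedding

def rag_retrieve(vector):
    return ["handbook excerpt"]  # stand-in for Qdrant knowledge-base search

def memory_retrieve(vector):
    return ["learned correction"]  # stand-in for Postgres + Qdrant memory search

def fuse(docs, mems):
    return docs + mems           # stand-in for context fusion

def generate(query, context):
    return f"answer to {query!r} using {len(context)} context items"

def respond(query):
    """One pass through the cognitive loop: embed, retrieve, fuse, generate."""
    vector = embed_query(query)
    context = fuse(rag_retrieve(vector), memory_retrieve(vector))
    return generate(query, context)

print(respond("so what is pydxai ?"))
```

In the real system a final correction-feedback stage would write back into global memory, closing the loop the next paragraph describes.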
Every correction or clarification enriches MIKAI's long-term understanding, closing the feedback loop. This represents the foundation of self-learning AI, where the model refines itself through conversation.
7. Toward the Cortex Layer
The next planned evolution is to add a Cortex Controller — a lightweight reasoning layer that decides when to:
- Use memory recall,
- Trigger a RAG retrieval,
- Or directly generate from the base model.
Once the Cortex is integrated, MIKAI will exhibit selective attention — prioritizing information sources dynamically based on confidence and context.
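A minimal sketch of that routing decision might look like the function below. The confidence thresholds are invented for illustration; the planned Cortex would presumably tune or learn them rather than hard-code them.

```python
def cortex_route(memory_score: float, rag_score: float) -> str:
    """Pick an information source based on retrieval confidence (hypothetical)."""
    if memory_score >= 0.8:
        return "memory"      # recall a stored correction directly
    if rag_score >= 0.5:
        return "rag"         # ground the answer in retrieved documents
    return "base_model"      # fall back to direct generation

print(cortex_route(0.9, 0.3))  # memory
```

Even this crude threshold scheme captures the idea of selective attention: the strongest available signal decides which pathway answers the query.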
8. Reflections
Today’s progress demonstrates that memory is the missing half of reasoning.
While retrieval provides information, memory provides continuity.
With Qdrant as the semantic substrate and PubMedBERT as the biomedical encoder, MIKAI now stands closer to a living medical knowledge system — one that can not only read and retrieve but also remember, correct, and evolve.
The system now recalls facts like a trained assistant:
“PydxAI means Python programming for diagnosis assisted by AI.”
A simple phrase — but also proof that the AI has begun to understand its own identity.
Next Steps: Integration of Cortex, improvement of memory scoring logic, and extension of semantic recall into the front LLM pipeline.
