PyDxAI Agentic Intelligence: From Structured Queries to Autonomous Reasoning

PyDxAI Agentic Intelligence — November 8, 2025 Progress Report

“From structured queries to autonomous reasoning — today PyDxAI learned how to think, not just answer.”

The development of PyDxAI continues to accelerate. What began as a diagnostic reasoning framework has now grown into an agentic intelligence system capable of adaptive reasoning, contextual learning, and safe decision-making in real clinical environments. The latest milestone, achieved on November 8, 2025, represents a leap forward in how the system processes, understands, and refines human language — marking the beginning of a truly autonomous medical AI workflow.


1. The New Foundation: FrontLLM and App3 Sharpened Query

At the heart of today’s progress lies the new FrontLLM preprocessing layer, enhanced with a linguistic cleaner called [App3].
Previously, queries often contained duplication or contextual noise (e.g., “User: … User asks: …”), which degraded retrieval precision. Now the system automatically sharpens and normalizes user input before any retrieval or reasoning step occurs.

Example:

User: Any hormone replacement is better than estrogen?
User asks: Any hormone replacement is better than estrogen?

is now intelligently reduced to:

Cleaned query = Any hormone replacement is better than estrogen?

This simple transformation dramatically improved retriever accuracy, embedding efficiency, and LLM focus, making PyDxAI more responsive and semantically consistent. It also allowed the agent to run cleanly through each reasoning layer without generating redundant embeddings.
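To illustrate, here is a minimal Python sketch of the kind of normalization [App3] performs; the function name and regex are illustrative, not the actual PyDxAI implementation:

import re

def sharpen_query(raw: str) -> str:
    # Strip role prefixes and keep unique, non-empty lines in order
    seen, kept = set(), []
    for line in raw.splitlines():
        text = re.sub(r"^\s*User(?: asks)?:\s*", "", line).strip()
        if text and text.lower() not in seen:
            seen.add(text.lower())
            kept.append(text)
    return " ".join(kept)

print(sharpen_query(
    "User: Any hormone replacement is better than estrogen?\n"
    "User asks: Any hormone replacement is better than estrogen?"
))
# -> Any hormone replacement is better than estrogen?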

2. Triple Memory Architecture in Action

The PyDxAI system now operates with a Triple Memory Framework:

  1. Session Memory — for short-term dialogue coherence
  2. Global Memory — for persistent medical knowledge
  3. Novelty Detector Memory — for identifying new or rare inputs that may require learning or external lookup

During the hormone replacement test case, the system automatically recognized that no prior context existed, retrieved relevant documents from Harrison’s Principles of Internal Medicine 2025, and synthesized a contextual explanation about estrogen therapy and progestin interactions — all while logging each memory segment for potential reuse.

By cleanly separating memory types, PyDxAI can think across sessions yet maintain strict context isolation, a crucial property for safety in medical applications.
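A schematic sketch of those scopes (the real backing stores are PostgreSQL and Qdrant; this in-memory version is only for illustration):

from dataclasses import dataclass

@dataclass
class MemoryEntry:
    text: str
    scope: str  # "session", "global", or "novelty"

class TripleMemory:
    def __init__(self):
        # One store per scope, so session data never leaks into global recall
        self.stores = {"session": [], "global": [], "novelty": []}

    def save(self, entry: MemoryEntry):
        self.stores[entry.scope].append(entry)

    def recall(self, scope: str):
        return list(self.stores[scope])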


3. Autonomic Web Fallback and Discovery Layer

Today marked the first full success of the automatic web discovery system, which activates when the model encounters unknown or misspelled medical terms.

In the query:

Suggest me fghtsumab medicines?

PyDxAI correctly detected “fghtsumab” as an unknown entity and triggered an external search. The engine accessed multiple providers (Brave and Yahoo) and gracefully handled DNS errors from Wikipedia, returning structured summaries to the RAG (Retrieval-Augmented Generation) layer.

Instead of hallucinating a nonexistent drug, PyDxAI generated a cautious and responsible answer:

“The term ‘fghtsumab’ appears to be a typographical error. If you meant a monoclonal antibody such as efgartigimod or belimumab, please clarify.”

This is an example of agentic reasoning — the model not only recognized uncertainty but actively sought clarification while maintaining medical accuracy and safety.
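A compressed sketch of that decision path, with every component interface assumed for illustration (the real retriever, search, and LLM APIs differ):

def answer_with_fallback(query, retriever, web_search, llm, threshold=0.70):
    # Local retrieval first; max_sim mirrors the novelty check in the logs
    docs, max_sim = retriever.search(query)
    if max_sim < threshold:
        # Unknown term (e.g., "fghtsumab"): pull web summaries before answering
        docs += web_search(query)
    return llm.generate(query, context=docs)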


4. Unified RAG + Promptbook Integration

The Retrieval-Augmented Generation (RAG) system now seamlessly integrates local medical knowledge with real-time web data, structured through the new Promptbook template.
Every request follows a clearly defined format:

  • System Rules: Safety, accuracy, and bilingual medical compliance
  • Memory Context: Session and global recall
  • RAG Context: Local documents + web snippets
  • Question → Answer Pair: Precise alignment for the LLM

This architecture ensures that PyDxAI operates like a clinical reasoning engine rather than a simple chatbot. Each answer is based on retrieved evidence, then refined by a reasoning model (currently powered by Mistral-Q4_K_M and DeepSeek R1 13B Qwen for dual-LLM reasoning).
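A hypothetical rendering of that Promptbook layout as a Python helper (the section labels mirror the four parts above; the actual template text is not shown here):

def build_prompt(rules, memory_context, rag_context, question):
    # Assemble the four Promptbook sections in their fixed order
    return (
        f"[SYSTEM RULES]\n{rules}\n\n"
        f"[MEMORY CONTEXT]\n{memory_context}\n\n"
        f"[RAG CONTEXT]\n{rag_context}\n\n"
        f"[QUESTION]\n{question}\n[ANSWER]\n"
    )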


5. Advanced Logging and Explainability

For every query, the backend now records:

  • Retriever sources and document previews
  • Embedding vector shapes and lengths
  • Novelty detection results
  • Web-fallback status
  • Memory saving confirmation (session + global)

An example log snippet:

📚 Retrieved 3 context docs for query='Any hormone replacement is better than estrogen?'
🧩 Novelty check: max_sim=0.78, threshold=0.70
✅ Memory saved id=188  scope=session  summary=Any hormone replacement is better than estrogen?

This transparency enables full traceability — every AI conclusion can be audited from query to answer, an essential step toward clinical-grade safety.


6. Agentic Behavior Emerging

The day’s most significant observation was not a line of code, but a behavior.

When faced with an uncertain input, PyDxAI didn’t simply fail — it adapted:

  • Detected an unknown token
  • Triggered self-correction via search
  • Retrieved new knowledge
  • Formulated a probabilistic hypothesis
  • Requested user clarification

This is the essence of agentic AI — systems that can act, reason, and reflect.
PyDxAI now shows early signs of autonomy, capable of self-repairing its understanding pipeline and making informed decisions about when to seek external data.


7. What’s Next

The roadmap from today’s success includes:

  1. Auto-embedding repair patch — to handle vector shape mismatches seamlessly
  2. Feedback-based self-learning loop — where user or model feedback refines memory entries
  3. Contextual Safety Layer (CSL) — to detect high-risk clinical terms and enforce cautionary responses
  4. MIRAI Integration — bridging PyDxAI with the MIRAI intelligence network for continuous medical knowledge evolution

Together, these will complete the Autonomous Medical Reasoning Core, turning PyDxAI from a reactive tool into a continuously learning assistant.


8. Summary: A New Cognitive Milestone

Today’s session marked a quiet but profound milestone:
PyDxAI is no longer just a retrieval-based system — it has begun to reason like a clinician.

It interprets unclear questions, searches intelligently, and formulates context-aware, evidence-based responses. The logs show not just computations, but cognition — a structured process of perception, analysis, and adaptation.

Each layer, from query sharpening to RAG synthesis, now contributes to a unified intelligence loop — the same cognitive pattern that defines human problem-solving. With these capabilities, PyDxAI stands closer than ever to its mission:
to become the safest, most intelligent, and most transparent diagnostic AI system built for medicine.

PyDxAI Agentic Intelligence — System Progress Report (Nov 4, 2025)

Today marks a major milestone in the evolution of PyDxAI, our autonomous medical reasoning system designed to combine large language model (LLM) intelligence with structured medical retrieval, self-reflection, and memory management.

For the first time, every layer of the pipeline—from query sharpening to vector retrieval, agentic web search, and contextual memory saving—worked seamlessly in a complete, closed loop.

🧩 The Core Idea: From Simple Question to Intelligent Response

The user prompt that triggered today’s full agentic flow was:

“The patient comes with cough, fever, and headache for four days. What is the management?”

A simple question on the surface—but it represents exactly the kind of everyday clinical scenario where PyDxAI must interpret vague input, retrieve high-quality references, and deliver a precise, evidence-based answer.

The system begins by sharpening the user query. The “front LLM” (DeepSeek or Mistral backend) normalizes phrasing and ensures context clarity—turning free text into a semantically structured medical question.

This step converts “The patient come with cough, fever, headache” into a standardized diagnostic request suitable for RAG (Retrieval-Augmented Generation).


🔍 Smart Retrieval: Context from Trusted Medical Sources

Once sharpened, PyDxAI’s retriever selector analyzes the query type.
Because this prompt matched the symptom_check intent, the system automatically chose the VectorStoreRetriever module linked to Qdrant, our local vector database at localhost:6333.

Within seconds, three authoritative documents were retrieved:

  • Oxford Handbook of Emergency Medicine, 5th Edition (2020)
  • Tintinalli’s Emergency Medicine
  • CURRENT Medical Diagnosis and Treatment (2022)

This confirms that the Qdrant-based vector retrieval pipeline is functioning optimally—embedding alignment, relevance scoring, and text segmentation are all correctly tuned. Each document returned precise context segments about fever, headache, and respiratory symptoms, forming the evidence backbone for the final reasoning phase.
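For readers who want the mechanics, here is a minimal sketch of such a retrieval call with the qdrant-client library. The collection name data_kb and the payload key are assumptions, and the PubMedBERT encoder is borrowed from this project’s other posts:

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
encoder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")  # 768-dim

query = "Cough, fever, and headache for four days: what is the management?"
hits = client.search(
    collection_name="data_kb",
    query_vector=encoder.encode(query).tolist(),
    limit=3,  # top-3 documents, as in today's run
)
for hit in hits:
    print(hit.score, hit.payload.get("source"))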


🧠 Contextual Memory: Teaching the System to Remember

Parallel to document retrieval, the memory subsystem activates. PyDxAI now maintains three distinct layers of recall—session memory, long-term memory, and a condensed cross-session memory.

In today’s run, the system successfully retrieved three memory entries, then automatically condensed them into a 506-character summary. The memory context was inserted into the reasoning prompt to enrich the LLM’s perspective without overwhelming it.

For example, the retrieved memory contained a reflective note from a prior interaction—illustrating that the model’s recall layer is functioning, even if not yet domain-filtered. Future improvements will allow PyDxAI to distinguish between “medical” and “general” memories, retrieving only those relevant to the task at hand.

This marks an important step toward a true cognitive agent—one that not only recalls data but can contextualize it to improve understanding over time.


⚙️ The Agentic Chain in Action

When the reasoning phase begins, all components interact autonomously:

  1. Front LLM refines the user query and detects intent.
  2. RAG Engine (Qdrant) retrieves semantically similar passages.
  3. Memory Manager merges condensed recall and session context.
  4. Main LLM (DeepSeek or Mistral) generates the medical answer.
  5. Post-processor evaluates the response quality.
  6. If weak, the agentic trigger launches a web search and retries.
  7. Finally, results and reasoning context are stored in both session and global memory tables.

The full log from today’s run showed flawless execution of this cycle.
Response generation, embedding comparisons, and data saving all occurred within 3–5 seconds—a solid performance benchmark for an on-premise multi-component AI stack.
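Condensed into Python, the cycle looks roughly like this (all component interfaces are assumptions for illustration, not the actual PyDxAI API):

def agentic_answer(query, front_llm, rag, memory, main_llm, judge, web):
    q, intent = front_llm.sharpen(query)                  # 1. refine + detect intent
    docs = rag.retrieve(q, intent=intent)                 # 2. Qdrant passages
    context = memory.condensed_recall(q) + docs           # 3. merge memory context
    answer = main_llm.generate(q, context=context)        # 4. draft the answer
    if judge.score(q, answer) < 0.5:                      # 5. quality check
        context += web.search(q)                          # 6. agentic web retry
        answer = main_llm.generate(q, context=context)
    memory.save(q, answer, scopes=("session", "global"))  # 7. persist results
    return answer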


💾 The Database Fix: When JSON Speaks Python

Earlier in the day, a small but critical bug appeared when saving memory to PostgreSQL:

❌ Failed to save memory: invalid input syntax for type json
DETAIL: Token "web_search" is invalid.

The problem: Python dictionaries were being inserted directly into JSON columns without serialization.

The fix was straightforward but essential—adding a json.dumps() conversion before insertion. Once implemented, all memory entries, including structured tags like ["web_search"] and summary dictionaries, were stored cleanly.
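In psycopg2 terms, the fix looks like this (table and column names, and the sample data, are illustrative):

import json
import psycopg2

conn = psycopg2.connect("dbname=pydxai")  # DSN is illustrative
with conn, conn.cursor() as cur:
    tags = ["web_search"]                 # a Python list, not valid JSON text
    summary = {"scope": "session", "text": "fever workup discussion"}
    cur.execute(
        "INSERT INTO global_memory (tags, summary) VALUES (%s, %s)",
        (json.dumps(tags), json.dumps(summary)),  # serialize before the json column
    )

psycopg2’s extras.Json wrapper would achieve the same result.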

After that, memory saving logs confirmed:

✅ Memory saved id=151  scope=session
✅ Saved to chat_history + global_memory

This repair closed the loop between reasoning output and persistent learning—PyDxAI now records its conversations, summaries, and contextual metadata flawlessly.


📈 Diagnostic Insights from the Logs

Several key insights emerged from the system logs:

  • Embeddings consistency — Both query and memory vectors were 768-dimensional, confirming model compatibility.
  • Latency — Each retrieval step completed in under 0.5 seconds.
  • Memory summarization — Context compression effectively reduced noise.
  • Intent detection — Correctly classified the query as “symptom_check,” demonstrating good keyword-to-intent mapping.

Every one of these signals contributes to the overarching goal: a self-refining, agentic medical assistant capable of understanding, retrieving, reasoning, and learning continuously.


🔮 Next Steps

Although today’s performance was nearly perfect, a few refinements are planned:

  1. Domain filtering:
    Only retrieve memories labeled as “medical,” excluding unrelated text from past sessions.
  2. Relevance thresholds:
    Dynamically limit retrieved documents based on similarity score, improving response clarity.
  3. Structured output:
    For clinical queries, responses will follow a fixed format—
    Assessment → Differential diagnosis → Investigations → Management.
  4. Latency tracking:
    Introduce automatic performance logs to measure response time and GPU utilization per query.
  5. Agentic self-review:
    Future versions will let PyDxAI critique its own responses using a smaller evaluation model (“judge LLM”) and revise them autonomously.

🩺 Conclusion

Today’s successful run demonstrates that PyDxAI is no longer a simple RAG chatbot—it’s an emerging agentic system with memory, reasoning, and autonomous control.

It can decide when its own answer is weak, trigger a search, retry with improved context, and persist the result for future learning. Each of these abilities mirrors fundamental cognitive behaviors—reflection, recall, and adaptation.

From a medical perspective, this means the model can handle increasingly complex clinical reasoning with better evidence grounding. From a system design perspective, it shows the power of integrating multiple specialized subsystems—retrievers, memory engines, and LLMs—into one cohesive intelligence loop.

November 4, 2025 thus stands as a key point in PyDxAI’s journey:
the day when autonomous reasoning, retrieval, and memory truly began to work together—transforming it from a reactive assistant into a proactive medical intelligence system.

Building MIKAI and PyDxAI’s Memory Brain: Integrating Global Knowledge into RAG

November 1, 2025 — Dr. Kijakarn Junda

Today marked a major milestone in the development of MIKAI, my evolving AI assistant for medical reasoning and contextual understanding. The focus was on strengthening how MIKAI remembers, learns, and retrieves information — moving beyond traditional retrieval-augmented generation (RAG) into a hybrid model that integrates structured global memory and semantic correction learning.

This update transforms MIKAI from a purely retrieval-based chatbot into an assistant capable of recalling, refining, and applying knowledge in real-time.


1. From RAG to Memory-Augmented Intelligence

A standard RAG system works by embedding queries and documents into a vector space and retrieving the most relevant context before generating a response. While effective for static databases, it lacks the ability to grow its understanding through interactions.

Today’s work extended this architecture with two complementary components:

  1. Global Memory – A persistent database that stores knowledge and corrections learned from users.
  2. Session Memory – A short-term recall buffer that remembers ongoing conversation context.

Each memory entry includes:

  • The original text or correction
  • Metadata (scope, timestamp, source)
  • A 768-dimension embedding created using NeuML/pubmedbert-base-embeddings

These embeddings allow MIKAI to semantically retrieve facts it has “learned” during previous sessions.
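Generating such an embedding is a one-liner with sentence-transformers:

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")
vec = encoder.encode("PyDxAI means Python programming for diagnosis assisted by AI.")
print(vec.shape)  # (768,) -- the 768-dimension vectors described above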


2. Engineering the Memory Manager

At the heart of the system lies the new MemoryManager, connected to both PostgreSQL (for structured memory logging) and Qdrant (for semantic vector search).

All corrections and knowledge updates are encoded, vectorized, and upserted into Qdrant as part of a live cognitive map. The log output shows the chain clearly:

✅ Dr.K correction saved (session=None, scope=global)
✅ Correction upserted to Qdrant

Each entry also receives a summary embedding, enabling fast similarity matching. When a user later asks a related question, the retriever combines RAG documents from medical handbooks with stored global memories, giving the model both textbook context and personalized understanding.
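A minimal sketch of that upsert path with qdrant-client (the collection name and payload fields are assumptions):

from uuid import uuid4
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")
correction = "PyDxAI means Python programming for diagnosis assisted by AI."

client = QdrantClient(url="http://localhost:6333")
client.upsert(
    collection_name="global_memory",
    points=[PointStruct(
        id=str(uuid4()),
        vector=encoder.encode(correction).tolist(),  # 768-dim PubMedBERT vector
        payload={"text": correction, "scope": "global"},
    )],
)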

3. Integrating PubMedBERT as MIKAI’s Semantic Backbone

Instead of generic sentence transformers, we selected PubMedBERT, trained on biomedical text. This ensures that embeddings reflect the subtle relationships between medical concepts — symptoms, diagnoses, and treatments.

For example, when saving the correction:

“PyDxAI means Python programming for diagnosis assisted by AI.”

PubMedBERT generates a vector representation that encodes this concept’s meaning in relation to other biomedical and computational terms. Later, when asked “What is PyDxAI?”, MIKAI retrieves and contextualizes it alongside its RAG-based sources, producing the clean response:

“Pydxai refers to Python programming used in medical diagnosis, assisted by artificial intelligence (AI).”

This proves the memory pipeline is functioning semantically, not just lexically.


4. Merging Global Memory with RAG Context

Retrieval logs demonstrate the dual-source blending:

📄 Retriever VectorStoreRetriever returned 3 docs
📚 Retrieved 3 context docs for query='so what is pydxai ?'
📚 Retrieved 3 memory docs for query='so what is pydxai ?'
🔍 retrieve_similar returned 5 memories

MIKAI first gathers three external knowledge documents from the Qdrant knowledge base (data_kb) — medical handbooks, diagnostic manuals, and related text.
Next, it adds three internal memories from the PostgreSQL store, representing what MIKAI has learned directly from prior user interactions.

The two sources merge to form a hybrid cognitive context, giving MIKAI both the authoritative voice of structured literature and the adaptability of human-like recall.

5. Debugging and Refining the Cognitive Flow

During the process, several challenges surfaced:

  • Missing attributes (retrieve_context) in the memory manager caused fallback warnings.
  • Mismatch in vector lengths during early embedding tests (375k vs 400k dims) revealed the importance of consistent tokenizer and model versions.
  • Minor Pydantic validation issues highlighted how Qdrant expects clean, typed inputs for each PointStruct.

Each issue was systematically addressed, leading to the final stable state: smooth upserts, accurate retrievals, and synchronized 768-dimensional embeddings between query and memory.

The logs now show a clean cognitive loop:

✅ Memory saved id=63 scope=global summary=pydxai means Python programming for diagnosis assisted by AI
✅ Correction upserted to Qdrant
✅ chat_history saved id=85 session=e7b804be-b05a-4852-9694-cbf015e006ed

6. Understanding the Cognitive Architecture

The current MIKAI stack can be summarized as follows:

User Query → FrontLLM (Magistral-Small ONNX)
           → Query Embedding (PubMedBERT)
           → RAG Retriever (Qdrant)
           → Memory Retriever (Postgres + Qdrant)
           → Context Fusion
           → Response Generator
           → Correction Feedback → Global Memory

Every correction or clarification enriches MIKAI’s long-term understanding, closing the feedback loop. This represents the foundation of self-learning AI, where the model refines itself through conversation.


7. Toward the Cortex Layer

The next planned evolution is to add a Cortex Controller — a lightweight reasoning layer that decides when to:

  • Use memory recall,
  • Trigger a RAG retrieval,
  • Or directly generate from the base model.

Once the Cortex is integrated, MIKAI will exhibit selective attention — prioritizing information sources dynamically based on confidence and context.
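One plausible shape for that controller, sketched under the assumption of simple confidence scores (the Cortex is still unbuilt, so this is purely illustrative):

def cortex_route(query, memory, rag, base_llm, confidence=0.75):
    mem_hits = memory.retrieve_similar(query)
    if mem_hits and mem_hits[0].score >= confidence:
        return base_llm.generate(query, context=mem_hits)  # memory recall
    docs = rag.retrieve(query)
    if docs:
        return base_llm.generate(query, context=docs)      # RAG retrieval
    return base_llm.generate(query)                        # direct generation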


8. Reflections

Today’s progress demonstrates that memory is the missing half of reasoning.
While retrieval provides information, memory provides continuity.
With Qdrant as the semantic substrate and PubMedBERT as the biomedical encoder, MIKAI now stands closer to a living medical knowledge system — one that can not only read and retrieve but also remember, correct, and evolve.

The system now recalls facts like a trained assistant:

“PydxAI means Python programming for diagnosis assisted by AI.”

A simple phrase — but also proof that the AI has begun to understand its own identity.


Next Steps: Integration of Cortex, improvement of memory scoring logic, and extension of semantic recall into the front LLM pipeline.

The Future of Medical AI: Transforming Healthcare in the Age of Intelligent Machines

Medical AI is reshaping the way doctors and patients interact with medicine. The integration of algorithms, vast health datasets, and machine learning has brought us closer to an era where AI becomes a true partner to human clinicians.

What is Medical AI?

Medical AI refers to the use of machine learning algorithms, natural language processing (NLP), and advanced data analytics to analyze health information and assist in clinical decision-making. Unlike traditional software that follows predefined rules, AI systems can “learn” from large datasets of medical records, images, lab results, and even real-time patient monitoring devices.

The goal is not to replace doctors, but to augment human intelligence, reduce errors, and improve efficiency. By handling repetitive tasks and analyzing vast volumes of information quickly, AI enables physicians to focus on what they do best: caring for patients.

Key Applications of Medical AI

1. Medical Imaging and Diagnostics

AI has achieved remarkable accuracy in detecting diseases from medical images. Algorithms trained on thousands of X-rays, MRIs, or CT scans can identify subtle patterns often invisible to the human eye. For example:

  • Detecting lung nodules in chest CT scans for early lung cancer diagnosis.
  • Identifying diabetic retinopathy in retinal photographs.
  • Spotting brain hemorrhages or strokes on emergency CT scans within seconds.

In some cases, AI systems match or even surpass radiologists in diagnostic performance, especially when used as a second reader.

2. Predictive Analytics and Risk Stratification

By analyzing electronic health records (EHRs) and real-world patient data, AI can predict which patients are at risk of complications. Hospitals already use predictive models to:

  • Anticipate sepsis before symptoms fully develop.
  • Identify high-risk cardiac patients.
  • Forecast readmission rates, helping hospitals allocate resources more efficiently.

Such predictive insights allow preventive interventions, potentially saving lives and reducing costs.

3. Drug Discovery and Development

Traditional drug development is costly and time-consuming, often taking more than a decade. AI accelerates this process by:

  • Analyzing biological data to identify promising drug targets.
  • Running virtual simulations of molecular interactions.
  • Predicting potential side effects before clinical trials.

During the COVID-19 pandemic, AI helped researchers rapidly scan existing drugs for possible repurposing, demonstrating its real-world utility.

4. Virtual Health Assistants and Chatbots

AI-powered virtual assistants can guide patients through symptom checking, appointment scheduling, medication reminders, and even lifestyle coaching. For example:

  • A diabetic patient may receive personalized reminders to check blood sugar.
  • A post-surgery patient might get daily follow-up questions to track recovery progress.

When integrated with EHRs, these assistants become even more powerful, providing context-aware advice.

5. Natural Language Processing in Medicine

Much of medicine is buried in unstructured data—physician notes, discharge summaries, or academic journals. AI-driven NLP tools can:

  • Extract key information from clinical notes.
  • Summarize patient histories automatically.
  • Enable better search and knowledge retrieval for doctors.

This reduces documentation burden and makes critical information accessible at the right time.

6. Robotics and AI-assisted Surgery

Robotic systems already assist surgeons in precision tasks. With AI integration, these robots can learn from thousands of prior surgeries to provide real-time guidance, reduce tremors, and enhance surgical accuracy. Surgeons remain in control, but AI acts as a co-pilot.

Benefits of Medical AI

  1. Improved Accuracy – Reducing diagnostic errors, one of the leading causes of preventable harm.
  2. Efficiency – Automating routine tasks frees up doctors’ time.
  3. Personalization – Tailoring treatments to genetic, lifestyle, and environmental factors.
  4. Accessibility – AI tools can deliver medical expertise to underserved or rural areas.
  5. Cost Savings – Earlier diagnosis and efficient resource allocation reduce healthcare costs.

Challenges and Limitations

Despite its promise, medical AI faces important challenges:

  • Data Privacy and Security: Patient data is sensitive; robust safeguards are essential.
  • Bias in Algorithms: AI trained on biased datasets may produce inequitable outcomes (e.g., underdiagnosing minorities).
  • Regulation and Validation: Medical AI must undergo rigorous clinical validation before adoption.
  • Integration with Clinical Workflow: Doctors may resist tools that disrupt established routines.
  • Trust and Transparency: Physicians and patients need explainable AI, not “black box” decisions.

These challenges highlight the importance of developing AI responsibly, with both ethical and clinical considerations in mind.

The Human-AI Partnership

The question often arises: Will AI replace doctors? The answer, for the foreseeable future, is no. Medicine involves empathy, context, and judgment that machines cannot replicate. Instead, the most powerful model is a collaboration where AI handles data-heavy analysis, while doctors bring human insight, compassion, and ethical decision-making.

A practical vision is:

  • AI as the assistant – suggesting diagnoses, flagging anomalies, or offering treatment options.
  • Doctor as the decision-maker – validating insights, considering patient values, and making the final call.

Together, this partnership enhances both safety and patient care.

The Evolution of MIKAI: How We Built a Smarter RAG-Powered AI Assistant

When I first set out to build MIKAI, my goal was simple: a personal AI assistant capable of managing medical knowledge, learning from interactions, and providing intelligent responses in Thai and English. But achieving that goal demanded more than just a large language model — it required memory, context, reasoning, and the ability to pull in external knowledge when needed. That’s where the Retrieval-Augmented Generation (RAG) methodology came in.

The Early Days: Memory Without Structure

In the beginning, MIKAI relied on basic local memory and a single LLM. The model could answer questions based on its training, but it struggled with continuity across sessions and nuanced technical queries. I realized that without a structured way to recall prior conversations and integrate external sources, MIKAI would hallucinate or repeat mistakes.

The first iteration used a Postgres database with pgvector for storing embeddings of past interactions. Every user query was embedded, and cosine similarity was used to pull semantically similar prior exchanges. This approach gave MIKAI a sense of continuity — it could “remember” previous sessions — but there were limitations. Embeddings alone cannot capture subtle medical nuances, and context retrieval often included irrelevant or redundant information.
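The similarity lookup itself was a short SQL query via pgvector’s cosine-distance operator; here is a sketch with assumed table and column names:

def similar_memories(cur, query_vec, k=5):
    # cur: an open psycopg2 cursor on a database with the pgvector extension
    literal = "[" + ",".join(str(x) for x in query_vec) + "]"
    cur.execute(
        "SELECT text, 1 - (embedding <=> %s::vector) AS cosine_sim "
        "FROM memories ORDER BY embedding <=> %s::vector LIMIT %s",
        (literal, literal, k),
    )
    return cur.fetchall()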

Introducing the RAG Pipeline

To address these challenges, we implemented a full RAG pipeline. At its core, MIKAI now uses a hybrid system: a combination of local memory (Postgres/pgvector) and external knowledge bases (via Qdrant) to provide answers grounded in both past experience and curated content.

The pipeline begins with Query Preprocessing. Using front_llm.sharpen_query(query), MIKAI cleans and rewrites incoming questions while detecting the user’s language. This ensures that ambiguous queries are clarified before retrieval.

Next comes Embedding + Memory Retrieval. The sharpened query is converted into a vector embedding (self.embeddings.embed_query) and compared against session and global memory using memory_manager.retrieve_similar(). This allows MIKAI to fetch the most semantically relevant past interactions.

For external knowledge, Retriever Manager queries Qdrant collections based on keywords and context. For instance, if a user asks about a rare endocrine disorder, MIKAI identifies the appropriate collection (data_kb, hospital guidelines, research articles) and retrieves top-matching documents. Deduplication ensures that the top-3 documents are formatted into concise snippets for context fusion.

Context Fusion and Professor Mode

A crucial innovation in MIKAI is Context Fusion. Instead of simply concatenating memory and external documents, the system merges:

  • Previous bot responses and user turns from local memory.
  • Retrieved documents from Qdrant.
  • Optional condensed summaries generated via memory_manager.condense_memory().

This combined context then enters Professor Mode, an extra reasoning layer (llm_manager.run_professor_mode()) where the model structures and interprets the context before generating a final answer. This step ensures that MIKAI doesn’t just regurgitate text but synthesizes a coherent response grounded in all available knowledge.

Finally, LLM Answer Generation (llm_manager.generate_rag_response) produces the answer. Clean-up steps remove repeated phrases, and optional back-translation ensures consistency if the query is not in English. If local memory or external knowledge fails to provide sufficient context, MIKAI can run a Web Search Fallback via DuckDuckGo, integrating the results into a regenerated answer.

Strengths of MIKAI’s RAG Approach

This pipeline has several notable strengths:

  • Dual Memory System: By combining local memory with external knowledge bases, MIKAI balances continuity with factual accuracy.
  • Condensation Step: Reduces irrelevant context and prevents context overflow in long conversations.
  • Professor Mode: Adds reasoning and structure, transforming raw data into coherent, context-aware answers.
  • Web Fallback: Ensures coverage even when the knowledge base lacks specific information.
  • Importance Scoring & Scopes: Allows prioritization of critical knowledge over less relevant information.

These features make MIKAI more robust than a standard LLM and help maintain reliability in medical or technical domains.

Challenges and Limitations

Despite these strengths, the current system isn’t perfect:

  • Embedding-Only Retrieval: Cosine similarity can drift for nuanced queries, potentially retrieving partially relevant memories.
  • Echoing Past Mistakes: Using prior bot answers as context can propagate errors.
  • Context Injection Gaps: generate_rag_response() currently seems to receive only the query, not the fully curated context, which may bypass context fusion benefits.
  • Shallow Deduplication: Only compares first 200 characters of documents, risking subtle repetition.
  • No Re-Ranking Across Sources: Memory and KB results are joined but not scored against each other for relevance.

Addressing these limitations will require passing the final fused context into the generation step, adding a re-ranking layer (e.g., BM25 or cross-encoder), and separating bot memory from external documents to prevent hallucinations.
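For the re-ranking step, a cross-encoder sketch with sentence-transformers shows the intended direction (the model choice here is illustrative, not a decision):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, top_k=3):
    # Score memory and KB snippets against each other, then keep the best
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]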

MIKAI RAG in Practice

In practical use, MIKAI’s RAG system allows multiturn medical consultations, Thai-English language support, and intelligent reasoning over both past interactions and curated external knowledge. A patient can ask about leg edema, for example, and MIKAI retrieves previous session history, relevant hospital documents, and research articles, fusing them into a coherent explanation. If needed, it can augment its answer with a web search.

This pipeline has also enabled continuous learning. Every interaction is stored with embeddings and metadata (session/global/correction), allowing MIKAI to refine its memory, track repetition, and avoid redundant or low-quality responses over time.

The Road Ahead

Looking forward, the next steps for MIKAI involve:

  • Ensuring final context injection into the generation step.
  • Adding cross-source re-ranking to select the most relevant information.
  • Improving deduplication and similarity scoring.
  • Expanding external knowledge integration beyond Qdrant to include specialized medical databases and real-time research feeds.

The goal is to make MIKAI a fully reliable, continuously learning assistant that synthesizes knowledge across multiple modalities and timeframes.

Conclusion

From its early days as a simple memory-enabled LLM to today’s RAG-powered, professor-mode-enhanced assistant, MIKAI’s journey reflects the evolution of AI beyond static knowledge. By combining embeddings, vector databases, context fusion, reasoning layers, and web fallback, MIKAI demonstrates how a thoughtful RAG system can transform an LLM into a domain-aware, multiturn, multilingual assistant.

While challenges remain — especially around context injection and re-ranking — the framework is robust enough to provide continuity, accuracy, and intelligent reasoning in complex domains like medicine. As MIKAI continues to evolve, it promises to become an indispensable companion for knowledge work, patient consultation, and dynamic learning.

Building MIKAI: The Journey of Developing a Doctor’s Own AI Language Model

Artificial intelligence has moved from the realm of science fiction into our daily lives, from virtual assistants on our phones to sophisticated diagnostic systems in hospitals. But the real power of AI lies not only in global corporations but also in the hands of individuals and small teams who dare to build something personal, purposeful, and transformative.

This is the story of MIKAI — short for Medical Intelligence + Kijakarn’s AI — a custom-built large language model (LLM) designed not by a tech giant, but by a practicing doctor who wanted to bring the future of medical knowledge into his own clinic.

Why Build My Own LLM?

The motivation behind MIKAI began with a simple but pressing reality: modern medicine evolves at an overwhelming pace. Every month, hundreds of new clinical studies, guidelines, and case reports are published. No single human can possibly read them all, much less apply them efficiently to patient care.

Commercial AI systems, like ChatGPT, are useful but limited:

• They lack up-to-date knowledge in rapidly advancing fields like endocrinology.

• They are black boxes with no control over how data is handled or filtered.

• They cannot be customized deeply for specific workflows in a private clinic.

As an endocrinologist, I wanted an assistant who could:

1. Continuously learn from medical corpora, guidelines, and journals.

2. Provide safe, accurate, and evidence-based answers.

3. Integrate with my practice — handling patient documentation, translation, RAG-based search, and structured data management.

4. Evolve under my guidance, not under the roadmap of a distant tech company.

That vision gave birth to MIKAI.

Early Foundations: From Off-the-Shelf to Self-Built

Like most AI builders, I didn’t start from scratch. The initial steps were exploratory: testing models like Mistral, LLaMA, Falcon, and GPT-NeoX. Each had strengths, but none were tailored for the medical domain.

The first true breakthrough came with Mistral 7B Instruct, running locally on my workstation. I used llama.cpp to deploy it without requiring cloud servers, ensuring data privacy. At this stage, MIKAI was more of a “mini research assistant” than a doctor’s aide, but the potential was clear.

To make the system practical, I introduced Retrieval-Augmented Generation (RAG):

• A document store for medical PDFs, journals, and clinical guidelines.

• A retrieval pipeline that allows MIKAI to quote and reason from real references.

• A separation of chat history vs. global medical memory, ensuring clean, contextual responses.

This architecture laid the groundwork for MIKAI as a knowledge-augmented medical assistant.

Building the AI Rig: Hardware for a Personal LLM

Running LLMs isn’t just about clever software — it’s also about serious hardware. For MIKAI, I built a custom AI rig that balances affordability with power:

• Dual Xeon CPUs, 64GB RAM for multitasking.

• Nvidia Tesla P40 (24GB VRAM) as the main AI accelerator.

• Radeon RX 580 for display.

• Ubuntu dual-boot with Hackintosh Clover for flexibility.

This setup allows me to experiment with models ranging from 7B to 24B parameters, running quantized versions (Q4/Q5) that fit within GPU memory. On the software side, I use:

• CUDA 12.4 for GPU acceleration.

• Dockerized services for portability.

• MariaDB for structured storage of conversations, tokens, and medical notes.

The result is a doctor’s personal AI workstation — a private lab where I can test, train, and fine-tune models without depending on corporate servers.

The RAG Layer: Teaching MIKAI to Learn Continuously

One of the core challenges with LLMs is stale knowledge. A model trained in 2023 won’t automatically know the 2025 ADA Diabetes Guidelines or a paper published last week.

That’s where RAG (Retrieval-Augmented Generation) comes in. For MIKAI, I designed a two-layer memory system:

1. Session-based memory — keeps track of conversations for contextual flow.

2. Global medical memory — updated with feedback and curated sources.

Here’s how it works in practice:

• I upload a new guideline PDF (e.g., ADA 2025 Standards of Diabetes Care).

• MIKAI parses it, indexes it into the vector database.

• When I ask a clinical question, MIKAI first retrieves relevant passages before generating an answer.

This means MIKAI doesn’t just hallucinate — it answers with citations and context, much like a real medical resident preparing for rounds.
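The ingest step, sketched with pypdf and qdrant-client (the chunking strategy, collection name, and payload fields are simplifications of the real pipeline, and the Qdrant/PubMedBERT pairing is borrowed from my other posts):

from pypdf import PdfReader
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

def index_pdf(path, collection="data_kb", chunk_chars=1000):
    # Parse the PDF, split into fixed-size chunks, embed, and upsert
    text = "".join(page.extract_text() or "" for page in PdfReader(path).pages)
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    encoder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")
    client = QdrantClient(url="http://localhost:6333")
    client.upsert(
        collection_name=collection,
        points=[PointStruct(id=i, vector=encoder.encode(c).tolist(),
                            payload={"text": c, "source": path})
                for i, c in enumerate(chunks)],
    )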

From Mini Chat to Doctor’s Assistant

MIKAI’s interface started as a basic local chat. Over time, I expanded it into a multi-functional workspace:

• Mini Chat Widget: Embeddable on websites like doctornuke.com.

• Patient File System: Auto-generates structured medical forms from scanned documents or speech-to-text dictations.

• Multilingual Support: Translates medical guidelines into Thai while preserving technical terms.

• Secure Access: Two-step authentication and Cloudflare tunneling for remote use.

These features transform MIKAI from “just a chatbot” into a practical clinic assistant that handles real workflows.

Training, Fine-Tuning, and Safety

No medical AI is useful if it’s unsafe. A careless answer can put a patient at risk. That’s why I’ve built MIKAI with multiple safety layers:

• Filtering out unreliable tokens (e.g., scam coins in blockchain experiments, or low-quality sources in medical data).

• Developer blacklists for AI models trained with misleading content.

• Automatic detection of hallucinations by comparing generated answers to retrieved sources.

• Fine-tuning via LoRA (Low-Rank Adaptation) on curated medical datasets.

For larger-scale training experiments, I’m preparing to test Magistral 24B QLoRA — a balance between accuracy and local hardware feasibility (24GB VRAM).

The goal is clear: MIKAI should never give “guesses” in medicine. It must either retrieve evidence, admit uncertainty, or point to guidelines.

The Challenges Along the Way

Building MIKAI hasn’t been easy. The journey has been full of technical hurdles:

• GPU memory limits: Fitting 20–24B parameter models on a 24GB card requires careful quantization.

• Prompt management: Ensuring clean separation of user queries, context, and RAG inputs to avoid “prompt leaks.”

• Performance tuning: Balancing speed vs. accuracy (tokens per second vs. depth of reasoning).

• UI/UX design: Creating a modern chat interface with session management and retrieval panes.

But every obstacle has also been an opportunity to refine the system.

Where MIKAI Stands Today

Today, MIKAI is no longer just an experiment — it’s a functioning assistant that helps in real-world tasks:

• Answers complex medical questions with evidence from current guidelines.

• Generates structured medical notes from speech or scanned files.

• Runs privately on local hardware with full data control.

• Supports multilingual translation for medical literature.

• Embeds into websites for sharing knowledge beyond the clinic.

It’s not perfect — but it’s growing, learning, and adapting every week.

The Future of MIKAI

Where does MIKAI go next? The roadmap is ambitious:

1. Self-Learning LoRA: Allowing MIKAI to continuously fine-tune on newly retrieved data.

2. Medical QA Benchmarking: Comparing MIKAI’s answers against mainstream LLMs for accuracy.

3. Patient Integration: Building a secure, lightweight mobile app for patient-clinic communication.

4. AI Collaboration: Connecting MIKAI with other open-source AI agents (Whisper for voice, Stable Diffusion for visuals, etc.).

5. Scalable Training: Testing larger models (20–30B) with quantization strategies to push accuracy further.

Ultimately, the goal isn’t just to have “my own ChatGPT.” It’s to have a personal, evolving, trustworthy medical partner — one that grows alongside my practice and improves patient care.

Reflections: A Doctor Building AI

MIKAI is more than just an LLM project. It represents a philosophy of empowerment: that doctors, researchers, and independent builders don’t have to wait for corporations to solve their problems.

We can build our own tools.

We can take control of AI.

We can shape it for real-world needs, not generic use cases.

For me, MIKAI is not the end of a journey — it’s just the beginning. And as it grows, it reminds me daily of why I became a doctor: not only to treat patients, but also to improve the systems that support their care.

The future of medicine won’t be written only in journals or hospitals. It will also be written in the labs, clinics, and laptops of doctors and builders worldwide. And MIKAI is my contribution to that future.

Testing MIKAI Against the Giants

Once MIKAI was stable, I ran it side-by-side with GPT-4, Claude 3 Opus, Gemini 1.5 Pro, and LLaMA 70B fine-tuned. I asked them questions from three buckets:

  1. Guideline-based Q&A (e.g., ADA 2025 diabetes standards, AFI workup).
  2. Clinical reasoning (symptoms → differentials → management).
  3. Journal summarization (new NEJM trials, meta-analyses).

Here’s what I found.

Knowledge Depth & Specialization

  • MIKAI 24B
    • Strong recall of guidelines when paired with RAG.
    • Sticks to structured medical language.
    • Rarely hallucinates if context is provided.
  • GPT-4 / Claude
    • Very strong at summarization and general medical knowledge.
    • Sometimes paraphrases or introduces extra details not in the guidelines.
  • LLaMA 70B fine-tuned
    • Competitive with MIKAI, but without RAG it misses clinical nuance.

Clinical Reasoning

  • MIKAI 24B
    • Very good at structured reasoning: protocol-driven answers.
    • Best when the problem is diagnostic or management-oriented.
  • GPT-4
    • Still the king of “Socratic reasoning.”
    • Can explain why one diagnosis is more likely than another.
  • Claude / Gemini
    • Excellent at synthesizing literature evidence to support decisions.

Safety & Reliability

  • MIKAI
    • Needs guardrails for drug dosing.
    • When uncertain, it defaults to “insufficient context” rather than hallucinating.
  • GPT-4 / Claude
    • Safer by design with alignment layers.
    • But often too cautious, producing “consult your doctor” disclaimers (which is redundant for a doctor using the system).