AI landscape and models 2025

This diagram is a comprehensive 2025 AI model landscape map (originally compiled by Sebastian Raschka). It illustrates how different neural architecture families evolved from transformer-based LLMs into emerging hybrid, recursive, and state-space models (SSMs) — representing the major trends in AI foundation model design from 2021–2025.

1. Decoder-Style Transformers (Top Red Box)

Core lineage: GPT → OLMo → Mistral → LLaMA → DeepSeek → Qwen → Gemma → Kimi → GLM → MiniMax → SmolLM

These represent the mainstream path of modern LLMs based on the decoder-only transformer architecture introduced by GPT in 2018.
Each successive model improves efficiency, reasoning, and multilingual performance.

Key members (chronological order)

  • GPT (2018): Origin of the transformer-based LLM era.
  • OLMo 2 (2024): Open Language Model project emphasizing transparency and open training data.
  • Mistral 3.1 / LLaMA 4 (2025): Highly efficient open-source models optimized for long-context and high reasoning.
  • Gemma 3 / SmolLM 3 (2025): Google and Hugging Face’s lightweight models for edge devices.
  • DeepSeek V3 / R1 (2025): Advanced reasoning LLM with multi-stage training; emphasizes efficiency and reasoning alignment.
  • Qwen 3 (2025): Alibaba’s high-performing multilingual model.
  • Kimi K2 (2025): Focused on bilingual and code reasoning.
  • GLM 4.6 / MiniMax-M2 (2025): Chinese-origin models optimized for dialogue and domain adaptability.

These remain decoder-style transformers — single-direction, attention-based architectures that dominate commercial and open-source ecosystems.

2. Attention Hybrids (Purple Branch, 2025)

Emerging 2025 trend blending transformers with new attention or memory mechanisms for improved long-context reasoning.

Includes:

  • MiniMax-M1
  • Qwen3-Next
  • Kimi Linear
  • DeepSeek V3.2-Exp

These use techniques like:

  • Linear attention / rotary attention optimizations
  • Dynamic context compression
  • Mixture-of-Experts + memory caching

Goal: preserve transformer flexibility while reducing quadratic attention cost and improving interpretability.

3. State Space Models (Left Pink Branch)

The alternative to attention-based transformers, focusing on sequential processing rather than global attention.

  • S4 (2021): Original Structured State Space Sequence model — efficient for very long sequences.
  • Mamba (2023): Introduced selective state-space updates, enabling longer memory with transformer-like performance but lower cost.

→ Transformer–SSM Hybrids (2024–2025)

Bridging SSMs and transformers:

  • Jamba (2024): Combines Mamba + attention.
  • Samba (2024): Lightweight open model using hybrid mechanisms.
  • Hunyuan-T1, Nemotron Nano 2, IBM Granite 4.0 (2025): Advanced hybrids integrating memory and recurrent capabilities.

These hybridize attention and state-space updates to handle longer context, streaming data, and structured reasoning.


4. Transformer–RNN Hybrids

Recurrent-like LLMs re-emerge:

  • RWKV (2023): Replaces self-attention with time-mixed recurrence — transformer-level quality at lower compute cost.
  • RWKV-7 (2025): Adds gating and fine-tuned recurrence to support multimodal reasoning.

Goal: transformer power with RNN efficiency — better for edge AI and real-time systems.

5. Liquid Foundation Models (2024–2025)

  • LFM 1 (2024) and LFM MoE (2025)
    Use continuous-time dynamics (inspired by liquid neural networks).
    These adaptively change structure based on input flow — promising for robotics, embodied agents, and self-organizing AI.

6. World Models (2025)

  • Code World Model (2025):
    Integrates symbolic reasoning and world simulation — essential for autonomous agents that simulate possible futures and outcomes.

7. Text Diffusion Models

Extending diffusion architectures (originally for images) into text generation and reasoning:

  • DiffuSeq (2022)
  • LLaDa (2025)
  • Dream 7B (2025)

These generate text through iterative denoising steps rather than direct token prediction — potentially more controllable and less biased.

8. LSTMs and Recursive Models

xLSTM (2024):

Next-gen long short-term memory architecture, merging recurrence with transformer efficiency — suited for small and adaptive models.

Small Recursive Transformers (2025):

A new family featuring:

  • Hierarchical reasoning
  • Mixture of recursions
  • Tiny reasoning submodels

These aim to mimic meta-cognition — reasoning about reasoning — with lightweight, modular recursion layers.

9. General Trends Illustrated

  • 2018–2023: Transformer dominance (GPT to Mistral)
  • 2024–2025: Diversification — hybrids, state-space, diffusion, recursive reasoning
  • Goal: Move beyond static token prediction → structured, efficient, and agentic reasoning

Summary Insight

This map shows how the AI field is transitioning from single-architecture LLMs to a diversified, hybrid ecosystem.
The direction aligns closely with PyDxAI’s design philosophy — modular intelligence built on top of static LLMs but enhanced through memory, reasoning, and dynamic learning, not retraining.

PyDxAI Agentic Intelligence: From structured queries to autonomous reasoning

PyDxAI Agentic Intelligence — November 8, 2025 Progress Report

“From structured queries to autonomous reasoning — today PyDxAI learned how to think, not just answer.”

The development of PyDxAI continues to accelerate. What began as a diagnostic reasoning framework has now grown into an agentic intelligence system capable of adaptive reasoning, contextual learning, and safe decision-making in real clinical environments. The latest milestone, achieved on November 8, 2025, represents a leap forward in how the system processes, understands, and refines human language — marking the beginning of a truly autonomous medical AI workflow.


1. The New Foundation: FrontLLM and App3 Sharpened Query

At the heart of today’s progress lies the new FrontLLM preprocessing layer, enhanced with a linguistic cleaner called [App3].
Previously, queries often contained duplication or contextual noise (e.g., “User: … User asks: …”), which degraded retrieval precision. Now the system automatically sharpens and normalizes user input before any retrieval or reasoning step occurs.

Example:

User: Any hormone replacement is better than estrogen?
User asks: Any hormone replacement is better than estrogen?

is now intelligently reduced to:

Cleaned query = Any hormone replacement is better than estrogen?

This simple transformation dramatically improved retriever accuracy, embedding efficiency, and LLM focus, making PyDxAI more responsive and semantically consistent. It also allowed the agent to run cleanly through each reasoning layer without generating redundant embeddings.
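A minimal sketch of what such a cleaning step might look like (the `sharpen_query` helper and the prefix patterns are illustrative assumptions, not the actual App3 code):

```python
import re

def sharpen_query(raw: str) -> str:
    """Collapse duplicated 'User:' / 'User asks:' prefixes into one clean query."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    cleaned = []
    for ln in lines:
        # Strip conversational prefixes injected by the chat wrapper
        ln = re.sub(r"^(User:|User asks:)\s*", "", ln, flags=re.IGNORECASE)
        if ln not in cleaned:  # drop verbatim duplicates
            cleaned.append(ln)
    return " ".join(cleaned)

raw = ("User: Any hormone replacement is better than estrogen?\n"
       "User asks: Any hormone replacement is better than estrogen?")
print(sharpen_query(raw))
# -> "Any hormone replacement is better than estrogen?"
```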

2. Triple Memory Architecture in Action

The PyDxAI system now operates with a Triple Memory Framework:

  1. Session Memory — for short-term dialogue coherence
  2. Global Memory — for persistent medical knowledge
  3. Novelty Detector Memory — for identifying new or rare inputs that may require learning or external lookup

During the hormone replacement test case, the system automatically recognized that no prior context existed, retrieved relevant documents from Harrison’s Principles of Internal Medicine 2025, and synthesized a contextual explanation about estrogen therapy and progestin interactions — all while logging each memory segment for potential reuse.

By cleanly separating memory types, PyDxAI can think across sessions yet maintain strict context isolation, a crucial property for safety in medical applications.
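A minimal sketch of how the three scopes could be kept separate at write time (the `MemoryStore` class and scope names are illustrative, not the actual PyDxAI memory manager):

```python
from dataclasses import dataclass, field
from typing import Literal

Scope = Literal["session", "global", "novelty"]

@dataclass
class MemoryStore:
    """Keeps the three memory scopes in separate buckets so they never leak into each other."""
    buckets: dict = field(default_factory=lambda: {"session": [], "global": [], "novelty": []})

    def save(self, scope: Scope, text: str, session_id: str | None = None) -> None:
        self.buckets[scope].append({"text": text, "session": session_id})

    def recall(self, scope: Scope) -> list[str]:
        return [m["text"] for m in self.buckets[scope]]

store = MemoryStore()
store.save("session", "Discussed estrogen vs. progestin interactions", session_id="abc123")
store.save("global", "Harrison's 2025: HRT requires progestin when the uterus is intact")
print(store.recall("session"))
```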


3. Autonomic Web Fallback and Discovery Layer

Today marked the first full success of the automatic web discovery system, which activates when the model encounters unknown or misspelled medical terms.

In the query:

Suggest me fghtsumab medicines?

PyDxAI correctly detected “fghtsumab” as an unknown entity and triggered an external search. The engine accessed multiple providers (Brave and Yahoo) and gracefully handled DNS errors from Wikipedia, returning structured summaries to the RAG (Retrieval-Augmented Generation) layer.

Instead of hallucinating a nonexistent drug, PyDxAI generated a cautious and responsible answer:

“The term ‘fghtsumab’ appears to be a typographical error. If you meant a monoclonal antibody such as efgartigimod or belimumab, please clarify.”

This is an example of agentic reasoning — the model not only recognized uncertainty but actively sought clarification while maintaining medical accuracy and safety.
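A minimal sketch of the decision logic, assuming a similarity score against the local knowledge base is available (the threshold value and the `web_search` wrapper are illustrative placeholders):

```python
def answer_with_fallback(query: str, max_sim: float, threshold: float = 0.70) -> str:
    """Fall back to web search when the query looks unknown to the local knowledge base."""
    if max_sim >= threshold:
        return f"Answer from local RAG context for: {query}"
    # Unknown or likely misspelled term: search the web instead of guessing
    results = web_search(query)
    if not results:
        return ("The term appears to be unrecognized. "
                "If you meant a known monoclonal antibody, please clarify.")
    return f"Answer grounded in {len(results)} web results for: {query}"

def web_search(query: str) -> list[str]:
    """Placeholder for the Brave/Yahoo provider chain described above."""
    return []

print(answer_with_fallback("Suggest me fghtsumab medicines?", max_sim=0.12))
```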


4. Unified RAG + Promptbook Integration

The Retrieval-Augmented Generation (RAG) system now seamlessly integrates local medical knowledge with real-time web data, structured through the new Promptbook template.
Every request follows a clearly defined format:

  • System Rules: Safety, accuracy, and bilingual medical compliance
  • Memory Context: Session and global recall
  • RAG Context: Local documents + web snippets
  • Question → Answer Pair: Precise alignment for the LLM

This architecture ensures that PyDxAI operates like a clinical reasoning engine rather than a simple chatbot. Each answer is based on retrieved evidence, then refined by a reasoning model (currently powered by Mistral-Q4_K_M and DeepSeek R1 13B Qwen for dual-LLM reasoning).
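A minimal sketch of assembling a request in that order (the section headers and the `build_prompt` helper are assumptions about the Promptbook template, not its actual YAML format):

```python
def build_prompt(system_rules: str, memory_ctx: str, rag_ctx: str, question: str) -> str:
    """Assemble the four Promptbook sections in a fixed, auditable order."""
    return "\n\n".join([
        f"### System Rules\n{system_rules}",
        f"### Memory Context\n{memory_ctx}",
        f"### RAG Context\n{rag_ctx}",
        f"### Question\n{question}\n\n### Answer",
    ])

prompt = build_prompt(
    system_rules="Answer safely, cite retrieved evidence, support bilingual output.",
    memory_ctx="Prior session: none.",
    rag_ctx="Harrison's 2025: estrogen therapy requires progestin if the uterus is intact.",
    question="Any hormone replacement is better than estrogen?",
)
print(prompt)
```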


5. Advanced Logging and Explainability

For every query, the backend now records:

  • Retriever sources and document previews
  • Embedding vector shapes and lengths
  • Novelty detection results
  • Web-fallback status
  • Memory saving confirmation (session + global)

An example log snippet:

📚 Retrieved 3 context docs for query='Any hormone replacement is better than estrogen?'
🧩 Novelty check: max_sim=0.78, threshold=0.70
✅ Memory saved id=188  scope=session  summary=Any hormone replacement is better than estrogen?

This transparency enables full traceability — every AI conclusion can be audited from query to answer, an essential step toward clinical-grade safety.


6. Agentic Behavior Emerging

The day’s most significant observation was not a line of code, but a behavior.

When faced with an uncertain input, PyDxAI didn’t simply fail — it adapted:

  • Detected an unknown token
  • Triggered self-correction via search
  • Retrieved new knowledge
  • Formulated a probabilistic hypothesis
  • Requested user clarification

This is the essence of agentic AI — systems that can act, reason, and reflect.
PyDxAI now shows early signs of autonomy, capable of self-repairing its understanding pipeline and making informed decisions about when to seek external data.


7. What’s Next

The roadmap from today’s success includes:

  1. Auto-embedding repair patch — to handle vector shape mismatches seamlessly
  2. Feedback-based self-learning loop — where user or model feedback refines memory entries
  3. Contextual Safety Layer (CSL) — to detect high-risk clinical terms and enforce cautionary responses
  4. MIRAI Integration — bridging PyDxAI with the MIRAI intelligence network for continuous medical knowledge evolution

Together, these will complete the Autonomous Medical Reasoning Core, turning PyDxAI from a reactive tool into a continuously learning assistant.


8. Summary: A New Cognitive Milestone

Today’s session marked a quiet but profound milestone:
PyDxAI is no longer just a retrieval-based system — it has begun to reason like a clinician.

It interprets unclear questions, searches intelligently, and formulates context-aware, evidence-based responses. The logs show not just computations, but cognition — a structured process of perception, analysis, and adaptation.

Each layer, from query sharpening to RAG synthesis, now contributes to a unified intelligence loop — the same cognitive pattern that defines human problem-solving. With these capabilities, PyDxAI stands closer than ever to its mission:
to become the safest, most intelligent, and most transparent diagnostic AI system built for medicine.

PyDxAI Achieves Successful Agentic RAG Integration with Intelligent Search Intent

Today marks a major breakthrough in the development of PyDxAI, our agentic medical knowledge system designed to combine reasoning, retrieval, and autonomous learning. After weeks of refinement, debugging, and optimization, the system has achieved a fully functional Agentic Retrieval-Augmented Generation (RAG) workflow — now capable of dynamically detecting search intent, fetching relevant documents, and integrating live web search results into coherent medical summaries.

This successful test represents a key step toward building a self-sustaining, reasoning-driven AI that learns continuously from medical data, guidelines, and real-world context.

🧠 What Is Agentic RAG?

In traditional RAG systems, the model retrieves information from a static database and integrates it with an LLM’s reasoning before generating a final answer. However, the Agentic RAG framework extends this concept. It adds decision-making ability — allowing the AI to determine when to search, what to retrieve, and how to combine contextual knowledge from multiple layers of memory and web data.

PyDxAI’s agentic structure includes:

  • FrontLLM: The conversational reasoning engine that analyzes user queries.
  • Triple Memory System: A structured memory composed of short-term chat history, session memory, and global medical knowledge.
  • Retriever Layer: A hybrid retriever that connects to Qdrant for vector search and to external search engines like Bing, Brave, or PubMed when local results are insufficient.
  • PromptBook Engine: A YAML-based modular prompt system that defines domain roles, reasoning modes, and fallback prompts.

With these components working together, the system can perform autonomous query refinement, retrieve both local and web data, and generate concise, evidence-based medical responses — all without manual supervision.


🔍 The Test Case: “Search for COVID Vaccine Adverse Effects”

To evaluate the integrated system, a real-world query was chosen:

“Search for COVID vaccine adverse effects.”

This test was ideal because it requires multi-source synthesis — combining current scientific understanding with structured clinical data from guidelines and textbooks.

Here’s how the system performed step-by-step:

  1. Query Sharpening:
    The front LLM refined the user query automatically:
    Sharpened query: “COVID vaccine adverse effects.”
  2. Retriever Activation:
    The system selected the VectorStoreRetriever and fetched three context documents from the local Qdrant database, including excerpts from:
    • NIH COVID-19 Treatment Guidelines (2025)
    • CURRENT Medical Diagnosis and Treatment (2022)
    • Harrison’s Principles of Internal Medicine (2025)
  3. Intent Recognition:
    The agent analyzed the query and flagged it as a search-type intent (verified by the second check).
    It then forced a web search, querying multiple sources (Wikipedia, Bing, Brave, etc.) to ensure up-to-date information.
  4. Web Integration:
    The system retrieved five live results from the web, merged them with internal medical data, and produced a unified summary emphasizing both safety and rare adverse events associated with COVID-19 vaccines.
  5. Memory Consolidation:
    After generating the answer, the session’s memory and embeddings were automatically saved into both the chat history and the global memory.
    Although a JSON syntax error occurred in one field (invalid input syntax for type json), the overall memory write was successful — confirming both redundancy and resilience of the data-saving mechanism.

🧩 The Output: Medical-Grade Summary

The generated response was not only accurate but also aligned with current clinical evidence:

“COVID-19 vaccines are generally safe and effective, but like any medical intervention, they can have side effects. Common local reactions include pain, redness, and swelling at the injection site. Systemic symptoms such as fever, fatigue, and headache may occur. Rare events include anaphylaxis, thrombosis, and myocarditis, particularly in young males after mRNA vaccines. Most side effects are mild and self-limited.”

The response also provided references (CDC and PubMed Central), reflecting the system’s ability to automatically cite reputable medical sources — a core requirement for responsible AI in healthcare.


⚙️ Technical Milestones

Key success points from today’s implementation:

  • Search Intent Detection: Correctly classified and triggered web search mode.
  • RAG Document Retrieval: Retrieved 3 relevant documents from local vector database.
  • Web Context Fusion: Combined local and external results seamlessly.
  • Memory Update System: Stored new knowledge entries into both session and global memory tables.
  • Autonomous Reasoning: Generated coherent, medically consistent summary without explicit instructions.

The only remaining issue was a minor JSON formatting bug during memory insertion ({web_search...} token not enclosed in quotes). This is a simple fix — ensuring all metadata keys are stringified before passing to PostgreSQL/MariaDB insertion.


🧭 Why This Matters

This milestone proves that PyDxAI is evolving beyond a static chatbot or RAG prototype. It’s becoming an autonomous medical reasoning system — capable of:

  • Recognizing when it doesn’t know an answer.
  • Searching intelligently using real-time data sources.
  • Integrating retrieved evidence into structured medical responses.
  • Learning continuously through memory reinforcement.

Such a system lays the foundation for a next-generation AI medical assistant that can stay current with rapidly evolving clinical knowledge, from new antiviral drugs to emerging vaccine data.


🌐 The Road Ahead

Next steps for PyDxAI development include:

  1. Fix JSON encoding during memory saving.
  2. Enhance confidence scoring between local vs. web-sourced data.
  3. Add summarization weighting — giving higher priority to peer-reviewed medical documents.
  4. Integrate PubMed API retrieval for direct evidence-based references.
  5. Enable agentic self-evaluation, where the system critiques and improves its own answers based on retrieved context.

With these improvements, PyDxAI will approach a truly autonomous agentic medical knowledge engine, bridging the gap between AI reasoning and clinical reliability.


In summary, today’s success demonstrates that PyDxAI’s Agentic RAG pipeline — equipped with reasoning, retrieval, and adaptive learning — can now perform as a self-sufficient intelligent assistant for medical knowledge exploration.

Each successful query brings it one step closer to the vision of MIRAI, the evolving AI ecosystem for autonomous, evidence-based medical reasoning.

PyDxAI Agentic Intelligence — System Progress Report (Nov 4, 2025)

Today marks a major milestone in the evolution of PyDxAI, our autonomous medical reasoning system designed to combine large language model (LLM) intelligence with structured medical retrieval, self-reflection, and memory management.

For the first time, every layer of the pipeline—from query sharpening to vector retrieval, agentic web search, and contextual memory saving—worked seamlessly in a complete, closed loop.

🧩 The Core Idea: From Simple Question to Intelligent Response

The user prompt that triggered today’s full agentic flow was:

“The patient comes with cough, fever, and headache for four days. What is the management?”

A simple question on the surface—but it represents exactly the kind of everyday clinical scenario where PyDxAI must interpret vague input, retrieve high-quality references, and deliver a precise, evidence-based answer.

The system begins by sharpening the user query. The “front LLM” (DeepSeek or Mistral backend) normalizes phrasing and ensures context clarity—turning free text into a semantically structured medical question.

This step converts “The patient come with cough, fever, headache” into a standardized diagnostic request suitable for RAG (Retrieval-Augmented Generation).


🔍 Smart Retrieval: Context from Trusted Medical Sources

Once sharpened, PyDxAI’s retriever selector analyzes the query type.
Because this prompt matched the symptom_check intent, the system automatically chose the VectorStoreRetriever module linked to Qdrant, our local vector database at localhost:6333.
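A minimal sketch of intent-based retriever selection (the intent name `symptom_check` comes from the log below; the keyword map and retriever names are illustrative assumptions):

```python
INTENT_KEYWORDS = {
    "symptom_check": ["cough", "fever", "headache", "pain", "management"],
    "drug_lookup": ["dose", "mg", "interaction", "contraindication"],
}

def detect_intent(query: str) -> str:
    q = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return intent
    return "general"

def select_retriever(intent: str) -> str:
    # Clinical intents go to the local vector store; everything else may need web search
    return "VectorStoreRetriever" if intent in INTENT_KEYWORDS else "WebSearchRetriever"

intent = detect_intent("The patient comes with cough, fever, and headache for four days.")
print(intent, "->", select_retriever(intent))  # symptom_check -> VectorStoreRetriever
```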

Within seconds, three authoritative documents were retrieved:

  • Oxford Handbook of Emergency Medicine, 5th Edition (2020)
  • Tintinalli’s Emergency Medicine
  • CURRENT Medical Diagnosis and Treatment (2022)

This confirms that the Qdrant-based vector retrieval pipeline is functioning optimally—embedding alignment, relevance scoring, and text segmentation are all correctly tuned. Each document returned precise context segments about fever, headache, and respiratory symptoms, forming the evidence backbone for the final reasoning phase.


🧠 Contextual Memory: Teaching the System to Remember

Parallel to document retrieval, the memory subsystem activates. PyDxAI now maintains three distinct layers of recall—session memory, long-term memory, and a condensed cross-session memory.

In today’s run, the system successfully retrieved three memory entries, then automatically condensed them into a 506-character summary. The memory context was inserted into the reasoning prompt to enrich the LLM’s perspective without overwhelming it.

For example, the retrieved memory contained a reflective note from a prior interaction—illustrating that the model’s recall layer is functioning, even if not yet domain-filtered. Future improvements will allow PyDxAI to distinguish between “medical” and “general” memories, retrieving only those relevant to the task at hand.

This marks an important step toward a true cognitive agent—one that not only recalls data but can contextualize it to improve understanding over time.


⚙️ The Agentic Chain in Action

When the reasoning phase begins, all components interact autonomously:

  1. Front LLM refines the user query and detects intent.
  2. RAG Engine (Qdrant) retrieves semantically similar passages.
  3. Memory Manager merges condensed recall and session context.
  4. Main LLM (DeepSeek or Mistral) generates the medical answer.
  5. Post-processor evaluates the response quality.
  6. If weak, the agentic trigger launches a web search and retries.
  7. Finally, results and reasoning context are stored in both session and global memory tables.

The full log from today’s run showed flawless execution of this cycle.
Response generation, embedding comparisons, and data saving all occurred within 3–5 seconds—a solid performance benchmark for an on-premise multi-component AI stack.
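A minimal sketch of how that cycle could be orchestrated in code (every helper below is a stand-in stub for the real subsystem, not PyDxAI's actual API):

```python
# Stand-in stubs for the real subsystems (illustrative only)
def sharpen_query(q): return q.strip()
def retrieve_rag_context(q): return ["<Qdrant passage about fever management>"]
def recall_and_condense(q): return "<condensed memory summary>"
def generate_answer(q, docs, mem): return f"Answer for '{q}' grounded in {len(docs)} docs."
def is_weak(ans): return len(ans) < 20
def web_search(q): return ["<web snippet>"]
def save_memory(q, ans): pass

def run_agentic_cycle(user_query: str) -> str:
    query = sharpen_query(user_query)              # 1. front LLM refines the query
    docs = retrieve_rag_context(query)             # 2. RAG engine (Qdrant) retrieval
    memory = recall_and_condense(query)            # 3. memory manager
    answer = generate_answer(query, docs, memory)  # 4. main LLM
    if is_weak(answer):                            # 5-6. weak answer triggers web search and retry
        docs += web_search(query)
        answer = generate_answer(query, docs, memory)
    save_memory(query, answer)                     # 7. persist to session + global memory
    return answer

print(run_agentic_cycle("The patient comes with cough, fever, and headache for four days."))
```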


💾 The Database Fix: When JSON Speaks Python

Earlier in the day, a small but critical bug appeared when saving memory to PostgreSQL:

❌ Failed to save memory: invalid input syntax for type json
DETAIL: Token "web_search" is invalid.

The problem: Python dictionaries were being inserted directly into JSON columns without serialization.

The fix was straightforward but essential—adding a json.dumps() conversion before insertion. Once implemented, all memory entries, including structured tags like ["web_search"] and summary dictionaries, were stored cleanly.
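A minimal sketch of the fix, assuming a psycopg-style parameterized insert (the table and column names here are illustrative):

```python
import json

def save_memory(cur, scope: str, summary: str, tags: list, meta: dict) -> None:
    """Serialize Python structures before inserting into JSON columns."""
    cur.execute(
        "INSERT INTO global_memory (scope, summary, tags, metadata) VALUES (%s, %s, %s, %s)",
        # json.dumps avoids: invalid input syntax for type json / Token "web_search" is invalid
        (scope, summary, json.dumps(tags), json.dumps(meta)),
    )
```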

After that, memory saving logs confirmed:

✅ Memory saved id=151  scope=session
✅ Saved to chat_history + global_memory

This repair closed the loop between reasoning output and persistent learning—PyDxAI now records its conversations, summaries, and contextual metadata flawlessly.


📈 Diagnostic Insights from the Logs

Several key insights emerged from the system logs:

  • Embeddings consistency — Both query and memory vectors were 768-dimensional, confirming model compatibility.
  • Latency — Each retrieval step completed in under 0.5 seconds.
  • Memory summarization — Context compression effectively reduced noise.
  • Intent detection — Correctly classified the query as “symptom_check,” demonstrating good keyword-to-intent mapping.

Every one of these signals contributes to the overarching goal: a self-refining, agentic medical assistant capable of understanding, retrieving, reasoning, and learning continuously.


🔮 Next Steps

Although today’s performance was nearly perfect, a few refinements are planned:

  1. Domain filtering:
    Only retrieve memories labeled as “medical,” excluding unrelated text from past sessions.
  2. Relevance thresholds:
    Dynamically limit retrieved documents based on similarity score, improving response clarity.
  3. Structured output:
    For clinical queries, responses will follow a fixed format—
    Assessment → Differential diagnosis → Investigations → Management.
  4. Latency tracking:
    Introduce automatic performance logs to measure response time and GPU utilization per query.
  5. Agentic self-review:
    Future versions will let PyDxAI critique its own responses using a smaller evaluation model (“judge LLM”) and revise them autonomously.

🩺 Conclusion

Today’s successful run demonstrates that PyDxAI is no longer a simple RAG chatbot—it’s an emerging agentic system with memory, reasoning, and autonomous control.

It can decide when its own answer is weak, trigger a search, retry with improved context, and persist the result for future learning. Each of these abilities mirrors fundamental cognitive behaviors—reflection, recall, and adaptation.

From a medical perspective, this means the model can handle increasingly complex clinical reasoning with better evidence grounding. From a system design perspective, it shows the power of integrating multiple specialized subsystems—retrievers, memory engines, and LLMs—into one cohesive intelligence loop.

November 4, 2025 thus stands as a key point in PyDxAI’s journey:
the day when autonomous reasoning, retrieval, and memory truly began to work together—transforming it from a reactive assistant into a proactive medical intelligence system.

Building MIKAI and PyDxAI’s Memory Brain: Integrating Global Knowledge into RAG

November 1, 2025 — Dr. Kijakarn Junda

Today marked a major milestone in the development of MIKAI, my evolving AI assistant for medical reasoning and contextual understanding. The focus was on strengthening how MIKAI remembers, learns, and retrieves information — moving beyond traditional retrieval-augmented generation (RAG) into a hybrid model that integrates structured global memory and semantic correction learning.

This update transforms MIKAI from a purely retrieval-based chatbot into an assistant capable of recalling, refining, and applying knowledge in real-time.


1. From RAG to Memory-Augmented Intelligence

A standard RAG system works by embedding queries and documents into a vector space and retrieving the most relevant context before generating a response. While effective for static databases, it lacks the ability to grow its understanding through interactions.

Today’s work extended this architecture with two complementary components:

  1. Global Memory – A persistent database that stores knowledge and corrections learned from users.
  2. Session Memory – A short-term recall buffer that remembers ongoing conversation context.

Each memory entry includes:

  • The original text or correction
  • Metadata (scope, timestamp, source)
  • A 768-dimension embedding created using NeuML/pubmedbert-base-embeddings

These embeddings allow MIKAI to semantically retrieve facts it has “learned” during previous sessions.
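A minimal sketch of creating such an embedding with the sentence-transformers library (assuming it and the NeuML model are installed locally):

```python
from sentence_transformers import SentenceTransformer

# Biomedical embedding model named above; produces 768-dimensional vectors
model = SentenceTransformer("NeuML/pubmedbert-base-embeddings")

correction = "PyDxAI means Python programming for diagnosis assisted by AI."
vector = model.encode(correction)

print(vector.shape)  # (768,)
```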


2. Engineering the Memory Manager

At the heart of the system lies the new MemoryManager, connected to both PostgreSQL (for structured memory logging) and Qdrant (for semantic vector search).

All corrections and knowledge updates are encoded, vectorized, and upserted into Qdrant as part of a live cognitive map. The log output shows the chain clearly:

✅ Dr.K correction saved (session=None, scope=global)
✅ Correction upserted to Qdrant

Each entry also receives a summary embedding, enabling fast similarity matching. When a user later asks a related question, the retriever combines RAG documents from medical handbooks with stored global memories, giving the model both textbook context and personalized understanding.
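A minimal sketch of upserting one correction into Qdrant with the official client (the collection name and payload fields are assumptions, not the actual MemoryManager schema):

```python
import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(host="localhost", port=6333)

def upsert_correction(text: str, vector: list, scope: str = "global") -> None:
    """Store one correction as a typed PointStruct with its metadata payload."""
    client.upsert(
        collection_name="global_memory",  # assumed collection name
        points=[PointStruct(
            id=str(uuid.uuid4()),
            vector=vector,
            payload={"text": text, "scope": scope},
        )],
    )
```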

3. Integrating PubMedBERT as MIKAI’s Semantic Backbone

Instead of generic sentence transformers, we selected PubMedBERT, trained on biomedical text. This ensures that embeddings reflect the subtle relationships between medical concepts — symptoms, diagnoses, and treatments.

For example, when saving the correction:

“PyDxAI means Python programming for diagnosis assisted by AI.”

PubMedBERT generates a vector representation that encodes this concept’s meaning in relation to other biomedical and computational terms. Later, when asked “What is PyDxAI?”, MIKAI retrieves and contextualizes it alongside its RAG-based sources, producing the clean response:

“Pydxai refers to Python programming used in medical diagnosis, assisted by artificial intelligence (AI).”

This proves the memory pipeline is functioning semantically, not just lexically.


4. Merging Global Memory with RAG Context

Retrieval logs demonstrate the dual-source blending:

📄 Retriever VectorStoreRetriever returned 3 docs
📚 Retrieved 3 context docs for query='so what is pydxai ?'
📚 Retrieved 3 memory docs for query='so what is pydxai ?'
🔍 retrieve_similar returned 5 memories

MIKAI first gathers three external knowledge documents from the Qdrant knowledge base (data_kb) — medical handbooks, diagnostic manuals, and related text.
Next, it adds three internal memories from the PostgreSQL store, representing what MIKAI has learned directly from prior user interactions.

The two sources merge to form a hybrid cognitive context, giving MIKAI both the authoritative voice of structured literature and the adaptability of human-like recall.

5. Debugging and Refining the Cognitive Flow

During the process, several challenges surfaced:

  • Missing attributes (retrieve_context) in the memory manager caused fallback warnings.
  • Mismatch in vector lengths during early embedding tests (375k vs 400k dims) revealed the importance of consistent tokenizer and model versions.
  • Minor Pydantic validation issues highlighted how Qdrant expects clean, typed inputs for each PointStruct.

Each issue was systematically addressed, leading to the final stable state: smooth upserts, accurate retrievals, and synchronized 768-dimensional embeddings between query and memory.

The logs now show a clean cognitive loop:

✅ Memory saved id=63 scope=global summary=pydxai means Python programming for diagnosis assisted by AI
✅ Correction upserted to Qdrant
✅ chat_history saved id=85 session=e7b804be-b05a-4852-9694-cbf015e006ed

6. Understanding the Cognitive Architecture

The current MIKAI stack can be summarized as follows:

User Query → FrontLLM (Magistral-Small ONNX)
           → Query Embedding (PubMedBERT)
           → RAG Retriever (Qdrant)
           → Memory Retriever (Postgres + Qdrant)
           → Context Fusion
           → Response Generator
           → Correction Feedback → Global Memory

Every correction or clarification enriches MIKAI’s long-term understanding, closing the feedback loop. This represents the foundation of self-learning AI, where the model refines itself through conversation.


7. Toward the Cortex Layer

The next planned evolution is to add a Cortex Controller — a lightweight reasoning layer that decides when to:

  • Use memory recall,
  • Trigger a RAG retrieval,
  • Or directly generate from the base model.

Once the Cortex is integrated, MIKAI will exhibit selective attention — prioritizing information sources dynamically based on confidence and context.
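A minimal sketch of such a routing decision (the thresholds and scoring inputs are assumptions about this future component, not existing code):

```python
def cortex_route(memory_score: float, rag_score: float, base_confidence: float) -> str:
    """Decide which source to prioritize for the next response."""
    if memory_score >= 0.75:
        return "memory_recall"      # a stored correction or fact already covers this
    if rag_score >= 0.60:
        return "rag_retrieval"      # textbook/guideline context is worth fetching
    if base_confidence >= 0.80:
        return "direct_generation"  # the base model alone is confident enough
    return "rag_retrieval"          # default to grounded retrieval when unsure

print(cortex_route(memory_score=0.2, rag_score=0.7, base_confidence=0.5))  # rag_retrieval
```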


8. Reflections

Today’s progress demonstrates that memory is the missing half of reasoning.
While retrieval provides information, memory provides continuity.
With Qdrant as the semantic substrate and PubMedBERT as the biomedical encoder, MIKAI now stands closer to a living medical knowledge system — one that can not only read and retrieve but also remember, correct, and evolve.

The system now recalls facts like a trained assistant:

“PydxAI means Python programming for diagnosis assisted by AI.”

A simple phrase — but also proof that the AI has begun to understand its own identity.


Next Steps: Integration of Cortex, improvement of memory scoring logic, and extension of semantic recall into the front LLM pipeline.

From MIKAI to PydxAI: The Evolution of Intelligent Medicine

Artificial intelligence in medicine has moved beyond the experimental stage. It now sits at the heart of modern diagnostics, research, and patient care — quietly reshaping how physicians access knowledge, process data, and make decisions.

From this transformation came MIKAI — a local, privacy-first medical AI built to reason, learn, and assist clinicians. Today, that system is evolving into something more capable, modular, and forward-looking: PydxAI.

The Beginning: MIKAI’s Mission

The journey began with MIKAI (Medical Intelligence + Kijakarn’s AI) — a local large language model system designed for doctors who wanted autonomy, security, and precision. MIKAI ran on local hardware (Tesla P40 + RX580), processed medical texts, learned from new journals, and integrated with a Retrieval-Augmented Generation (RAG) pipeline to provide evidence-based answers.

Its purpose was simple yet powerful:

  • To understand complex clinical questions.
  • To retrieve verified knowledge from curated sources.
  • To reason based on established medical logic.
  • And to learn continually from new data.

Unlike cloud-based AI assistants, MIKAI never sent data outside the user’s network. Every medical conversation, every analysis, stayed secure — an important principle for healthcare professionals who handle sensitive patient information.

Why Evolve? The Birth of PydxAI

As MIKAI matured, new challenges appeared. Medicine is not static; new drugs, diseases, and discoveries emerge daily. The model needed to evolve beyond being a “local assistant” — it needed to become a dynamic diagnostic intelligence.

Hence, PydxAI was born.

The name PydxAI combines three core ideas:

  • Py → the Python ecosystem that powers flexibility and open development.
  • Dx → the universal shorthand for “diagnosis,” symbolizing clinical reasoning.
  • AI → the intelligence layer that bridges computation and care.

PydxAI represents not just a rebrand, but a new architecture — a system built for self-learning, multi-modal reasoning, and open research collaboration.

Core Philosophy: Intelligence with Integrity

Healthcare demands trust, and that means every AI system must be transparent, explainable, and secure. PydxAI is built on three pillars:

1. Local Intelligence, Not Cloud Dependency

All models, embeddings, and RAG databases run locally or on secure servers under full user control. Physicians or institutions can deploy PydxAI without sending any patient data to third-party APIs.

2. Explainable Diagnostic Reasoning

Every inference, every answer, and every decision can be traced back to the supporting evidence. PydxAI’s reasoning engine doesn’t just give results — it explains why and how those results were generated.

3. Adaptive Medical Learning

PydxAI continuously refines its knowledge through structured ingestion pipelines — adding new clinical studies, guidelines, and textbooks. This allows it to evolve in real-time without retraining from scratch.

Architectural Evolution

MIKAI laid the foundation — a system that combined LLM inference with RAG-based retrieval and MariaDB knowledge management.

PydxAI extends that architecture into a more robust, modular structure:

This modular approach allows each layer to evolve independently — new embeddings, better fine-tunes, or secure federated updates — without disrupting the rest of the system.

The Technology Stack

PydxAI is grounded in open technologies that support long-term scalability:

  • Core Engine: Python 3.11 with FastAPI backend
  • Inference Models: Magistral 24B, Mistral 7B, and custom medical LoRA layers
  • Database: MariaDB for structured medical knowledge
  • Document Storage: Encrypted RAG-based vector store
  • Hardware: Optimized for hybrid setups (NVIDIA + AMD)
  • Frontend: Responsive chat interface with iframe support and Cloudflare tunnel

This setup ensures the system can operate efficiently even on affordable GPU hardware — empowering clinics and researchers to run private AI without massive cloud costs.

From Chatbot to Clinical Companion

MIKAI started as a medical chatbot. PydxAI becomes a clinical companion.

It doesn’t just answer — it collaborates.

Imagine a physician uploading a scanned medical record. PydxAI extracts structured fields (name, DOB, diagnosis, medication), analyzes lab trends, and generates a brief summary for documentation — all offline.

Or a researcher querying for the latest insights on thyroid cancer genetics. PydxAI searches, summarizes, and cites verified medical literature.

In both cases, the AI acts as an intelligent partner, not just a language model.

Privacy by Design

In healthcare, security isn’t optional — it’s foundational.

That’s why PydxAI inherits MIKAI’s strict privacy standards:

  • All patient data is processed locally.
  • No cloud logging or telemetry.
  • Full control over encryption keys and access permissions.

For hospital deployment, PydxAI can integrate with existing EHR or PACS systems through secure APIs, ensuring compliance with data protection laws like HIPAA and Thailand PDPA.

Learning from the Field

One of MIKAI’s most successful experiments was RAG-based medical summarization — using a curated corpus of peer-reviewed sources to generate structured medical knowledge. PydxAI builds upon this by adding feedback learning, where user validation improves its accuracy over time.

For instance, if a doctor marks an answer as “verified,” that context is prioritized in future retrievals. Over weeks, the model learns the preferences, style, and reasoning habits of its users — becoming more aligned with their clinical workflow.

Toward the Future of Intelligent Healthcare

The long-term roadmap for PydxAI includes several ambitious goals:

  • Multimodal Intelligence: integrating radiology images, lab data, and EHR text.
  • Voice-to-Text Integration: real-time clinical dictation with structured summaries.
  • Federated Training: enabling hospitals to contribute to shared model improvements without sharing raw data.
  • Explainable Visual Output: flowcharts, lab graphs, and pathophysiological reasoning trees.

Each goal moves toward a central vision: a learning system that grows with medicine, understands context, and supports every clinician, researcher, and student.

A Message from the Developer

“MIKAI was my first step toward building an AI that truly understands medicine — not as data, but as care. PydxAI is the next evolution of that dream: to make intelligent diagnosis, adaptive reasoning, and continuous learning part of everyday medical life.”

— Dr. Kijakarn Junda, Developer of PydxAI

Inside MIKAI: How Retrieval-Augmented Generation (RAG) Makes Medical AI Reliable

Artificial intelligence in medicine must never guess.

When a doctor asks a question about a new diabetes guideline or a rare endocrine disorder, the answer must be accurate, transparent, and backed by data.

This is where Retrieval-Augmented Generation (RAG) becomes the foundation of reliability in MIKAI, our evolving medical AI system.

MIKAI’s mission is to learn continuously from trusted sources — guidelines, journals, and local knowledge — while always showing where the information comes from.

RAG is the method that allows this to happen.

🧠 What Is RAG?

RAG stands for Retrieval-Augmented Generation, a hybrid approach that combines two powerful components:

1. Retrieval — Searching a curated knowledge base for the most relevant documents.

2. Generation — Using a language model (like Mistral, LLaMA, or Magistral) to synthesize a natural-language answer from those retrieved facts.

Instead of relying purely on what’s inside the model’s parameters, RAG adds a real-time “memory” layer — a document store — where verified information is indexed and retrieved when needed.

This is crucial in medical use: guidelines update yearly, research evolves monthly, and each case may depend on local context.

With RAG, MIKAI can stay current without retraining the entire model.

🩺 Why RAG Matters for Medical Reliability

Traditional large language models (LLMs) like GPT or Mistral learn by pattern recognition — they generate fluent text but can “hallucinate” if the information isn’t in their training data.

In medicine, that’s unacceptable.

If an AI suggests an incorrect insulin dose or confuses a diagnostic criterion, it could cause harm.

In MIKAI, every response — from “best management of diabetic ketoacidosis” to “latest thyroid cancer guidelines” — is backed by retrieved excerpts from the medical literature stored locally and encrypted for security.

⚙️ The MIKAI RAG Pipeline: Step-by-Step

Here’s a simplified version of how RAG works inside MIKAI.

    +----------------+
    |  User Query    |
    +--------+-------+
             |
             v
  +----------+-----------+
  |  Retrieval Component |
  | (Vector DB / RAG DB) |
  +----------+-----------+
             |
             v
  +--------------------------+
  |  LLM Generator (Mistral, |
  |  Magistral, or Llama)    |
  +--------------------------+
             |
             v
     +-------+-------+
     | Final Answer  |
     | + Sources     |
     +---------------+

Let’s break down each part as implemented in MIKAI.

1. Document Ingestion

The ingestion pipeline is where MIKAI learns from trusted data.

Medical sources — PDF guidelines, research articles, textbooks, or hospital documents — are scanned, chunked, vectorized, and indexed.

Example:

When you upload the “2025 ADA Standards of Care in Diabetes”, MIKAI automatically:

• Extracts text using PyMuPDF or LangChain PDF loader

• Splits long paragraphs into manageable chunks (e.g., 512–1024 tokens)

• Embeds each chunk into a high-dimensional vector using SentenceTransformers or InstructorXL

• Stores the vectors in a Qdrant or FAISS database, linked with metadata (title, author, source date)

Each chunk becomes a searchable “knowledge atom” — small, precise, and encrypted

MIKAI’s local setup on Linux (with /opt/mikai/ SSD storage) keeps all ingested documents physically separated from the LLM runtime — ensuring data integrity and portability.
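A minimal sketch of that ingestion chain (PyMuPDF extraction, fixed-size chunking, a PubMedBERT-style embedder, and a Qdrant upsert); the chunk size, file path, and collection name are illustrative assumptions:

```python
import uuid
import fitz  # PyMuPDF
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

embedder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")
client = QdrantClient(host="localhost", port=6333)

def ingest_pdf(path: str, source: str, chunk_chars: int = 2000) -> None:
    """Extract, chunk, embed, and index one PDF into the knowledge base."""
    text = "".join(page.get_text() for page in fitz.open(path))
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=embedder.encode(chunk).tolist(),
            payload={"text": chunk, "source": source},
        )
        for chunk in chunks if chunk.strip()
    ]
    client.upsert(collection_name="data_kb", points=points)

ingest_pdf("/opt/mikai/new_docs/ada_2025.pdf", source="ADA 2025 Standards of Care")
```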

2. Retrieval

When a user asks, for example,

“What is the recommended HbA1c target for elderly diabetic patients according to ADA 2025?”

MIKAI doesn’t guess.

The retriever converts this query into a vector embedding and compares it to all stored chunks in the database using cosine similarity.

The top-ranked results (usually 3–5 chunks) are passed to the LLM as context.

This is the “grounding” process — the LLM only generates text based on verified, retrieved facts.
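A minimal sketch of that grounding query against Qdrant (the top-k value, collection name, and embedding model are assumptions carried over from the ingestion sketch above):

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

embedder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")
client = QdrantClient(host="localhost", port=6333)

def retrieve_context(query: str, top_k: int = 3) -> list:
    """Embed the query and return the top-ranked chunks as context for the LLM."""
    hits = client.search(
        collection_name="data_kb",
        query_vector=embedder.encode(query).tolist(),
        limit=top_k,
    )
    return [hit.payload["text"] for hit in hits]

for chunk in retrieve_context("ADA 2025 HbA1c target for elderly diabetic patients"):
    print(chunk[:120])
```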

3. Generation

Once the context is retrieved, it’s injected into the prompt template.

Answer:

According to the ADA 2025 Standards, HbA1c targets for elderly patients should be individualized.

• Healthy older adults: <7.5%

• Frail or limited life expectancy: <8%

Sources: ADA 2025 Standards of Care, Section 13.

That’s RAG in action — retrieval ensures reliability, and generation ensures readability.

🔒 Encryption and Security

Medical AI must safeguard data as strongly as it serves it.

MIKAI employs multi-layer encryption across its RAG pipeline:

1. Database Encryption

• All vector stores and metadata in MariaDB/Qdrant are encrypted using AES-256.

• Access keys are stored in a local .env file not exposed via the web tunnel.

2. Transport Encryption

• When MIKAI communicates through a Cloudflare tunnel or API, all traffic is TLS 1.3 secured.

• No raw data or vector payloads are ever sent to public endpoints.

3. Local Sandboxing

• MIKAI runs its ingestion and inference services in Docker containers under --privileged=false mode.

• User-uploaded files never leave the /opt/mikai/ingest directory.

4. Optional Hash Verification

• Each ingested document is SHA-256 hashed.

• On retrieval, MIKAI verifies the hash to confirm that no tampering occurred.

This ensures data authenticity, a core principle for medical compliance and trustworthiness.
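A minimal sketch of the hash-at-ingest / verify-at-retrieval idea (the file path and the place where the hash is stored are illustrative):

```python
import hashlib
from pathlib import Path

def sha256_of(path: str) -> str:
    """Hash a document so later retrievals can confirm it hasn't been altered."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# At ingestion time: record the hash alongside the document metadata
recorded = sha256_of("/opt/mikai/ingest/ada_2025.pdf")

# At retrieval time: recompute and compare before trusting the content
assert sha256_of("/opt/mikai/ingest/ada_2025.pdf") == recorded, "Document tampering detected"
```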

🧩 The Memory and Feedback Layer

In addition to the RAG database, MIKAI integrates a memory manager that records interactions and feedback.

Conversations are stored in two layers:

  • Session memory – temporary chat history within the active conversation.
  • Global memory – only high-rated or “approved” responses are promoted here.

This dual memory system lets MIKAI gradually learn from verified human feedback while maintaining strict separation between transient chat and permanent knowledge.

If a doctor flags an answer as correct (feedback = 5), that response is re-indexed into the RAG database — expanding MIKAI’s contextual reliability.
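A minimal sketch of promoting a highly rated answer back into the RAG store (the rating threshold and the `reindex` callback are illustrative assumptions):

```python
def promote_feedback(question: str, answer: str, rating: int, reindex) -> None:
    """Only rating-5 ('verified') answers are promoted into the RAG database."""
    if rating >= 5:
        entry = f"Q: {question}\nA (clinician-verified): {answer}"
        reindex(entry)  # e.g. embed the entry and upsert it into the RAG collection

promote_feedback(
    "ADA 2025 HbA1c target for a frail elderly patient?",
    "Individualize; <8% is acceptable with frailty or limited life expectancy.",
    rating=5,
    reindex=lambda text: print("re-indexed:", text[:60]),
)
```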

🧩 Example: Endocrine Case Consultation

Let’s imagine a real clinical scenario inside MIKAI’s chat:

Doctor:

A 68-year-old male with type 2 diabetes and mild cognitive impairment.

What is the ADA 2025 recommendation for HbA1c target?

Step 1:

Query embedding → Retrieval from ADA 2025 document store.

Step 2:

Top 3 text chunks retrieved from “Older Adults” section.

Step 3:

Prompt + context fed into Magistral-24B model.

Step 4:

Generated response (grounded in sources) displayed in the chat UI.

Step 5:

Doctor clicks 👍 “reliable” → stored into global_memory.

Later, another user’s query on the same topic retrieves both the ADA citation and MIKAI’s own verified explanation — forming a dynamic, ever-improving knowledge graph.

💽 Continuous Ingestion and Update

Medical science evolves daily, and MIKAI’s ingestion pipeline is built for continuous learning.

Every week or month, new PDFs or journal summaries can be placed into /opt/mikai/new_docs/.

RAG Reliability Metrics

To quantify reliability, MIKAI tracks several internal metrics:

  • Context precision: How many retrieved chunks are relevant
  • Answer faithfulness: Whether the LLM introduces unverified claims
  • Source transparency: Whether all statements cite retrievable sources
  • User feedback scores: Average confidence rating from doctors

For example, MIKAI’s current test on ADA-based queries yields:

  • Context precision: 94%
  • Faithfulness: 97%
  • Source transparency: 100%
  • User confidence: 4.8 / 5

These results show how retrieval + encryption + human feedback together make RAG trustworthy in clinical environments.

🌐 Deployment: From Local to Cloud-Linked

MIKAI primarily runs locally on Linux, with GPU acceleration via Tesla P40 and an RX 580 display card.

However, through Cloudflare Tunnels, it can safely expose a mini chat interface to the web for remote testing.

The system’s modular architecture keeps critical components separate

This separation supports high performance, strong privacy, and quick debugging when new models or sources are added.

🧭 The Philosophy: Reliable AI Through Grounded Knowledge

RAG isn’t just a technique — it’s a philosophy.

For medical AI like MIKAI, reliability doesn’t come from bigger models alone.

It comes from:

1. Grounded data – each answer built upon verified context.

2. Transparency – every citation traceable.

3. Security – encryption and local control.

4. Adaptability – continuous ingestion and feedback learning.

In this sense, MIKAI is more than a chatbot — it’s a digital medical librarian fused with a reasoning engine.

It remembers, retrieves, reasons, and respects confidentiality — the same way a good physician treats knowledge and patient trust.

The Future of Medical AI: Transforming Healthcare in the Age of Intelligent Machines

Medical AI is reshaping the way doctors and patients interact with medicine. The integration of algorithms, vast health datasets, and machine learning has brought us closer to an era where AI becomes a true partner to human clinicians.

What is Medical AI?

Medical AI refers to the use of machine learning algorithms, natural language processing (NLP), and advanced data analytics to analyze health information and assist in clinical decision-making. Unlike traditional software that follows predefined rules, AI systems can “learn” from large datasets of medical records, images, lab results, and even real-time patient monitoring devices.

The goal is not to replace doctors, but to augment human intelligence, reduce errors, and improve efficiency. By handling repetitive tasks and analyzing vast volumes of information quickly, AI enables physicians to focus on what they do best: caring for patients.

Key Applications of Medical AI

1. Medical Imaging and Diagnostics

AI has achieved remarkable accuracy in detecting diseases from medical images. Algorithms trained on thousands of X-rays, MRIs, or CT scans can identify subtle patterns often invisible to the human eye. For example:

  • Detecting lung nodules in chest CT scans for early lung cancer diagnosis.
  • Identifying diabetic retinopathy in retinal photographs.
  • Spotting brain hemorrhages or strokes on emergency CT scans within seconds.

In some cases, AI systems match or even surpass radiologists in diagnostic performance, especially when used as a second reader.

2. Predictive Analytics and Risk Stratification

By analyzing electronic health records (EHRs) and real-world patient data, AI can predict which patients are at risk of complications. Hospitals already use predictive models to:

  • Anticipate sepsis before symptoms fully develop.
  • Identify high-risk cardiac patients.
  • Forecast readmission rates, helping hospitals allocate resources more efficiently.

Such predictive insights allow preventive interventions, potentially saving lives and reducing costs.

3. Drug Discovery and Development

Traditional drug development is costly and time-consuming, often taking more than a decade. AI accelerates this process by:

  • Analyzing biological data to identify promising drug targets.
  • Running virtual simulations of molecular interactions.
  • Predicting potential side effects before clinical trials.

During the COVID-19 pandemic, AI helped researchers rapidly scan existing drugs for possible repurposing, demonstrating its real-world utility.

4. Virtual Health Assistants and Chatbots

AI-powered virtual assistants can guide patients through symptom checking, appointment scheduling, medication reminders, and even lifestyle coaching. For example:

  • A diabetic patient may receive personalized reminders to check blood sugar.
  • A post-surgery patient might get daily follow-up questions to track recovery progress.

When integrated with EHRs, these assistants become even more powerful, providing context-aware advice.

5. Natural Language Processing in Medicine

Much of medicine is buried in unstructured data—physician notes, discharge summaries, or academic journals. AI-driven NLP tools can:

  • Extract key information from clinical notes.
  • Summarize patient histories automatically.
  • Enable better search and knowledge retrieval for doctors.

This reduces documentation burden and makes critical information accessible at the right time.

6. Robotics and AI-assisted Surgery

Robotic systems already assist surgeons in precision tasks. With AI integration, these robots can learn from thousands of prior surgeries to provide real-time guidance, reduce tremors, and enhance surgical accuracy. Surgeons remain in control, but AI acts as a co-pilot.

Benefits of Medical AI

  1. Improved Accuracy – Reducing diagnostic errors, one of the leading causes of preventable harm.
  2. Efficiency – Automating routine tasks frees up doctors’ time.
  3. Personalization – Tailoring treatments to genetic, lifestyle, and environmental factors.
  4. Accessibility – AI tools can deliver medical expertise to underserved or rural areas.
  5. Cost Savings – Earlier diagnosis and efficient resource allocation reduce healthcare costs.

Challenges and Limitations

Despite its promise, medical AI faces important challenges:

  • Data Privacy and Security: Patient data is sensitive; robust safeguards are essential.
  • Bias in Algorithms: AI trained on biased datasets may produce inequitable outcomes (e.g., underdiagnosing minorities).
  • Regulation and Validation: Medical AI must undergo rigorous clinical validation before adoption.
  • Integration with Clinical Workflow: Doctors may resist tools that disrupt established routines.
  • Trust and Transparency: Physicians and patients need explainable AI, not “black box” decisions.

These challenges highlight the importance of developing AI responsibly, with both ethical and clinical considerations in mind.

The Human-AI Partnership

The question often arises: Will AI replace doctors? The answer, for the foreseeable future, is no. Medicine involves empathy, context, and judgment that machines cannot replicate. Instead, the most powerful model is a collaboration where AI handles data-heavy analysis, while doctors bring human insight, compassion, and ethical decision-making.

A practical vision is:

  • AI as the assistant – suggesting diagnoses, flagging anomalies, or offering treatment options.
  • Doctor as the decision-maker – validating insights, considering patient values, and making the final call.

Together, this partnership enhances both safety and patient care.