⬅️ Back to Blog

LiquidAI Foundation Models (LFMs): A Master Guide to Efficiency by Design

LiquidAI's LFMs use a hybrid convolution+attention architecture to deliver faster inference and lower memory than pure transformers. Full model catalog, tier-by-tier benchmarks, and ecosystem trade-offs.

AI/ML10 min readAuthor: Kukil Kashyap Borgohain
LiquidAI LFM foundation models compared against Qwen, Phi, Gemma, and Llama with benchmark scores and inference engine support

LiquidAI's Liquid Foundation Models use a hybrid architecture: gated short convolutions alternating with grouped query attention blocks. The result is lower KV cache usage, faster decode, and smaller memory footprint than equivalent pure-transformer models.

This post covers the full catalog from 350M to 8.3B parameters, tier-by-tier benchmark comparisons against Qwen, Phi, Gemma, and Llama, and the honest trade-offs.


1. Company & Background

Founded in 2023 by MIT computer scientists working on dynamical systems and neural network theory. The company spun out of Liquid Neural Network (LNN) research. LNNs are built on continuous-time differential equations, not the standard transformer self-attention mechanism.

Core argument: the dense transformer is over-parameterized for most edge workloads. Replacing most attention layers with lightweight gated convolutions gives comparable performance with lower memory and decode latency.

[!NOTE] "Liquid" refers to Liquid Neural Networks, a specific architecture class from MIT's CSAIL lab. Not a marketing term.


2. The Hybrid Architecture

LFMs alternate Grouped Query Attention (GQA) blocks with gated short convolutional layers (LIV layers). In LFM2.5-8B, the ratio is 18 convolution layers to 6 GQA layers.

Loading diagram...

Why the ratio matters:

  • KV cache: Only the 6 GQA blocks generate KV pairs. Memory scales with 6 attention layers, not 24.
  • Prefill speed: Convolutions parallelize across the sequence dimension. Long prompts are processed faster than pure self-attention.
  • Constant memory at long context: Convolution layers use fixed memory regardless of sequence length, unlike transformers where KV cache grows linearly.

3. Model Catalog

Text (LFM2 / LFM2.5)

ModelParamsTrainingKey ScoresInferenceLicense
LFM2-350M350M10T tokensBrowser control fine-tuningTransformers, LEAP SDKLFM Open
LFM2-700M700M10T tokensEdge text inferenceTransformers, LEAP SDKLFM Open
LFM2-1.2B1.2B10T tokensCompetitive with Llama 3.2-1BTransformers, ExecuTorch, LEAP SDKLFM Open
LFM2-2.6B2.6B10T tokens82.41% GSM8K, 79.56% IFEvalTransformers, ExecuTorch, LEAP SDKLFM Open
LFM2.5-1.2B1.2B28T tokensRL-tuned: instruction, tool use, mathTransformers, ExecuTorch, LEAP SDKLFM Open

[!TIP] LFM2.5-1.2B was trained on 28T tokens. Nearly 3x the LFM2 budget. One of the most heavily trained models at the 1B scale. Japanese variant available (LFM2.5-1.2B-JP).

Vision-Language

ModelParamsVision EncoderKey FeatureLicense
LFM2.5-VL-1.6B~1.6BLFM2.5 nativeNative 512x512 resolution, WebGPU video captioning, multi-image OCRLFM Open

Audio

ModelParamsModalityKey FeatureLicense
LFM2.5-Audio-1.5B~1.5BAudio >> Text/AudioEdge voice agent, speech understandingLFM Open

Mixture-of-Experts

ModelTotal ParamsActive ParamsContextArchitecture
LFM2.5-8B-A1B8.3B1.5B128K tokensMoE with hybrid conv+GQA

8.3B total parameters, 1.5B active per token. 8B-class knowledge at 1.5B-class decode speed.


4. Head-to-Head by Parameter Tier

Tier 1: Under 500M

LiquidAI has no serious production-grade competition here. SmolLM and SmolVLM are research baselines.

ModelProviderParamsModalityUse Case
LFM2-350MLiquid AI350MTextBrowser control, edge text
LFM2-VL-450MLiquid AI450MImage+TextHyper-efficient edge VLM
SmolVLM-256MHugging Face256MImage+TextSmallest practical VLM
SmolLM2-360MHugging Face360MTextResearch baseline

Tier 2: 1B to 2B

ModelParamsModalityContextKey ScoreLicense
LFM2.5-1.2B1.2BText28T token pretrainingLFM Open
LFM2.5-VL-1.6B1.6BImage+TextWebGPU video captioningLFM Open
MiniCPM-V 4.61.3BImage+Text+Video262KIntelligence Index 13 (best under 2B)Apache 2.0
Qwen3.5-0.8B0.8BImage+Text+Video262KIntelligence Index 9Apache 2.0
Llama 3.2 1B1BText128KTool routing, ExecuTorchLlama Community

~1.3B VLM comparison:

FeatureMiniCPM-V 4.6 (1.3B)LFM2.5-VL-1.6BSmolVLM-2.2B
Vision encoderSigLIP2-400MLFM2.5 nativeSigLIP-400M
Video understandingYesYes (real-time)Yes
Mobile deploymentiOS, Android, HarmonyOSLEAP SDK onlyLimited
Visual token compressionMixed 4x/16xNative resolution81 tokens/patch
Intelligence Index13
Context window262K

[!WARNING] At the 1-2B VLM tier, MiniCPM-V 4.6 leads on every measurable benchmark. LFM2.5-VL-1.6B's strength is inference speed. It loses on context window, benchmark scores, and mobile SDK breadth.

Tier 3: 2.6B to 4B

ModelParamsMMLUGSM8KHumanEvalContextMultimodal
LFM2-2.6B2.6B82.41%No
Phi-4-mini3.8B67.3%88.6%55%128KNo
Gemma 3 4B4B65%89.2%71.3%128KImage
Qwen3.5-4B4B64%262KImage+Video
SmolLM3-3B3B63%80%128K (genuine)No
Llama 3.2 3B3B61.8%128KNo

LFM2-2.6B scores 82.41% on GSM8K. Phi-4-mini (88.6%) and Gemma 3 4B (89.2%) both beat it on math. LiquidAI's advantage at this tier is decode latency and memory, not benchmark scores.

Inference speed (3-4B, Q4_K_M):

ModelTok/s (M2 MacBook)Tok/s (RTX 4060)VRAM
Phi-4-mini40-50953 GB
Gemma 3 4B30-40703.5 GB
SmolLM3-3B35-45802.5 GB
Llama 3.2 3B35-45802.5 GB
LFM2-2.6BLower (hybrid arch)

[!NOTE] LiquidAI does not publish standardized tok/s numbers in the same format. Their edge is constant-memory decode at long context, which short-context benchmarks don't capture well.


5. Inference Engine Support

The biggest practical gap in the LFM ecosystem.

Model FamilyTransformersOllamavLLMllama.cppMLXSGLangONNXExecuTorch
Liquid AI LFM2
Qwen3.5
MiniCPM
Phi-4
Gemma 3/4
Llama 3.2

LFMs run only through HuggingFace Transformers, ExecuTorch, and the proprietary LEAP SDK. No Ollama, no vLLM, no llama.cpp, no MLX.

Practical implications:

  • Cannot use Ollama for local chat or API serving.
  • Cannot deploy through vLLM in production.
  • No GGUF quantization via llama.cpp.
  • No Apple Silicon acceleration via MLX.

[!CAUTION] If your stack depends on Ollama, vLLM, or llama.cpp, LFMs are not drop-in replacements for Qwen, Llama, or Phi today. This is the primary adoption barrier.


6. Community Sentiment

From r/LocalLLaMA and GitHub:

  • Speed holds up. The 1B and 1.2B variants consistently draw comments about generation feeling like streaming. The architecture's latency claims are real.
  • Quality above class. LFM2.5-1.2B output quality is frequently compared to 2-3B transformer models for summarization, filtering, and categorization tasks.
  • Reasoning is the weak spot. Multi-step logical reasoning is where LFMs lose to Qwen and Phi at equivalent sizes. The architecture is optimized for throughput, not chain-of-thought depth.
  • Ecosystem friction. Lack of Ollama and llama.cpp support is the most common complaint. Users who want to test LFMs often find their existing local setup simply doesn't load the model.

7. Decision Guide

Use CaseBest PickReason
Ultra-fast edge textLFM2.5-1.2B28T training, lowest latency at 1B scale
Edge VLM + real-time videoLFM2.5-VL-1.6BWebGPU captioning, native resolution
Edge voice agentLFM2.5-Audio-1.5BNo competing model at this size and modality
8B knowledge at 1.5B speedLFM2.5-8B-A1BMoE, 1.5B active params per token
Best multimodal under 2BMiniCPM-V 4.6Intelligence Index 13, 262K context, full mobile OS support
Best reasoning under 4BPhi-4-mini88.6% GSM8K, 70.4% BBH
Best code under 4BGemma 3 4B71.3% HumanEval
Broadest inference supportQwen3.5 (any size)Ollama, vLLM, llama.cpp, MLX, SGLang all supported

Conclusion

LiquidAI's hybrid architecture delivers on its core promise: lower memory and faster decode than pure transformers at equivalent parameter counts. The LFM2.5-Audio-1.5B has no direct competitor at its size and modality.

The blocker is ecosystem. Until LFMs get Ollama, vLLM, and llama.cpp support, they are a specialist choice. The architecture is production-ready. The tooling is not.


References

If the article helped you in some way, consider giving it a like. This will mean a lot to me. You can download the code related to the post using the download button below.

If you see any bug, have a question for me, or would like to provide feedback, please drop a comment below.