AI Models

Explore and compare the high-performance models powering Augment, organized by response speed and reasoning capability.

Hardware Models

Platforms that shape throughput, latency, and deployment options for AI workloads.

Wafer-scale AI compute

Cerebras Systems

WSE-3 + CS-3

Hardware profile

Cerebras builds wafer-scale processors packaged into CS-3 systems and sells access as on-prem hardware or as cloud inference/training. It emphasizes high token throughput and low latency, with an OpenAI-compatible API and partner distribution.

Highlights
  • WSE-3 specs: 4T transistors, 900,000 cores, 44GB SRAM, 125 PFLOPS peak.
  • Launch claims: about 1,800 tokens/sec on Llama 3.1 8B and 450 tokens/sec on Llama 3.1 70B.
  • Clusters scale to 2,048 CS-3 systems; external memory up to 1.2PB for 24T-parameter models.
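The throughput claims above translate directly into response latency. A quick back-of-envelope check, using Cerebras's stated launch figures (not independent measurements):

```python
# Convert a claimed decode throughput into response time for a given answer length.
# The 1,800 and 450 tokens/sec figures are Cerebras's launch claims, quoted above.
def response_time_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate output_tokens at a fixed decode throughput."""
    return output_tokens / tokens_per_sec

# A 500-token answer at the claimed rates:
t_8b = response_time_seconds(500, 1800)   # ~0.28 s on Llama 3.1 8B
t_70b = response_time_seconds(500, 450)   # ~1.11 s on 70B
```

At the claimed 8B rate, even a fairly long answer lands well under one second, which is the basis of the sub-second-latency positioning.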
Go-to-market
  • OpenAI Chat Completions compatible API with on-prem and private cloud options.
  • Distribution via Hugging Face Hub (Jan 2025) and AWS Marketplace (2025).
  • Series G raised $1.1B at $8.1B valuation (reported 2025).
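Because the API is Chat Completions compatible, integration amounts to pointing a standard request at a different base URL. A minimal standard-library sketch; the base URL and model name below are placeholders, not Cerebras's documented values:

```python
import json
import urllib.request

def make_chat_request(base_url: str, api_key: str, model: str,
                      prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI Chat Completions-style HTTP request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Placeholder endpoint and model ID; substitute the provider's documented values.
req = make_chat_request("https://api.example.test/v1", "API_KEY",
                        "example-model", "Summarize wafer-scale compute.")
```

Swapping providers then means changing only the base URL, key, and model ID, which is the practical meaning of "OpenAI-compatible" here.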
Risks

Customer concentration: G42 accounted for 83% of 2023 revenue and 87% of H1 2024; export controls and licensing remain material constraints.

Deterministic inference platform

Groq LPU + GroqCloud

GroqChip LPU + GroqRack + GroqCloud

Hardware profile

Groq focuses on deterministic AI inference with its GroqChip LPU, GroqRack clusters, and GroqCloud API. The platform emphasizes predictable execution and high on-die bandwidth while offering an OpenAI-compatible integration surface.

Highlights
  • Deterministic execution: Groq states no run-to-run variation with compile-time known execution time.
  • GroqChip v1.5: 14nm, 230MB SRAM, up to 80 TB/s on-die bandwidth, 16 RealScale links.
  • GroqRack: up to 64 interconnected chips with a stated 1.6 µs rack latency.
Go-to-market
  • GroqCloud production API with model catalog, pricing, and OpenAI-style endpoints advertised in four regions.
  • Enterprise posture: Trust Center lists SOC 2 Type II; marketing claims GDPR and HIPAA alignment.
  • Published deprecations page plus a data-retention policy (no retention beyond the request, except batch or fine-tuning data held up to 30 days).
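Teams governing that retention exception typically track when batch or fine-tuning artifacts age past the stated window. A minimal sketch, assuming the 30-day figure above is the upper bound:

```python
from datetime import datetime, timedelta, timezone

# Stated upper bound for batch/fine-tuning retention, per the policy above.
RETENTION_WINDOW = timedelta(days=30)

def past_retention(submitted_at: datetime, now: datetime) -> bool:
    """True once a batch/fine-tuning artifact exceeds the 30-day retention window."""
    return now - submitted_at > RETENTION_WINDOW

# Example: a job submitted 31 days ago should be past the window.
submitted = datetime(2025, 1, 1, tzinfo=timezone.utc)
expired = past_retention(submitted, submitted + timedelta(days=31))
```

A check like this is an assumption about how one might operationalize the policy, not a Groq-provided tool.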
Risks

A non-exclusive NVIDIA license and executive departures create roadmap-continuity risk, and the $20B estimate remains unconfirmed. Data-retention exceptions and model deprecations require active governance.

ENGINE

Sub-second Latency

Our Ultra-fast tier leverages Cerebras hardware for sub-second responses, enabling true real-time interaction without the wait.

LOGIC

Complex Reasoning

Premium models are optimized for zero-shot problem solving, deep strategic planning, and highly accurate instruction following.

Start Building

Augment Intelligence Platform