RAG · Context Window · Benchmarks

Beyond RAG: Long-Context Strategies

Benchmarking 1M+ context windows against vector retrieval.

ML Ops Team · AUG 20, 2025 · 10 MIN READ

The Thesis: Is the Vector DB Dead?

Retrieval-Augmented Generation (RAG) was born as a patch for small context windows. Now that models like Gemini 1.5 Pro and GPT-5 push context windows into the millions of tokens, do we still need vector databases? The answer is nuanced.

The "Lost in the Middle" Problem

Our benchmarks indicate that while models can ingest multi-million-token contexts, their ability to reason over facts buried in the middle of the context window degrades significantly. We define Recall Accuracy $A_r$ as a function of a fact's position in the window:

$$A_r(\mathrm{pos}) = 1 - e^{-\frac{(\mathrm{pos} - \mu)^2}{2\sigma^2}}$$

where μ is the midpoint of the context and σ sets the width of the degraded region: accuracy bottoms out at the middle and recovers toward the edges.
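As a quick illustration, here is a minimal sketch of that curve, assuming position is normalized depth in [0, 1] with μ = 0.5 (the middle of the window); the σ value is chosen arbitrarily for demonstration, not fitted to our benchmark data.

```python
import numpy as np

def recall_accuracy(pos: np.ndarray, mu: float = 0.5, sigma: float = 0.15) -> np.ndarray:
    """A_r(pos) = 1 - exp(-(pos - mu)^2 / (2 * sigma^2))."""
    return 1.0 - np.exp(-((pos - mu) ** 2) / (2.0 * sigma**2))

# Probe five depths from the start of the window to the end.
positions = np.linspace(0.0, 1.0, 5)
for p, a in zip(positions, recall_accuracy(positions)):
    print(f"depth {p:.2f} -> predicted recall {a:.2f}")
# Facts near the edges score ~1.0; a fact at the exact middle drops to 0.
```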

GraphRAG: The Semantic Bridge

Standard Vector RAG fails at multi-hop reasoning. We are seeing massive gains with GraphRAG, which constructs a knowledge graph from the source documents. Instead of just retrieving similar chunks, the agent traverses the graph to find relationships between disparate entities.

[Figure: Knowledge Graph Traversal]
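To make the traversal concrete, here is a minimal sketch using networkx. The entities and relations below are invented for illustration; in a real GraphRAG pipeline the triples are extracted from source documents during indexing.

```python
import networkx as nx

# Toy knowledge graph from (subject, relation, object) triples.
# These triples are hypothetical placeholders, not extracted data.
triples = [
    ("Acme Corp", "acquired", "Borealis Labs"),
    ("Borealis Labs", "developed", "Polaris DB"),
    ("Polaris DB", "written_in", "Rust"),
]

g = nx.DiGraph()
for subj, rel, obj in triples:
    g.add_edge(subj, obj, relation=rel)

# Multi-hop question: "What language underpins Acme Corp's database tech?"
# No single chunk mentions both "Acme Corp" and "Rust", so vector
# similarity alone cannot connect them; the graph path can.
path = nx.shortest_path(g, source="Acme Corp", target="Rust")
for a, b in zip(path, path[1:]):
    print(f"{a} --{g.edges[a, b]['relation']}--> {b}")
```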

The Attention Sink

Filling the context window with irrelevant documents doesn't just cost money; it also dilutes the model's attention, and attention is a finite resource.
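A back-of-the-envelope comparison makes the money side concrete. The per-token price, query volume, and chunk sizes below are placeholder assumptions, not real quotes:

```python
# Illustrative numbers only: swap in your provider's actual pricing.
PRICE_PER_M_INPUT_TOKENS = 2.00   # USD per million input tokens (assumed)
QUERIES_PER_DAY = 10_000

full_context_tokens = 1_000_000   # stuff the whole corpus into the window
funnel_tokens = 5 * 500           # top-5 retrieved chunks x ~500 tokens each

def daily_cost(tokens_per_query: int) -> float:
    return tokens_per_query / 1e6 * PRICE_PER_M_INPUT_TOKENS * QUERIES_PER_DAY

print(f"full-context: ${daily_cost(full_context_tokens):,.0f}/day")  # $20,000/day
print(f"funnel:       ${daily_cost(funnel_tokens):,.0f}/day")        # $50/day
```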

We advocate for a Hierarchical Retrieval strategy:

  • Layer 1: Use metadata filtering (SQL) to narrow the search space.
  • Layer 2: Use dense vector search (k-NN) to find semantic matches.
  • Layer 3: Use a cross-encoder reranker to grade the relevance of the top 50 chunks.
  • Layer 4: Feed only the top 5 chunks into the LLM context.

This "Funnel Architecture" keeps precision high and concentrates the model's reasoning on the data that actually matters.
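The sketch below wires the four layers together end to end. The embedding and reranking functions are deliberately toy stand-ins (hash-derived vectors and token overlap) so the example runs self-contained; in production each layer would be a real SQL filter, dense encoder, and cross-encoder model.

```python
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a dense encoder: pseudo-random unit vector seeded by the text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def toy_rerank(query: str, chunk: str) -> float:
    """Stand-in for a cross-encoder: crude token-overlap relevance score."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def funnel_retrieve(query: str, corpus: list[dict],
                    k_dense: int = 50, k_final: int = 5) -> list[dict]:
    # Layer 1: metadata filter (a SQL WHERE clause in production).
    pool = [d for d in corpus if d["source"] == "docs"]
    # Layer 2: dense k-NN over the filtered pool.
    q_vec = toy_embed(query)
    pool.sort(key=lambda d: -(toy_embed(d["text"]) @ q_vec))
    candidates = pool[:k_dense]
    # Layer 3: cross-encoder rerank of the surviving candidates.
    candidates.sort(key=lambda d: -toy_rerank(query, d["text"]))
    # Layer 4: only the top chunks ever reach the LLM context.
    return candidates[:k_final]

corpus = [
    {"source": "docs", "text": "vector databases index embeddings for k-NN search"},
    {"source": "docs", "text": "long context windows dilute model attention"},
    {"source": "chat", "text": "lunch menu for friday"},
]
print(funnel_retrieve("how do vector databases search embeddings", corpus, k_final=2))
```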
