Vector Database Design
Instruction
<strong>Objective:</strong> Build an index that optimizes both recall and latency within a constrained region.
<h2>Optimization Targets</h2> <p>1. <strong>Primary</strong>: Maximize recall@1 (higher is better) 2. <strong>Secondary</strong>: Minimize query latency (lower is better)</p> <p>This is a <strong>multi-objective optimization problem</strong> where you must find the best tradeoff on the Pareto frontier.</p> <h2>Pareto Frontier Definition - BALANCED TIER</h2> <p>This tier enforces <strong>balanced constraints</strong> on both recall and latency.</p> <h3>Pareto Frontier Boundaries</h3><strong>Recall Range:</strong> 0.6939 to 1.0 (70% to 100% of baseline, clamped)
<ul> <li><strong>Minimum</strong>: ≥ 0.6939 (70% of baseline 0.9914)</li> <li><strong>Maximum</strong>: ≤ 1.0 (cannot exceed 100%)</li> </ul> <p><strong>Latency Range:</strong> 2.695ms to 5.775ms (70% to 150% of baseline)</p>
<ul> <li><strong>Minimum</strong>: ≥ 2.695ms (70% of baseline 3.85ms)</li> <li><strong>Maximum</strong>: ≤ 5.775ms (150% of baseline 3.85ms)</li> </ul> <h3>Frontier Regions</h3> <table> <thead> <tr> <th>Region</th> <th>Criteria</th> <th>Status</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td><strong>Dominates Reference</strong></td> <td>Recall ≥ 0.9914 <strong>AND</strong> Latency ≤ 3.85ms</td> <td>✅ Best</td> <td>Better in both dimensions</td> </tr> <tr> <td><strong>On Frontier</strong></td> <td>(Recall ≥ 0.6939 AND Latency ≤ 5.775ms) AND (Either metric improved)</td> <td>✅ Good</td> <td>Acceptable tradeoff within boundaries</td> </tr> <tr> <td><strong>Tradeoff</strong></td> <td>One dimension improved, one worse</td> <td>⚠️ Acceptable</td> <td>Still within boundary region</td> </tr> <tr> <td><strong>Dominated</strong></td> <td>Recall < 0.6939 <strong>OR</strong> Latency > 5.775ms</td> <td>❌ Avoid</td> <td>Outside acceptable region</td> </tr> </tbody> </table> <h2>Baseline Metrics</h2> <ul> <li><strong>Reference recall@1</strong>: 0.9914 (99.14% accuracy)</li> <li><strong>Reference avg query time</strong>: 3.85ms</li> <li><strong>Reference score</strong>: 60.0 (baseline achievement level)</li> </ul> <h2>Scoring (0-100)</h2> <p>The scoring system uses <strong>latency-gated recall</strong>:</p> <pre><code class="language-python"># Step 1: Enforce the latency gate
if avg_query_time_ms > latency_threshold_ms:
    score = 0
else:
    # Step 2: Score by recall only
    if recall_at_1 > baseline_recall:
        score = 100
    else:
        # Linearly scale recall within [lower_threshold, baseline]
        recall_range = baseline_recall - lower_threshold
        recall_proportion = clamp((recall_at_1 - lower_threshold) / recall_range, 0, 1)
        score = 0 + (100 - 0) * recall_proportion</code></pre> <h3>Scoring Parameters</h3> <ul> <li><strong>baseline_recall</strong> = 0.9914</li> <li><strong>latency_threshold_ms</strong> = 3.85 (tier-specific)</li> <li><strong>lower_threshold</strong> = tier-specific (0.6939–0.9417 range)</li> </ul> <h3>Example Scores for Balanced Tier</h3> <table> <thead> <tr> <th>Recall</th> <th>Latency</th> <th>Status</th> <th>Score</th> <th>Reasoning</th> </tr> </thead> <tbody> <tr> <td>0.9914</td> <td>3.85ms</td> <td>Baseline</td> <td>60</td> <td>Matches reference</td> </tr> <tr> <td>0.9900</td> <td>2.70ms</td> <td>On frontier (faster)</td> <td>~70</td> <td>Lower latency</td> </tr> <tr> <td>0.9850</td> <td>3.00ms</td> <td>On frontier (balanced)</td> <td>~68</td> <td>Good balance</td> </tr> <tr> <td>0.9950</td> <td>4.50ms</td> <td>On frontier (higher recall)</td> <td>~70</td> <td>Better accuracy</td> </tr> <tr> <td>0.8500</td> <td>3.85ms</td> <td>Tradeoff (low recall)</td> <td>~48</td> <td>Recall too low</td> </tr> <tr> <td>0.9800</td> <td>5.77ms</td> <td>At boundary</td> <td>~63</td> <td>Near latency limit</td> </tr> </tbody> </table> <h2>API Specification</h2> <p>Implement a class with the following interface:</p> <pre><code class="language-python">import numpy as np
from typing import Tuple

class YourIndex:
    def __init__(self, dim: int, **kwargs):
        """
        Initialize index for vectors of dimension dim.

        Args:
            dim: Dimension of vectors (128 for SIFT1M)
            **kwargs: Optional configuration parameters
        """
        pass

    def add(self, xb: np.ndarray) -> None:
        """
        Add vectors to the index.

        Args:
            xb: Base vectors of shape (N, dim) where N = 1,000,000
        """
        pass

    def search(self, xq: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
        """
        Search for k nearest neighbors.

        Args:
            xq: Query vectors of shape (nq, dim) where nq = 10,000
            k: Number of nearest neighbors to return (k=1 for evaluation)

        Returns:
            distances: Array of shape (nq, k) with L2 distances
            indices: Array of shape (nq, k) with indices of neighbors
        """
        pass</code></pre> <h2>Evaluation Details</h2> <h3>Dataset: SIFT1M</h3> <ul> <li><strong>Base vectors</strong>: 1,000,000 vectors of 128 dimensions</li> <li><strong>Query vectors</strong>: 10,000 vectors of 128 dimensions</li> <li><strong>Ground truth</strong>: Pre-computed nearest neighbors for recall calculation</li> <li><strong>Distance metric</strong>: L2 (Euclidean distance)</li> <li><strong>Recall measured at</strong>: k=1</li> </ul> <h3>Timeout</h3> <ul> <li>Maximum execution time: <strong>1 hour</strong></li> </ul> <h2>How Evaluation Works</h2> <h3>1. Index Construction</h3> <pre><code class="language-python">index = YourIndex(dim=128)
index.add(xb)  # xb: (1,000,000, 128) base vectors</code></pre> <h3>2. Batch Query Performance (Used for Scoring)</h3> <pre><code class="language-python"># Measures time for all 10K queries at once
t0 = time.time()
D, I = index.search(xq, k=1)  # xq: (10,000, 128) query vectors
t1 = time.time()

# Calculate metrics
recall_at_1 = (I[:, :1] == gt[:, :1]).sum() / float(nq)
avg_query_time_ms = (t1 - t0) * 1000.0 / float(nq)</code></pre> <p><strong>Important:</strong></p>
<ul> <li><code>avg_query_time_ms</code> is what's used for <strong>scoring</strong></li> <li>Batch queries benefit from CPU cache locality and vectorization</li> <li>This is the <strong>average time per query</strong> in a batch</li> </ul> <h3>3. Single Query Sample (For Reference Only)</h3> <pre><code class="language-python"># Samples the first 100 queries individually
for i in range(min(100, nq)):
    q = xq[i:i+1]
    single_time = measure_time(index.search(q, k=1))</code></pre> <p><strong>Note:</strong> This metric is <strong>NOT used for scoring</strong>, only for reference. Single-query latency can differ from the batch average.</p>
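<p>To see why the two metrics can diverge, here is a minimal self-contained sketch, assuming FAISS is installed and using synthetic data at reduced scale (all names here are illustrative, not part of the evaluator). Batch search amortizes per-call overhead and uses vectorized distance kernels, so the batch average per query is typically lower than the single-query time:</p> <pre><code class="language-python">import time
import numpy as np
import faiss

# Reduced-scale synthetic stand-in; the real evaluation uses SIFT1M.
rng = np.random.default_rng(0)
xb = rng.random((100_000, 128), dtype=np.float32)
xq = rng.random((1_000, 128), dtype=np.float32)

index = faiss.IndexFlatL2(128)
index.add(xb)

# Batch: one call for all queries (this is what scoring measures)
t0 = time.time()
index.search(xq, 1)
batch_ms = (time.time() - t0) * 1000.0 / len(xq)

# Single: one call per query (reference metric only)
t0 = time.time()
for i in range(len(xq)):
    index.search(xq[i:i + 1], 1)
single_ms = (time.time() - t0) * 1000.0 / len(xq)

print(f"batch: {batch_ms:.3f} ms/query, single: {single_ms:.3f} ms/query")</code></pre>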
<h3>4. Scoring Calculation</h3> <ul> <li>The evaluator loads <code>score_config.json</code> for baseline and thresholds</li> <li>Uses <code>avg_query_time_ms</code> from batch queries and <code>recall_at_1</code></li> <li>Applies the tier-specific scoring mode: <code>latency_gated_recall</code></li> </ul> <h2>Code Structure Requirements</h2> <ul> <li><strong>Required methods</strong>: <code>__init__(dim, **kwargs)</code>, <code>add(xb)</code>, and <code>search(xq, k)</code></li> <li><strong>Auto-discovery</strong>: The evaluator automatically finds your class with <code>add</code> and <code>search</code> methods</li> <li><strong>No dataset loading</strong>: The evaluator provides the <code>xb</code>, <code>xq</code>, <code>gt</code> arrays</li> <li><strong>Return format</strong>: <code>search</code> must return a <code>(distances, indices)</code> tuple where the indices shape is <code>(nq, k)</code></li> </ul>
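<p>Before tuning anything, it can help to validate this contract with exact search. The sketch below is a hedged example, not part of the task (the class name <code>ExactIndex</code> is hypothetical; the evaluator auto-discovers any class with <code>add</code> and <code>search</code>). It returns correctly shaped <code>(nq, k)</code> arrays, but a brute-force scan over 1M vectors will almost certainly miss the 3.85ms latency gate, so treat it as a correctness baseline only:</p> <pre><code class="language-python">import numpy as np
import faiss
from typing import Tuple

class ExactIndex:
    """Brute-force L2 search; for validating shapes and recall, not latency."""

    def __init__(self, dim: int, **kwargs):
        self.index = faiss.IndexFlatL2(dim)  # exact search, no training step

    def add(self, xb: np.ndarray) -> None:
        # FAISS expects float32, C-contiguous input
        self.index.add(np.ascontiguousarray(xb, dtype=np.float32))

    def search(self, xq: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
        xq = np.ascontiguousarray(xq, dtype=np.float32)
        distances, indices = self.index.search(xq, k)
        return distances, indices  # shapes: (nq, k) and (nq, k)</code></pre>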
<h2>Strategy Tips for BALANCED TIER</h2> <h3>1. Balance Speed and Accuracy</h3> <p>Both metrics matter equally. Don't optimize one at the expense of the other.</p> <h3>2. Choose an Indexing Strategy Thoughtfully</h3> <p>Common approaches:</p> <ul> <li><strong>IVF (Inverted File Index)</strong>: Good balance, tunable nprobe parameter</li> <li><strong>HNSW (Hierarchical NSW)</strong>: Excellent recall, adjustable ef_search</li> <li><strong>LSH (Locality-Sensitive Hashing)</strong>: Fast but lower recall</li> <li><strong>Product Quantization</strong>: Memory-efficient, configurable accuracy</li> </ul> <h3>3. Tune Parameters Carefully</h3> <p>Small changes can significantly affect both metrics:</p> <ul> <li>Number of clusters/partitions</li> <li>Search breadth parameters</li> <li>Quantization levels</li> <li>Distance computation optimizations</li> </ul> <h3>4. Stay Within Boundaries</h3> <p>Going outside the acceptable region hurts your score significantly.</p> <h3>5. Target 50-60 Points</h3> <ul> <li>Realistic human solution range</li> <li>Achievable without extreme optimization</li> <li>Shows good balance without perfect tuning</li> </ul> <h2>Key Differences from Other Tiers</h2> <table> <thead> <tr> <th>Tier</th> <th>Focus</th> <th>Latency Allowance</th> <th>Recall Allowance</th> </tr> </thead> <tbody> <tr> <td><strong>High Recall</strong></td> <td>Accuracy above all</td> <td>2x (allows slower)</td> <td>Strict</td> </tr> <tr> <td><strong>Balanced (This)</strong></td> <td>Equal balance</td> <td>1.5x (moderate)</td> <td>0.7-1.0x</td> </tr> <tr> <td><strong>Low Latency</strong></td> <td>Speed above all</td> <td>Strict</td> <td>Relaxed</td> </tr> </tbody> </table> <h2>Recommended Approach</h2> <h3>Phase 1: Baseline</h3> <ol> <li>Start with a simple, well-understood index (e.g., FAISS IVF)</li> <li>Implement the required API</li> <li>Verify correctness with a small dataset</li> </ol> <h3>Phase 2: Tuning</h3> <ol> <li>Tune index parameters to match the latency target</li> <li>Measure recall at different parameter settings</li> <li>Find the sweet spot on the Pareto frontier (see the nprobe sweep sketch after the solution code)</li> </ol> <h3>Phase 3: Optimization</h3> <ol> <li>Improve recall through better index configuration</li> <li>Optimize the search implementation for lower latency</li> <li>Continuously monitor both metrics</li> </ol> <h3>Phase 4: Validation</h3> <ol> <li>Test on the full SIFT1M dataset</li> <li>Verify both metrics are within boundaries</li> <li>Ensure consistent performance across queries</li> </ol> <h2>Common Indexing Methods</h2> <h3>IVF (Inverted File Index)</h3> <pre><code class="language-python"># Partition the space into clusters
# Search only nearby clusters
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
index.train(xb)
index.add(xb)
index.nprobe = 10  # Number of clusters to search</code></pre> <h3>HNSW (Hierarchical Navigable Small World)</h3> <pre><code class="language-python"># Graph-based index with a hierarchical structure
index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=N, ef_construction=200, M=16)
index.add_items(xb)
index.set_ef(50)  # Search parameter</code></pre> <h3>Product Quantization</h3> <pre><code class="language-python"># Compress vectors using product quantization
index = faiss.IndexIVFPQ(quantizer, dim, nlist, m, nbits)
index.train(xb)
index.add(xb)</code></pre> <h2>Performance Tips</h2> <ol> <li><strong>Vectorization</strong>: Use NumPy/FAISS vectorized operations</li> <li><strong>Memory layout</strong>: Ensure contiguous memory for cache efficiency</li> <li><strong>Batch processing</strong>: Leverage batch query optimizations</li> <li><strong>Pre-computation</strong>: Cache frequently used values</li> <li><strong>Distance shortcuts</strong>: Use early termination when possible</li> </ol> <h2>Resources</h2> <ul> <li><strong>FAISS Documentation</strong>: <a href="https://github.com/facebookresearch/faiss">https://github.com/facebookresearch/faiss</a></li> <li><strong>HNSW Paper</strong>: "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"</li> <li><strong>SIFT1M Dataset</strong>: <a href="http://corpus-texmex.irisa.fr/">http://corpus-texmex.irisa.fr/</a></li> </ul> <h2>Success Criteria</h2> <ul> <li>✅ Recall@1 within the 0.6939 to 1.0 range</li> <li>✅ Average query latency within the 2.695ms to 5.775ms range</li> <li>✅ Score > 50 points (better than a simple baseline)</li> <li>✅ Correct API implementation with proper return formats</li> <li>✅ Passes all correctness tests</li> </ul> <hr> <h2>Problem Metadata</h2> <p><strong>Domain:</strong> Database Systems / Information Retrieval</p> <p><strong>Problem Type:</strong> Approximate Nearest Neighbor Search</p> <p><strong>Dataset:</strong> SIFT1M (1 million vectors, 128 dimensions)</p> <hr> <h2>Model Information</h2> <p><strong>Model:</strong> GPT 5 Thinking</p> <p><strong>Score:</strong> 97.7</p> <h3>Solution Code</h3> <pre><code>import numpy as np
from typing import Tuple

class Solution:
    def solve(self, spec_path: str = None) -> dict:
        code = r'''
import numpy as np
import faiss
from typing import Tuple
class YourIndex:
    def __init__(self, dim: int, **kwargs):
        """Initialize an IVF index for vectors of dimension dim."""
        self.dim = dim
        self.nlist = kwargs.get('nlist', 100)    # Number of clusters (Voronoi cells)
        self.nprobe = kwargs.get('nprobe', 10)   # Number of clusters to search per query
        self.quantizer = faiss.IndexFlatL2(dim)  # Coarse quantizer assigns vectors to cells
        self.index = faiss.IndexIVFFlat(self.quantizer, dim, self.nlist, faiss.METRIC_L2)
        self.index.nprobe = self.nprobe

    def add(self, xb: np.ndarray) -> None:
        """Add vectors to the index. xb: shape (N, dim)"""
        if not self.index.is_trained:
            self.index.train(xb)  # k-means on the base vectors to learn the cells
        self.index.add(xb)

    def search(self, xq: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
        """Search for k nearest neighbors. Returns (distances, indices)."""
        distances, indices = self.index.search(xq, k)
        return distances, indices
'''
        return {"code": code}

solution = Solution()
code = solution.solve()["code"]
exec(code)</code></pre>
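<p>A note on the design choice above: with nlist=100 and nprobe=10, roughly 10% of the inverted lists are scanned per query, which is how the solution trades a small recall loss for a large latency win. To rerun the Phase 2 tuning loop yourself, a sweep along these lines maps out the tradeoff. This is a hypothetical local harness, not part of the evaluator; it assumes you have loaded <code>xb</code>, <code>xq</code>, and <code>gt</code> as NumPy arrays:</p> <pre><code class="language-python">import time
import numpy as np
import faiss

def sweep_nprobe(xb, xq, gt, nlist=100, nprobes=(1, 2, 4, 8, 16, 32)):
    """Print recall@1 and batch latency for several nprobe settings."""
    dim = xb.shape[1]
    quantizer = faiss.IndexFlatL2(dim)
    index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_L2)
    index.train(xb)  # train once; nprobe only affects search
    index.add(xb)
    for nprobe in nprobes:
        index.nprobe = nprobe
        t0 = time.time()
        _, I = index.search(xq, 1)
        ms = (time.time() - t0) * 1000.0 / len(xq)
        recall = float((I[:, 0] == gt[:, 0]).mean())
        print(f"nprobe={nprobe:3d}  recall@1={recall:.4f}  latency={ms:.3f} ms/query")</code></pre> <p>Larger nprobe improves recall and increases latency monotonically, so the sweep traces the Pareto frontier directly.</p>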