FrontierCS: The Next Frontier of Computer Science

FrontierCS is a benchmark of open-ended problems across diverse areas of computer science.

read our paper ↗

view model performance

view the full leaderboard ↗

view FrontierCS problems

World Map

Algorithmic

Mr. Pacha, a Bolivian archeologist, discovered an ancient document near Tiwanaku that describes the world during the Tiwanaku Period (300-1000 CE). At that time, there were **N countries**, numbered from **1** to **N**. The document contains a list of **M** different pairs of adjacent countries:

- (A[0], B[0]), (A[1], B[1]), ..., (A[M−1], B[M−1])

For each i (0 ≤ i < M), the document states that country A[i] was adjacent to country B[i] and vice versa. Pairs of countries not listed were **not adjacent**.

Task
--------------------------------------------------

Mr. Pacha wants to create a map of the world such that all adjacencies between countries are exactly as they were during the Tiwanaku Period.

### Map Creation Process

1. Choose a positive integer **K**
2. Draw the map as a grid of **K × K** square cells
   - Rows numbered from 0 to K−1 (top to bottom)
   - Columns numbered from 0 to K−1 (left to right)
3. Color each cell using one of N colors (colors numbered 1 to N)
   - Country j (1 ≤ j ≤ N) is represented by color j

### Coloring Constraints

The coloring must satisfy **all** of the following conditions:

1. **Complete Coverage**: For each j (1 ≤ j ≤ N), there is at least one cell with color j
2. **Adjacency Preservation**: For each pair of adjacent countries (A[i], B[i]), there is at least one pair of adjacent cells where one is colored A[i] and the other is colored B[i]
   - Two cells are adjacent if they share a side
3. **No False Adjacencies**: For each pair of adjacent cells with different colors, the countries represented by these two colors were adjacent during the Tiwanaku Period

### Example

If **N = 3**, **M = 2** and the pairs of adjacent countries are **(1,2)** and **(2,3)**, then the pair **(1,3)** was not adjacent. The following map of dimension **K = 3** satisfies all conditions:

```
2 3 3
2 3 2
1 2 1
```

**Note:** A country does not need to form a connected region on the map. In the map above, country 3 forms a connected region, while countries 1 and 2 form disconnected regions.

Your Task
--------------------------------------------------

Help Mr. Pacha choose a value of K and create a map. The document guarantees that such a map exists. Since Mr. Pacha prefers smaller maps, **your score depends on the value of K**, and lower values of K may result in a better score. However, finding the minimum possible value of K is not required.

Implementation Details
--------------------------------------------------

You should implement the following procedure:

```cpp
std::vector<std::vector<int>> create_map(int N, int M, std::vector<int> A, std::vector<int> B)
```

### Parameters

- **N**: The number of countries
- **M**: The number of pairs of adjacent countries
- **A** and **B**: Arrays of length M describing adjacent countries

### Return Value

The procedure should return an array **C** that represents the map:

- Let K be the length of C
- Each element of C must be an array of length K
- Values must be integers between 1 and N inclusive
- C[i][j] is the color of the cell at row i and column j (for 0 ≤ i, j < K)
- **K must be less than or equal to 240**

Constraints
--------------------------------------------------

- 1 ≤ N ≤ 40
- 0 ≤ M ≤ N·(N−1)/2
- 1 ≤ A[i] < B[i] ≤ N for each i such that 0 ≤ i < M
- The pairs (A[0], B[0]), ..., (A[M−1], B[M−1]) are distinct
- There exists at least one map which satisfies all the conditions

Scoring
--------------------------------------------------

You need to make **R = K/N** as small as possible. A smaller R will result in a better score.

Examples
--------------------------------------------------

### Example 1

**Call:**

```cpp
create_map(3, 2, [1, 2], [2, 3])
```

This is the example from the task description. The procedure can return:

```cpp
[
  [2, 3, 3],
  [2, 3, 2],
  [1, 2, 1]
]
```

### Example 2

**Call:**

```cpp
create_map(4, 4, [1, 1, 2, 3], [2, 3, 4, 4])
```

In this example:

- **N = 4**, **M = 4**
- Adjacent country pairs: (1,2), (1,3), (2,4), (3,4)
- Not adjacent: (1,4), (2,3)

The procedure can return the following map of dimension **K = 7**:

```cpp
[
  [2, 1, 3, 3, 4, 3, 4],
  [2, 1, 3, 3, 3, 3, 3],
  [2, 1, 1, 1, 3, 4, 4],
  [2, 2, 2, 1, 3, 4, 3],
  [1, 1, 1, 2, 4, 4, 4],
  [2, 2, 1, 2, 2, 4, 3],
  [2, 2, 1, 2, 2, 4, 4]
]
```

The map could be smaller. For example, the following map of dimension **K = 2**:

```cpp
[
  [3, 1],
  [4, 2]
]
```

**Note:** Both maps satisfy K/N ≤ 2.

Sample Grader
--------------------------------------------------

### Input Format

```
T            # Number of scenarios
N M          # First scenario
A[0] B[0]
...
A[M-1] B[M-1]
```

### Output Format

```
P                               # Length of array C
Q[0] Q[1] ... Q[P-1]            # Length of each row
C[0][0] ... C[0][Q[0]-1]        # First row
...
C[P-1][0] ... C[P-1][Q[P-1]-1]  # Last row
```

Here, P is the length of the array C returned by `create_map`, and Q[i] (0 ≤ i < P) is the length of C[i].
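
All three coloring constraints are mechanical to check. The following Python sketch (illustrative, not the official grader) validates a candidate map `C` against the adjacency list by collecting every color pair that appears across a shared side:

```python
def check_map(N, pairs, C):
    """Validate a candidate map against the three coloring constraints.

    N: number of countries; pairs: set of adjacent pairs (a, b) with a < b;
    C: K x K list of lists with colors in 1..N.
    """
    K = len(C)
    assert all(len(row) == K for row in C), "map must be K x K"

    seen_colors = set()
    seen_adjacent = set()
    for i in range(K):
        for j in range(K):
            c = C[i][j]
            assert 1 <= c <= N, "colors must be in 1..N"
            seen_colors.add(c)
            # Look right and down; this visits every adjacent cell pair once.
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < K and nj < K and C[ni][nj] != c:
                    seen_adjacent.add((min(c, C[ni][nj]), max(c, C[ni][nj])))

    assert seen_colors == set(range(1, N + 1)), "complete coverage violated"
    # Equality enforces both adjacency preservation and no false adjacencies.
    assert seen_adjacent == set(pairs), "adjacency mismatch"

# Example from the statement: N = 3, pairs (1,2) and (2,3)
check_map(3, {(1, 2), (2, 3)}, [[2, 3, 3], [2, 3, 2], [1, 2, 1]])
```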

Treasure Packing

Algorithmic

A dragon has terrorized the region for decades, breathing fire and stealing treasures. A hero has decided to steal back the treasures and return them to the community. However, they drastically underestimated the size of the dragon's hoard, bringing only a **single 25-liter bag**, only strong enough to hold **20 kg**, to carry out the treasures. The hero has decided to try to return **as much value as possible**.

The dragon has **12 different types** of items of different values, masses, and volumes: crowns, figurines, jeweled goblets, gold helms, etc.

**Important:** Each item of the same type has the same mass and volume. For example, all goblets have the same mass and volume.

Your Task
--------------------------------------------------

Write a program to help the hero determine what items they should put in the bag to **maximize value** while respecting the bag's constraints.

### Constraints

- **Weight limit:** 20 kg (20,000,000 mg)
- **Volume limit:** 25 liters (25,000,000 µliters)

Input Format
--------------------------------------------------

The input is **JSON** keyed by treasure category name (e.g., `"goblet"`).

- Every category has a **unique name** composed of at most 100 lowercase ASCII characters
- There are **exactly twelve** treasure categories

Each corresponding value is a list with:

1. **q** (1 ≤ q ≤ 10,000): Maximum number of treasures of this category that can be looted
2. **v** (0 < v ≤ 10⁹): Value of one item
3. **m** (0 < m ≤ 20×10⁶): Mass in **mg** of one item (1 million mg per kg)
4. **l** (0 < l ≤ 25×10⁶): Volume in **µliters** of one item (1 million µliters per liter)

### Sample Input

```json
{
  "circlet": [19, 113005, 146800, 514247],
  "coppercoin": [887, 6171, 12593, 18081],
  "crown": [13, 726439, 1079353, 1212213],
  "dagger": [21, 279513, 558367, 522344],
  "figurine": [18, 26272, 1488281, 2295986],
  "goblet": [22, 300053, 698590, 986387],
  "goldcoin": [129, 14426, 82176, 27597],
  "helm": [3, 974983, 2470209, 882803],
  "jewelry": [23, 272450, 171396, 262226],
  "plate": [22, 98881, 246257, 363420],
  "silvercoin": [288, 12587, 53480, 28654],
  "torc": [17, 244957, 388222, 500000]
}
```

Output Format
--------------------------------------------------

Print a **JSON** with the same keys as the input, but with **single nonnegative integer values** giving the numbers of items of that category added to the bag in your solution.

### Sample Output

```json
{
  "circlet": 2,
  "coppercoin": 8,
  "crown": 13,
  "dagger": 0,
  "figurine": 0,
  "goblet": 0,
  "goldcoin": 0,
  "helm": 0,
  "jewelry": 23,
  "plate": 0,
  "silvercoin": 1,
  "torc": 4
}
```

Scoring
--------------------------------------------------

Producing an algorithm that always generates optimal solutions is very difficult, so your solution only needs to be **"good enough"**. We will compare your output to:

1. A **baseline heuristic** provided by the NSA
2. A **best effort** to compute the true optimum

### Score Calculation

Your score on this problem is the **average of the per-test-case scores**:

```
Score = 100 × clamp((your_value - baseline_value) / (best_value - baseline_value), 0, 1)
```

Where:

- **your_value**: Total value of treasures in your solution
- **baseline_value**: Total value from the NSA heuristic
- **best_value**: Best known or computed optimal value

**Requirements:**

- You **must beat the NSA heuristic** to receive points
- The closer your solution is to the optimum, the higher your score

Problem Type
--------------------------------------------------

This is a **variant of the 0/1 Knapsack Problem** with two constraints (weight and volume) instead of one, also known as the **Multidimensional Knapsack Problem**.

### Key Differences from Classic Knapsack

- **Dual constraints:** Must satisfy both weight AND volume limits
- **Multiple copies:** Can take multiple items of the same type (up to q items)
- **Approximate solutions:** Getting close to optimal is sufficient for a good score
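
As one illustration of a starting point (a sketch, not the NSA baseline or the reference optimum), here is a density greedy that normalizes each item's mass and volume by the respective budgets before ranking categories:

```python
import json
import sys

def greedy_pack(items, max_mass=20_000_000, max_vol=25_000_000):
    """Density greedy for the two-constraint bounded knapsack.

    items: {name: [q, v, m, l]} as in the input JSON. Categories are ranked
    by value per unit of normalized resource footprint, then filled greedily.
    A sketch only; DP over one dimension or local search will usually beat it.
    """
    def density(entry):
        q, v, m, l = entry[1]
        # Normalize mass and volume by their budgets so neither dominates.
        return v / (m / max_mass + l / max_vol)

    counts = {name: 0 for name in items}
    mass = vol = 0
    for name, (q, v, m, l) in sorted(items.items(), key=density, reverse=True):
        take = min(q, (max_mass - mass) // m, (max_vol - vol) // l)
        counts[name] = int(take)
        mass += take * m
        vol += take * l
    return counts

if __name__ == "__main__":
    print(json.dumps(greedy_pack(json.load(sys.stdin))))
```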

Permutation Guess

Algorithmic

There is a **hidden permutation** of n integers. Recall that a permutation of n is a sequence where each integer from **1 to n** (both inclusive) appears **exactly once**. Your task is to unravel the permutation using queries.

### Query Mechanism

Each query consists of:

- A sequence (not necessarily a permutation) of **n integers** ranging from 1 to n (both inclusive)

Each query is answered with:

- An integer **x**, indicating the **number of positions** where the corresponding element in your query sequence matches that of the hidden permutation

### Example

If the hidden permutation is `{1, 3, 4, 2, 5}` and your query is `{2, 3, 5, 2, 5}`:

- Position 1: 1 ≠ 2 ❌
- Position 2: 3 = 3 ✓
- Position 3: 4 ≠ 5 ❌
- Position 4: 2 = 2 ✓
- Position 5: 5 = 5 ✓

The answer will be **3** (three positions match).

Input
--------------------------------------------------

There is only one test case in each test file. The first line contains an integer **n** (1 ≤ n ≤ 10³) indicating the length of the hidden permutation.

Interaction Protocol
--------------------------------------------------

### To Ask a Query

Output one line in the following format:

```
0 a[1] a[2] ... a[n]
```

Where:

- First output `0` followed by a space
- Then print a sequence of n integers ranging from 1 to n, separated by spaces
- **After flushing your output**, read a single integer x indicating the answer to your query

### To Submit Your Guess

Output one line in the following format:

```
1 p[1] p[2] ... p[n]
```

Where:

- First output `1` followed by a space
- Then print your guessed permutation of n, separated by spaces
- **After flushing your output, exit immediately**

### Important Notes

- The answer for each test case is **pre-determined** (the interactor is not adaptive)
- Your final guess **does not count** as a query
- Invalid queries or exceeding the query limit may result in a Time Limit Exceeded verdict

### Flushing Output

To flush your output, use:

| Language | Flush Command |
|---|---|
| C (printf) | `fflush(stdout)` |
| C++ (cout) | `cout.flush()` |
| Java | `System.out.flush()` |
| Python | `stdout.flush()` |

Scoring
--------------------------------------------------

This problem is graded based on the **number of queries** you use.

### Requirements

- You **must use fewer than 10,000 queries** to receive any points
- After that, your answer will be compared to a reference solution (`best_queries`)

### Score Calculation

Your final score is calculated as the **average** across all test cases:

```
Score = 100 × clamp((10000 - your_queries) / (10000 - best_queries), 0, 1)
```

Where:

- **your_queries**: Number of queries you used
- **best_queries**: Number of queries in the reference solution
- Lower query count = higher score

Example Interaction
--------------------------------------------------

**Note:** In the example below, `n = 5` and the hidden permutation is `{3, 1, 5, 2, 4}`.

| Your Output | Interactor Response | Explanation |
|---|---|---|
| `0 3 1 3 2 2` | `3` | Matches at positions 1, 2, and 4 |
| `0 3 1 5 2 2` | `4` | Matches at positions 1, 2, 3, and 4 |
| `0 3 5 4 4 4` | `2` | Matches at positions 1 and 5 |
| `1 3 1 5 2 4` | — | Final guess submitted |

### Example Analysis

From the queries:

1. Query `{3, 1, 3, 2, 2}` → 3 matches
2. Query `{3, 1, 5, 2, 2}` → 4 matches (one more match once position 3 is 5)
3. Query `{3, 5, 4, 4, 4}` → 2 matches (positions 1 and 5)

From these queries, we can deduce the hidden permutation is `{3, 1, 5, 2, 4}`.

Strategy Tips
--------------------------------------------------

1. **Start simple**: Try systematic approaches like checking one position at a time (the protocol skeleton below does exactly this)
2. **Use information wisely**: Each query gives you information about multiple positions
3. **Binary search**: Consider strategies that eliminate multiple possibilities per query
4. **Optimize**: Try to minimize queries while maintaining correctness
5. **Pre-determination**: Since the permutation is pre-determined, you can use deterministic strategies
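
The interaction mechanics are the same for any strategy. Below is a hedged Python skeleton: the I/O helpers follow the stated format, while the strategy is a deliberately naive position-by-position scan (roughly n²/2 queries, far over the 10,000-query budget for large n) shown only to illustrate the query/flush/read cycle:

```python
import sys

def ask(seq):
    """Send one query ('0 a[1] ... a[n]'), flush, then read the match count."""
    print("0", *seq, flush=True)
    return int(sys.stdin.readline())

def submit(perm):
    """Submit the final guess ('1 p[1] ... p[n]') and exit immediately."""
    print("1", *perm, flush=True)
    sys.exit(0)

def main():
    n = int(sys.stdin.readline())
    p, remaining = [], list(range(1, n + 1))
    for i in range(n):
        if len(remaining) == 1:
            p.append(remaining.pop())
            break
        u = remaining[0]  # reference candidate for this position
        found = None
        for v in remaining[1:]:
            # Known prefix contributes i matches, and exactly one position
            # >= i holds u, so the response separates three cases:
            #   x == i + 2  ->  p[i] == v
            #   x == i      ->  p[i] == u
            #   x == i + 1  ->  p[i] is neither u nor v
            x = ask(p + [v] + [u] * (n - i - 1))
            if x == i + 2:
                found = v
                break
            if x == i:
                found = u
                break
        p.append(found)
        remaining.remove(found)
    submit(p)

main()
```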

Square Packing

Algorithmic

You are given an integer **n**. Your task is to place **n unit squares** (side length 1) inside an **axis-parallel square container** of side length **L** such that:

- Every unit square lies entirely inside the container
- Unit squares have **no common interior points** (touching edges/corners is allowed)
- Each unit square may be **rotated by an arbitrary angle**

### Goal

**Minimize L** (the container side length).

Input
--------------------------------------------------

A single integer **n** (1 ≤ n ≤ 100,000).

Output
--------------------------------------------------

**First line:** A real number **L** (the claimed container side length)

**Next n lines:** Three real numbers `xi yi ai` for i = 1..n:

- `(xi, yi)`: The center of the i-th unit square
- `ai`: Its rotation angle in degrees counterclockwise (0 ≤ ai < 180)

**Precision:** All numbers must be finite reals. At least **6 decimal places** is recommended.

### Sample Output Format

```
L
x1 y1 a1
x2 y2 a2
...
xn yn an
```

Validity Constraints
--------------------------------------------------

Your output is valid if and only if:

1. **Containment:** Every point of every unit square lies inside [0, L] × [0, L]
2. **Non-overlap (interiors):** The interiors of any two unit squares are disjoint
   - Touching along edges or at corners is **allowed**
3. **Angles:** 0 ≤ ai < 180 for all i

### Judge Verification (epsilon = 1e-7)

- **Containment:** A square is accepted if its maximum outward violation beyond the container is ≤ 1e-7
- **Non-overlap:** Two squares are accepted as interior-disjoint if the minimum signed distance between their boundaries is ≥ −1e-7

**Invalid submissions score 0 for that test.**

Baseline
--------------------------------------------------

A simple baseline packs all squares **unrotated** (ai = 0) on the unit grid, with centers at (0.5 + x, 0.5 + y), using the smallest integer-sided container that fits all n. The baseline side length is:

```
L₀(n) = ceil(√n)
```

**Example:** For n = 11, the baseline uses L₀ = 4.000000

Scoring
--------------------------------------------------

Let **L** be your reported container side length (validated by the judge). Define:

- **LB = √n** (trivial area lower bound; no packing can have L < LB)
- **L₀ = ceil(√n)** (baseline side length)
- **s = s(n)** (reference scale; satisfies LB ≤ s ≤ L₀)

### Score Calculation

| Condition | Score |
|---|---|
| Invalid submission | 0 points |
| L ≥ L₀ | 1 point |
| L = LB | 100 points |
| LB < L ≤ s | 95 + 5 × min(1.0, 1.1 × p₂) where p₂ = (s − L) / (s − LB) |
| s < L < L₀ | 94 × min(1.0, 1.1 × p₁) + 1 where p₁ = (L₀ − L) / (L₀ − s) |

### Scoring Notes

- **Baseline (L = L₀):** 1 point
- **Meeting s(n):** At least 95 points
- **Area bound (L = LB):** 100 points
- Scores vary smoothly between these anchors
- The +10% amplification is applied within each band and capped at that band's ceiling (95 or 100)

### Reference Scale s(n)

For n ≤ 100: s(n) is the best human score

For n > 100: s(n) = 2 × s(ceil(n / 4))

This recursive definition allows scoring for large n based on smaller instances.

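Transcribed as code, the banded scoring looks like the following Python sketch (my own reading of the table above, not the official judge; edge cases may be resolved differently):

```python
import math

def square_packing_score(L, n, s, valid=True):
    """Banded score from the table above; `s` is the reference scale s(n)."""
    if not valid:
        return 0.0
    LB = math.sqrt(n)              # area lower bound
    L0 = math.ceil(math.sqrt(n))   # baseline side length
    if L <= LB:
        # Achieving the area bound scores 100 (for perfect squares LB = L0;
        # this transcription resolves the tie in favor of the area bound).
        return 100.0
    if L >= L0:
        return 1.0
    if L <= s:
        p2 = (s - L) / (s - LB)
        return 95.0 + 5.0 * min(1.0, 1.1 * p2)
    p1 = (L0 - L) / (L0 - s)
    return 94.0 * min(1.0, 1.1 * p1) + 1.0
```
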
Example
--------------------------------------------------

### Input

```
11
```

### Sample Output (Baseline)

```
4.000000
0.500000 0.500000 0.000000
1.500000 0.500000 0.000000
2.500000 0.500000 0.000000
3.500000 0.500000 0.000000
0.500000 1.500000 0.000000
1.500000 1.500000 0.000000
2.500000 1.500000 0.000000
3.500000 1.500000 0.000000
0.500000 2.500000 0.000000
1.500000 2.500000 0.000000
2.500000 2.500000 0.000000
```

This is a valid baseline packing for n = 11 with L = 4.000000, achieving the minimum score of 1 point.

Additional Clarifications
--------------------------------------------------

### Unit Squares

- Side length **exactly 1**
- Centered at (xi, yi)
- Rotated by ai degrees around the center

### Ordering

Squares may be listed in **any order**.

### Precision

- Inputs are read as doubles
- Small rounding differences are tolerated per epsilon (1e-7)

### Touching

- Squares **may touch** each other and the container boundary
- Only **interior overlap is forbidden**

Strategy Tips
--------------------------------------------------

1. **Start with baseline:** Implement the simple grid-based baseline first (see the generator sketch below)
2. **Local optimization:** Use local search or simulated annealing to improve placements
3. **Smart rotations:** Small rotations can often reduce wasted space
4. **Nonlinear optimization:** Apply optimization techniques with overlap penalties
5. **Known packings:** For small n (≤ 100), research optimal or near-optimal packings
6. **Recursive approach:** For large n, consider dividing into smaller subproblems

Theoretical Background
--------------------------------------------------

- **Area lower bound:** Since each unit square has area 1, and n squares must fit in L × L, we have L² ≥ n, thus L ≥ √n
- **Compactness:** For fixed n and L, the feasible set is closed and bounded, guaranteeing that a minimal container side length exists
- **Complexity:** Finding the optimal packing is a hard geometric optimization problem; heuristics are essential for large n
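
The grid baseline described in the statement takes only a few lines; a minimal Python sketch that emits it in the required output format:

```python
import math

def baseline_packing(n):
    """Grid baseline: unrotated unit squares at (0.5 + x, 0.5 + y), L0 = ceil(sqrt(n))."""
    L0 = math.ceil(math.sqrt(n))
    lines = [f"{float(L0):.6f}"]
    for i in range(n):
        x, y = i % L0, i // L0  # fill row by row
        lines.append(f"{x + 0.5:.6f} {y + 0.5:.6f} {0.0:.6f}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(baseline_packing(int(input())))
```

For n = 11 this reproduces the sample output above exactly.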

Symbolic Regression

Research

Learn a closed-form expression `f(x1, …, xd)` that predicts the target `y` for several synthetic datasets. Each dataset is provided as a CSV file with columns `x1, x2, …, xd, y`. Your solver must fit a single symbolic expression per dataset. All datasets share the same API and scoring rules.

Input Format
------------

- During evaluation, your `Solution.solve` implementation receives the full feature matrix `X` (`numpy.ndarray` of shape `n × d`) and target vector `y` (`numpy.ndarray` of length `n`).
- Datasets available under `resources/data/` (and mirrored via `download_datasets.sh`) include:
  - `dataset_mccormick.csv`
  - `dataset_peaks.csv`
  - `dataset_sincos.csv`
  - `dataset_ripple.csv`
  - `dataset_mixed_polyexp_4d.csv`

Output Specification
--------------------

Implement `Solution` in `solution.py` with the following interface:

```python
class Solution:
    def __init__(self, **kwargs):
        ...

    def solve(self, X, y) -> dict:
        return {
            "expression": "<python expression in x1..xd>",
            "predictions": [...],  # length-n list/array (optional; auto-derived from expression if omitted)
            "details": {
                "loss": <optional>,
                "complexity": <optional integer>,
                ...
            },
        }
```

- `expression` must be a single Python-evaluable string using variables `x1..xd`, numeric constants, binary operators `+ - * /`, parentheses, and unary functions `sin`, `cos`, `exp`, `log`.
- If `predictions` are omitted or have the wrong length, the evaluator will regenerate them by evaluating `expression` on the dataset.
- If `details["complexity"]` is missing, the evaluator computes complexity from the expression tree.

Scoring
-------

For each dataset:

```
MSE = (1/n) Σ_i (y_i - ŷ_i)²
Score = 100 × clamp((m_base - MSE) / (m_base - m_ref), 0, 1) × 0.99^(max(C - C_ref, 0))
```

where:

- `m_base` is the Mean Squared Error of the linear baseline (precomputed).
- `m_ref` and `C_ref` are the MSE and complexity of the provided reference expression.
- `C` is the complexity of your expression, given by `2 × (#binary ops) + (#unary ops)`.
- If `m_base = m_ref`, the score is 100 when `MSE ≤ m_ref` and 0 otherwise.

The overall score reported by `evaluate.sh` is the mean score across datasets.

Environment & Dependencies
--------------------------

- `set_up_env.sh` creates a dedicated virtual environment in `execution_env/.venv_symbolic_regression` and installs `pysr`, `numpy`, `pandas`, and `sympy`.
- If you rely on additional packages, install them from within your solution (network access permitting).

Evaluation Flow
---------------

1. `download_datasets.sh` copies the CSV files into the shared `datasets/symbolic_regression` cache.
2. `set_up_env.sh` prepares the Python environment.
3. Your `solution.py` should reside in `execution_env/solution_env/solution.py`.
4. `evaluate.sh` activates the environment, runs `evaluator.py`, and prints the mean score. Detailed per-dataset metrics are written to `result.json`.

Resources
---------

- `resources/data/`: CSV datasets used for training/evaluation.
- `resources/reference_metrics.json`: Baseline linear-error and reference expression metrics used for scoring.
- `evaluator.py`: Evaluation entry point invoked by `evaluate.sh`.
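
For orientation, here is a minimal conforming `Solution` sketch: it simply fits a linear model by least squares and emits it as an expression, so by construction its MSE roughly matches `m_base` and it scores near zero, but it exercises the full interface:

```python
import numpy as np

class Solution:
    def __init__(self, **kwargs):
        pass

    def solve(self, X, y) -> dict:
        # Least-squares linear fit: y ~ c0 + c1*x1 + ... + cd*xd.
        n, d = X.shape
        A = np.hstack([np.ones((n, 1)), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)

        # Build an expression using only constants, '+', '*', and x1..xd.
        terms = [f"{coef[0]:.6g}"] + [
            f"({coef[j + 1]:.6g} * x{j + 1})" for j in range(d)
        ]
        preds = A @ coef
        return {
            "expression": " + ".join(terms),
            "predictions": preds.tolist(),
            "details": {"loss": float(np.mean((y - preds) ** 2))},
        }
```

A real submission would replace the linear fit with a symbolic search (e.g., via the preinstalled `pysr`) while keeping the same return contract.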

Vector Database Design

Research

Design a Vector Database index optimized for **recall** subject to a **latency constraint**. This tier uses latency-gated scoring: solutions exceeding the latency threshold receive zero points, while solutions meeting the constraint are scored purely by recall@1.

**Optimization Goal**: Maximize recall@1 within the latency constraint

$$
\text{score} =
\begin{cases}
0 & \text{if } t_{\text{query}} > t_{\text{max}} \\
100 & \text{if } t_{\text{query}} \leq t_{\text{max}} \text{ and } r \geq r_{\text{baseline}} \\
100 \cdot \frac{r - r_{\text{min}}}{r_{\text{baseline}} - r_{\text{min}}} & \text{if } t_{\text{query}} \leq t_{\text{max}} \text{ and } r < r_{\text{baseline}}
\end{cases}
$$

Where:

- $r$: Your recall@1
- $t_{\text{query}}$: Your average query latency (ms)
- $r_{\text{baseline}} = 0.9914$ (baseline recall)
- $r_{\text{min}} = 0.6939$ (minimum acceptable recall, 70% of baseline)
- $t_{\text{max}} = 5.775\text{ms}$ (maximum allowed latency, 150% of the 3.85 ms baseline)

**Key Insight**: Latency is a hard constraint. Within the constraint, only recall determines your score.

Baseline Performance
--------------------

- Recall@1: **0.9914** (99.14%)
- Avg query time: **3.85ms**
- Baseline score: **100** (recall equals baseline within latency constraint)

Scoring Examples
----------------

Assuming all solutions meet the latency constraint ($t \leq 5.775\text{ms}$):

| Recall@1 | Latency | Score Calculation | Score |
|----------|---------|-------------------|-------|
| 0.9914 | 3.85ms | $r = r_{\text{baseline}}$ → max score | **100** |
| 0.9950 | 3.00ms | $r > r_{\text{baseline}}$ → max score | **100** |
| 0.9500 | 2.50ms | $\frac{0.95 - 0.6939}{0.9914 - 0.6939} = 0.860$ | **86.0** |
| 0.8500 | 4.00ms | $\frac{0.85 - 0.6939}{0.9914 - 0.6939} = 0.524$ | **52.4** |
| 0.6939 | 5.00ms | $r = r_{\text{min}}$ → minimum score | **0** |
| 0.9900 | **6.00ms** | $t > t_{\text{max}}$ → latency gate fails | **0** |

**Note**: Faster latency does NOT increase your score; only recall matters once the constraint is met.

API Specification
-----------------

Implement a class with the following interface:

```python
import numpy as np
from typing import Tuple

class YourIndexClass:
    def __init__(self, dim: int, **kwargs):
        """
        Initialize the index for vectors of dimension `dim`.

        Args:
            dim: Vector dimensionality (e.g., 128 for SIFT1M)
            **kwargs: Optional parameters (e.g., M, ef_construction for HNSW)

        Example:
            index = YourIndexClass(dim=128, M=16, ef_search=64)
        """
        pass

    def add(self, xb: np.ndarray) -> None:
        """
        Add vectors to the index.

        Args:
            xb: Base vectors, shape (N, dim), dtype float32

        Notes:
            - Can be called multiple times (cumulative)
            - Must handle large N (e.g., 1,000,000 vectors)

        Example:
            index.add(xb)  # xb.shape = (1000000, 128)
        """
        pass

    def search(self, xq: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
        """
        Search for k nearest neighbors of query vectors.

        Args:
            xq: Query vectors, shape (nq, dim), dtype float32
            k: Number of nearest neighbors to return

        Returns:
            (distances, indices):
                - distances: shape (nq, k), dtype float32, L2 distances
                - indices: shape (nq, k), dtype int64, indices into base vectors

        Notes:
            - Must return exactly k neighbors per query
            - Indices should refer to positions in the vectors passed to add()
            - Lower distance = more similar

        Example:
            D, I = index.search(xq, k=1)  # xq.shape = (10000, 128)
            # D.shape = (10000, 1), I.shape = (10000, 1)
        """
        pass
```

**Implementation Requirements**:

- Class can have any name (evaluator auto-discovers classes with `add` and `search` methods)
- Must handle the SIFT1M dataset: 1M base vectors, 10K queries, 128 dimensions
- Your `search` must return a tuple `(distances, indices)` with shapes `(nq, k)`
- Distances should be L2 (Euclidean) or L2-squared
- No need to handle dataset loading; the evaluator provides numpy arrays

Evaluation Process
------------------

The evaluator follows these steps:

### 1. Load Dataset

```python
from faiss.contrib.datasets import DatasetSIFT1M

ds = DatasetSIFT1M()
xb = ds.get_database()     # (1000000, 128) float32
xq = ds.get_queries()      # (10000, 128) float32
gt = ds.get_groundtruth()  # (10000, 100) int64 - ground truth indices
```

### 2. Build Index

```python
from solution import YourIndexClass  # Auto-discovered

d = xb.shape[1]            # 128 for SIFT1M
index = YourIndexClass(d)  # Pass dimension as first argument
index.add(xb)              # Add all 1M base vectors
```

### 3. Measure Performance (Batch Queries)

```python
import time

t0 = time.time()
D, I = index.search(xq, k=1)  # Search all 10K queries at once
t1 = time.time()

# Calculate metrics
recall_at_1 = (I[:, :1] == gt[:, :1]).sum() / len(xq)
avg_query_time_ms = (t1 - t0) * 1000.0 / len(xq)
```

**Important**: `avg_query_time_ms` from **batch queries** is used for scoring. Batch queries benefit from CPU cache and vectorization, and are typically faster than single queries.

### 4. Calculate Score

```python
if avg_query_time_ms > 5.775:
    score = 0.0
elif recall_at_1 >= 0.9914:
    score = 100.0
else:
    recall_range = 0.9914 - 0.6939
    recall_proportion = (recall_at_1 - 0.6939) / recall_range
    score = max(0.0, min(100.0, 100.0 * recall_proportion))
```

Dataset Details
---------------

- **Name**: SIFT1M
- **Base vectors**: 1,000,000 vectors of dimension 128
- **Query vectors**: 10,000 vectors
- **Ground truth**: Precomputed nearest neighbors (k=1)
- **Metric**: L2 (Euclidean distance)
- **Vector type**: float32

Runtime Platform
----------------

- **Infrastructure**: Evaluations run on SkyPilot-managed cloud instances (AWS, GCP, or Azure)
- **Compute**: CPU-only instances (no GPU required)
- **Environment**: Docker containerized execution with Python 3, NumPy ≥1.24, FAISS-CPU ≥1.7.4

Constraints
-----------

- **Timeout**: 1 hour for the entire evaluation (index construction + queries)
- **Memory**: Use reasonable memory (the index should fit in RAM)
- **Latency constraint**: avg_query_time_ms ≤ 5.775ms
- **Recall range**: 0.6939 ≤ recall@1 ≤ 1.0

Strategy Tips
-------------

1. **Focus on recall**: Latency only needs to meet the threshold; it doesn't improve your score beyond that
2. **Batch optimization is key**: Your `search` should handle batch queries efficiently
3. **Parameter tuning**: Small changes (e.g., HNSW's M, ef_search) significantly affect recall (see the HNSW sketch at the end of this section)
4. **Don't over-optimize latency**: Meeting 5.775ms is enough; focus your energy on recall

Example: Simple Baseline
------------------------

```python
import numpy as np

class SimpleIndex:
    def __init__(self, dim: int, **kwargs):
        self.dim = dim
        self.xb = None

    def add(self, xb: np.ndarray) -> None:
        if self.xb is None:
            self.xb = xb.copy()
        else:
            self.xb = np.vstack([self.xb, xb])

    def search(self, xq: np.ndarray, k: int) -> tuple:
        # Compute all pairwise L2 distances
        # xq: (nq, dim), xb: (N, dim) -> distances: (nq, N)
        distances = np.sqrt(((xq[:, np.newaxis, :] - self.xb[np.newaxis, :, :]) ** 2).sum(axis=2))

        # Get k nearest neighbors
        indices = np.argpartition(distances, k - 1, axis=1)[:, :k]
        sorted_indices = np.argsort(distances[np.arange(len(xq))[:, None], indices], axis=1)
        final_indices = indices[np.arange(len(xq))[:, None], sorted_indices]
        final_distances = distances[np.arange(len(xq))[:, None], final_indices]

        return final_distances, final_indices
```

**Note**: This baseline achieves perfect recall (100%) but is too slow for large datasets. Use approximate methods like HNSW, IVF, or LSH for better speed-recall tradeoffs.

Debugging Tips
--------------

- **Test locally**: Use a subset of data (e.g., 10K vectors) for faster iteration
- **Verify shapes**: Ensure `search` returns `(nq, k)` shaped arrays
- **Check recall calculation**: `(I[:, :1] == gt[:, :1]).sum() / len(xq)`
- **Profile latency**: Measure batch vs single query performance separately
- **Validate before submitting**: Run the full 1M dataset locally if possible
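
Since the environment ships FAISS-CPU, a natural next step after the brute-force baseline is an HNSW index. A sketch (the `M`, `ef_construction`, and `ef_search` values are illustrative starting points, not tuned against the 5.775 ms gate):

```python
import numpy as np
import faiss

class HNSWIndex:
    """Approximate index built on faiss's IndexHNSWFlat (L2 metric)."""

    def __init__(self, dim: int, M: int = 32, ef_construction: int = 200,
                 ef_search: int = 128, **kwargs):
        self.index = faiss.IndexHNSWFlat(dim, M)          # L2 by default
        self.index.hnsw.efConstruction = ef_construction  # build-time effort
        self.index.hnsw.efSearch = ef_search              # query-time effort

    def add(self, xb: np.ndarray) -> None:
        self.index.add(np.ascontiguousarray(xb, dtype=np.float32))

    def search(self, xq: np.ndarray, k: int) -> tuple:
        # Returns L2-squared distances and int64 indices, shapes (nq, k).
        return self.index.search(np.ascontiguousarray(xq, dtype=np.float32), k)
```

Raising `efSearch` trades latency for recall; since only recall matters below the latency gate, a reasonable tuning loop pushes it upward until the measured batch latency approaches the threshold.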

view all FrontierCS problems ↗

submit your results

Send us an email to submit your results: qmang@berkeley.edu or wenhao.chai@princeton.edu

Website template modified from https://www.tbench.ai/.