Kernel Optimization for GDPA
Research
Instruction
Learn a closed-form expression f(x1, …, xd) that predicts the target y for several synthetic datasets.
Each dataset is provided as a CSV file with columns x1, x2, …, xd, y. Your solver must fit a single symbolic expression per dataset. All datasets share the same API and scoring rules.
Input Format
- During evaluation, your Solution.solve implementation receives the full feature matrix X (numpy.ndarray of shape n × d) and target vector y (numpy.ndarray of length n).
- Datasets available under resources/data/ (and mirrored via download_datasets.sh) include:
  - dataset_mccormick.csv
  - dataset_peaks.csv
  - dataset_sincos.csv
  - dataset_ripple.csv
  - dataset_mixed_polyexp_4d.csv
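For local experiments it can help to load one of these CSV files into the same (X, y) shape that Solution.solve receives. The following is a minimal sketch: load_dataset is a hypothetical helper, not part of the provided tooling, and it assumes a single header row with y as the last column.

```python
import numpy as np


def load_dataset(csv_path):
    """Load a CSV with header x1,...,xd,y into (X, y). Hypothetical helper."""
    # skiprows=1 drops the header row; ndmin=2 keeps a 2-D array
    # even if the file contains a single data row.
    data = np.loadtxt(csv_path, delimiter=",", skiprows=1, ndmin=2)
    X = data[:, :-1]  # assumption: every column except the last is a feature
    y = data[:, -1]   # assumption: the last column is the target y
    return X, y
```

For example, X, y = load_dataset("resources/data/dataset_peaks.csv") should yield an n × d matrix and a length-n vector, provided the file follows the column layout described above.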
Output Specification
Implement Solution in solution.py with the following interface:
class Solution:
    def __init__(self, **kwargs):
        ...

    def solve(self, X, y) -> dict:
        return {
            "expression": "<python expression in x1..xd>",
            "predictions": [...],  # length-n list/array (optional; auto-derived from expression if omitted)
            "details": {
                "loss": <optional>,
                "complexity": <optional integer>,
                ...
            },
        }
- expression must be a single Python-evaluable string using variables x1..xd, numeric constants, binary operators + - * /, parentheses, and unary functions sin, cos, exp, log.
- If predictions are omitted or have the wrong length, the evaluator will regenerate them by evaluating expression on the dataset.
- If details["complexity"] is missing, the evaluator computes complexity from the expression tree.
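To make the interface and rules above concrete, here is a minimal sketch of a solution.py that fits an ordinary least-squares line and formats it as an expression string. It is only meant to illustrate the return format: a linear fit is essentially the linear baseline, so it should not be expected to score well. It also assumes that negative literals such as -2.5 are accepted as numeric constants, which the specification does not state explicitly.

```python
import numpy as np


class Solution:
    def __init__(self, **kwargs):
        pass

    def solve(self, X, y) -> dict:
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n, d = X.shape

        # Least-squares fit of y ≈ w1*x1 + ... + wd*xd + b.
        A = np.column_stack([X, np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        # Convert to plain Python floats so repr() yields bare literals.
        weights = [float(c) for c in coef]

        # Only variables, numeric constants, + and *, and parentheses are used.
        terms = [f"({weights[i]!r} * x{i + 1})" for i in range(d)]
        expression = " + ".join(terms + [repr(weights[d])])

        predictions = A @ coef
        loss = float(np.mean((y - predictions) ** 2))
        return {
            "expression": expression,
            "predictions": predictions.tolist(),
            # "complexity" is omitted so the evaluator derives it itself.
            "details": {"loss": loss},
        }
```

A real submission would replace the least-squares step with an actual symbolic search (for instance with pysr, which the environment installs) while keeping the same return structure.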
Scoring
For each dataset:
MSE = (1/n) Σ_i (y_i - ŷ_i)²
Score = 100 × clamp((m_base - MSE) / (m_base - m_ref), 0, 1) × 0.99^(max(C - C_ref, 0))
where:
- m_base is the Mean Squared Error of the linear baseline (precomputed).
- m_ref and C_ref are the MSE and complexity of the provided reference expression.
- C is the complexity of your expression, given by 2 × (#binary ops) + (#unary ops).
- If m_base = m_ref, the score is 100 when MSE ≤ m_ref and 0 otherwise.
The overall score reported by evaluate.sh is the mean score across datasets.
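The scoring rules above can be sketched in a few lines of Python for local sanity checks. This is not the code in evaluator.py: the function names are hypothetical, the complexity counter walks Python's own ast tree (the evaluator's expression tree may count differently, for example after simplification), and unary minus is assumed not to count as an operator.

```python
import ast

ALLOWED_UNARY = {"sin", "cos", "exp", "log"}


def complexity(expression: str) -> int:
    """C = 2 * (#binary ops) + (#unary ops), counted on the Python AST."""
    nodes = list(ast.walk(ast.parse(expression, mode="eval")))
    binary = sum(isinstance(node, ast.BinOp) for node in nodes)
    unary = sum(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in ALLOWED_UNARY
        for node in nodes
    )
    return 2 * binary + unary


def dataset_score(mse, m_base, m_ref, c, c_ref):
    """Per-dataset score following the formula in the Scoring section."""
    if m_base == m_ref:
        # Degenerate case, read literally: 100 or 0, no complexity penalty.
        return 100.0 if mse <= m_ref else 0.0
    quality = min(max((m_base - mse) / (m_base - m_ref), 0.0), 1.0)
    penalty = 0.99 ** max(c - c_ref, 0)
    return 100.0 * quality * penalty
```

As a worked example, "sin(x1) + x2 * x2" has two binary operators and one unary function, so its complexity is 2 × 2 + 1 = 5. With m_base = 1.0, m_ref = 0.0, and C = C_ref, an MSE of 0.5 gives a score of 50.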
Environment & Dependencies
- set_up_env.sh creates a dedicated virtual environment in execution_env/.venv_symbolic_regression and installs pysr, numpy, pandas, and sympy.
- If you rely on additional packages, install them from within your solution (network access permitting).
Evaluation Flow
- download_datasets.sh copies the CSV files into the shared datasets/symbolic_regression cache.
- set_up_env.sh prepares the Python environment.
- Your solution.py should reside in execution_env/solution_env/solution.py.
- evaluate.sh activates the environment, runs evaluator.py, and prints the mean score. Detailed per-dataset metrics are written to result.json.
Resources
- resources/data/: CSV datasets used for training/evaluation.
- resources/reference_metrics.json: Baseline linear-error and reference expression metrics used for scoring.
- evaluator.py: Evaluation entry point invoked by evaluate.sh.