For step-by-step instructions on how to submit a model, please refer to the Python or R guide, as applicable.
All information used for portfolio construction at time t must be available at or before time t. Portfolio weights cannot use future information in any form.
You cannot use returns from June 2025 to create a portfolio in May 2025.
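A minimal sketch of how the point-in-time rule might be enforced in code, assuming the input has a month-end date column named eom (as in the output schema). The helper name information_set is hypothetical and not part of the framework.

```python
import pandas as pd


def information_set(df: pd.DataFrame, t, date_col: str = "eom") -> pd.DataFrame:
    """Return only the rows dated at or before time t (hypothetical helper)."""
    cutoff = pd.Timestamp(t)
    dates = pd.to_datetime(df[date_col])
    # Rows dated after the cutoff are future information and are dropped.
    return df[dates <= cutoff]


# Toy data: the June 2025 row must not be visible when building the May 2025 portfolio.
toy = pd.DataFrame(
    {
        "eom": pd.to_datetime(["2025-04-30", "2025-05-31", "2025-06-30"]),
        "ret": [0.01, -0.02, 0.03],
    }
)
visible = information_set(toy, "2025-05-31")
```

Filtering once at the top of each rebalancing step, rather than inside each feature calculation, makes accidental look-ahead easier to rule out.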
Feature selection must be entirely algorithmic. Manual selection of features based on historical performance knowledge is prohibited. All feature engineering must be performed programmatically within the submitted code.
Not permitted: creating a model that exclusively uses 12-month return momentum and book-to-market equity to build a portfolio. Such selections could introduce look-ahead bias if the feature choices were influenced by their known historical performance.
Permitted: building a portfolio based on the ten best-performing characteristics at time t, where the selection is determined algorithmically using only information available at that point in time.
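One way algorithmic selection at time t could look, as a sketch only. It assumes you have already built a table of per-characteristic portfolio returns indexed by month-end date; that table, the function name select_top_k, and the ranking by mean return are all illustrative assumptions, not something the framework provides or prescribes.

```python
import pandas as pd


def select_top_k(char_returns: pd.DataFrame, t, k: int = 10) -> list:
    """Pick the k characteristics with the highest mean return up to and including t."""
    history = char_returns.loc[char_returns.index <= pd.Timestamp(t)]
    # Ranking uses only rows dated at or before t, so later months cannot leak in.
    return history.mean().nlargest(k).index.tolist()


# Toy data: "c" only looks good because of June 2025, which is unavailable in May 2025.
toy_returns = pd.DataFrame(
    {
        "a": [0.02, 0.02, 0.02, -0.50],
        "b": [0.01, 0.01, 0.01, 0.01],
        "c": [-0.01, -0.01, -0.01, 0.90],
    },
    index=pd.to_datetime(["2025-03-31", "2025-04-30", "2025-05-31", "2025-06-30"]),
)
chosen_in_may = select_top_k(toy_returns, "2025-05-31", k=2)
```

In the toy data the selection at the end of May 2025 ignores the June 2025 row, which is the property the rule asks for.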
Only the provided CTF dataset is permitted. External data sources, including macroeconomic indicators, alternative data, or web-scraped information, are prohibited. Feature engineering through mathematical transformations of provided characteristics is permitted.
Not permitted: using external macroeconomic data, alternative datasets, or web-scraped information to enhance your model.
Permitted: creating new features by transforming or combining existing characteristics in the provided dataset (e.g., ratios, moving averages, or interaction terms).
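The permitted transformations could be sketched as follows. The column names char_a and char_b are placeholders for any two provided characteristics, and the presence of id and eom columns in the input is an assumption based on the output schema.

```python
import pandas as pd


def add_engineered_features(chars: pd.DataFrame) -> pd.DataFrame:
    """Add a ratio, an interaction term, and a trailing moving average (illustrative)."""
    out = chars.sort_values(["id", "eom"]).copy()
    # Ratio of two provided characteristics (a zero denominator yields inf).
    out["ratio_a_b"] = out["char_a"] / out["char_b"]
    # Interaction term.
    out["interaction_a_b"] = out["char_a"] * out["char_b"]
    # Trailing 3-month average per security; the window only looks backward.
    out["char_a_ma3"] = out.groupby("id")["char_a"].transform(
        lambda s: s.rolling(3, min_periods=1).mean()
    )
    return out


toy_chars = pd.DataFrame(
    {
        "id": [1, 1, 1, 1],
        "eom": pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31", "2024-04-30"]),
        "char_a": [1.0, 2.0, 3.0, 4.0],
        "char_b": [2.0, 2.0, 2.0, 2.0],
    }
)
engineered = add_engineered_features(toy_chars)
```

Because the rolling window is trailing, the moving average at each month uses only that month and earlier ones, which keeps the feature consistent with the point-in-time rule.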
Submissions must be fully reproducible. All code must be self-contained with complete dependency specifications:
- Python: requirements.txt or pyproject.toml for package dependencies
- R: renv.lock for reproducible package versions

Dependencies must be available from PyPI (Python) or CRAN (R). Private package repositories or local packages are not supported. See Rule 8 for pre-installed packages.
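Before submitting, a requirements.txt could be checked locally for entries that are not pinned to an exact version. The sketch below is a simplified check written for illustration; it is not the validation the pipeline runs, and it deliberately treats anything other than a plain name==version line (including environment markers) as unpinned.

```python
import re

# Matches lines such as "statsmodels==0.14.0" or "package[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+==[A-Za-z0-9.*+!_\-]+$")


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not of the form name==version."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged


example = """
statsmodels==0.14.0
statsmodels>=0.14   # range, not an exact pin
# a comment line
numpy
"""
flagged_lines = unpinned_requirements(example)
```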
- Pin exact versions (e.g., statsmodels==0.14.0, not statsmodels>=0.14)
- uv (Python) or renv (R) for precise dependency management and reproducibility

Required submission components:
- main() function with the specified signature (see Rules 11-12)
- ctff_test is True

Portfolios must be rebalanced monthly. No constraints are imposed on portfolio construction methodology. Shorting, leverage, position limits, and turnover constraints are permitted at the submitter's discretion.
Submissions of prior published work are encouraged. Multiple submissions are permitted to allow iterative improvement. Standard academic citation practices are recommended for methodology descriptions.
Each execution environment includes a baseline set of packages. Additional dependencies may be specified in your dependency file.
Python Pre-Installed Packages:
- pandas (≥2.0.0)
- numpy (≥1.24.0)
- pyarrow (≥10.0.0)
- boto3 (≥1.26.0)
- scipy
- scikit-learn
- polars
- joblib

R Pre-Installed Packages:
- arrow (required for Parquet I/O)
- data.table (high-performance data manipulation)
- dplyr (tidyverse data manipulation)
- tidyr (data tidying)

Additional packages may be installed by including a requirements.txt (Python) or renv.lock (R) file with your submission. All additional packages must pass security scanning before installation.
Submissions execute on high-performance computing infrastructure with the following resource allocations:
| Resource | Specification |
|---|---|
| CPU Cores | 32 |
| Memory | 300 GB RAM |
| Execution Time Limit | 24 hours |
Submissions that exceed memory limits will terminate with an out-of-memory error. Submissions that exceed the time limit will be terminated and marked as failed.
- Consider smaller data types (e.g., float32 instead of float64)

Note: Resource specifications may be updated. Check this page for current values.
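The float32 suggestion could be applied with a small helper like the one below. This is a sketch: the function name downcast_floats is hypothetical, and whether the reduced precision is acceptable depends on your model.

```python
import numpy as np
import pandas as pd


def downcast_floats(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every float64 column converted to float32."""
    float_cols = df.select_dtypes(include=["float64"]).columns
    # astype with a mapping converts only the listed columns and leaves the rest alone.
    return df.astype({col: np.float32 for col in float_cols})


toy = pd.DataFrame({"id": [1, 2, 3], "x": [0.1, 0.2, 0.3]})
smaller = downcast_floats(toy)
```

float32 halves the memory used by each converted column but carries only about seven significant decimal digits, so it is worth checking that the results do not change materially.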
Submitted code executes in a fully isolated network environment:
Any code that requires runtime network access will fail. All data required for model execution is provided via the function arguments.
Your submission must define a main function with the exact signature specified for your language:
Python:

    import pandas as pd

    def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
        """
        Args:
            chars: Stock characteristics (ctff_chars.parquet)
            features: Computed features (ctff_features.parquet)
            daily_ret: Historical daily returns (ctff_daily_ret.parquet)
        Returns:
            DataFrame with columns: id, eom, w
        """
        # Your model logic
        return output_df

R:

    main <- function(chars, features, daily_ret) {
      # chars: data.frame from ctff_chars.parquet
      # features: data.frame from ctff_features.parquet
      # daily_ret: data.frame from ctff_daily_ret.parquet
      # Return data.frame with columns: id, eom, w
      return(output_df)
    }

Submissions without a valid main function will fail validation.
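For orientation, here is a deliberately trivial main that satisfies the Python signature by holding an equal-weighted portfolio each month. It is a sketch under assumptions, not a recommended model: it assumes chars contains id and eom columns, and it ignores features and daily_ret entirely.

```python
import pandas as pd


def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
    """Equal-weight every security present in chars in each month (illustrative only)."""
    out = chars[["id", "eom"]].drop_duplicates().copy()
    # Each security gets 1 / (number of securities in that month).
    out["w"] = 1.0 / out.groupby("eom")["id"].transform("count")
    return out.reset_index(drop=True)


toy_chars = pd.DataFrame(
    {
        "id": [1, 2, 3, 1, 2],
        "eom": pd.to_datetime(["2024-01-31"] * 3 + ["2024-02-29"] * 2),
    }
)
weights = main(toy_chars, pd.DataFrame(), pd.DataFrame())
```

In this toy run the weights sum to one within each month, and the id values pass through unchanged.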
Your main() function must return a DataFrame with the following schema:
| Column | Type | Description |
|---|---|---|
| id | integer | Security identifier from the input data |
| eom | date | End of month date (YYYY-MM-DD format) |
| w | float | Portfolio weight |
id column: The id values come from the input data and must be returned unchanged. For CRSP securities, the id is the CRSP permno. For Compustat securities, the id is a composite identifier. Your output should use the same id values present in the input DataFrames.
Example output:
| id | eom | w |
|---|---|---|
| 10006 | 2024-01-31 | 0.05 |
| 17566 | 2024-01-31 | 0.03 |
| 38914 | 2024-01-31 | -0.02 |
Validation Requirements:
- Required columns: id, eom, w (case-sensitive)

The pipeline infrastructure automatically captures your function's return value and writes it to the output file. Your results must be returned exclusively via the main() function's return value. Do not attempt startup scripts, container entrypoints, or direct output file writing.
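A local pre-flight check on the returned DataFrame might look like the sketch below. Only the case-sensitive column names come from the rules above; the checks for missing weights and duplicate (id, eom) rows are assumed sanity checks, not documented pipeline requirements, and validate_output is a hypothetical name.

```python
import pandas as pd

EXPECTED_COLUMNS = ["id", "eom", "w"]


def validate_output(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        # Column names are case-sensitive, so "W" does not count as "w".
        problems.append(f"missing columns: {missing}")
        return problems
    if df["w"].isna().any():
        problems.append("w contains missing values")  # assumed check
    if df.duplicated(subset=["id", "eom"]).any():
        problems.append("duplicate (id, eom) rows")  # assumed check
    return problems


good = pd.DataFrame(
    {
        "id": [10006, 17566],
        "eom": pd.to_datetime(["2024-01-31", "2024-01-31"]),
        "w": [0.05, 0.03],
    }
)
issues = validate_output(good)
```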
Source code files must meet the following requirements:
| Constraint | Limit |
|---|---|
| Maximum file size | 1 MB per file |
| File encoding | UTF-8 |
| Binary files | Not permitted (source files only) |
Files exceeding these limits will be rejected during validation.
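The size and encoding limits can be checked locally before uploading. In this sketch, 1 MB is assumed to mean 1,000,000 bytes (the page does not say whether it means 10^6 or 2^20), and check_source_file is a hypothetical helper.

```python
from pathlib import Path

MAX_BYTES = 1_000_000  # assumption: 1 MB interpreted as 10**6 bytes


def check_source_file(path) -> list:
    """Return a list of problems for one source file; empty means it looks fine."""
    data = Path(path).read_bytes()
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("larger than 1 MB")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    return problems
```

Running it over every file in the submission directory gives an early warning before validation rejects the upload.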
Your main() function receives three DataFrames as arguments, loaded from the following Parquet files:
| Argument | Source File | Description |
|---|---|---|
| chars | ctff_chars.parquet | Stock characteristics (fundamental data) |
| features | ctff_features.parquet | Computed features (technical indicators) |
| daily_ret | ctff_daily_ret.parquet | Historical daily returns |
The pipeline loads these files and passes them to your function. You do not need to read files directly.
The CTF_EXECUTION_MODE environment variable indicates which mode is running, but most models do not need to check this.
The following operations are prohibited and will cause submission rejection:
Network Operations:
- socket, urllib, requests, http.client (Python)
- download.file(), url(), httr calls (R)

Shell Execution:
- subprocess, os.system, os.popen (Python)
- system(), system2(), shell() (R)

Dynamic Code Execution:
- eval(), exec(), compile() (Python)
- eval(), parse() with arbitrary strings (R)

Filesystem Access:
Credential Exposure:
Submissions containing these patterns will fail security validation.
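To catch the Python patterns listed above before submitting, a rough local scan can be written with the standard ast module. This is an illustrative sketch, not the scanner the pipeline uses: it covers only the listed Python imports and calls, and it will miss aliased or indirect usage.

```python
import ast

BANNED_MODULES = {"socket", "urllib", "requests", "http.client", "subprocess"}
BANNED_CALLS = {"eval", "exec", "compile"}
BANNED_ATTR_CALLS = {("os", "system"), ("os", "popen")}


def _is_banned_module(name: str) -> bool:
    return any(name == m or name.startswith(m + ".") for m in BANNED_MODULES)


def find_prohibited(source: str) -> list:
    """Return descriptions of prohibited imports and calls found in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [f"import {a.name}" for a in node.names if _is_banned_module(a.name)]
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            for a in node.names:
                # Check both "urllib" and "urllib.request" style names.
                if _is_banned_module(module) or _is_banned_module(f"{module}.{a.name}"):
                    hits.append(f"from {module} import {a.name}")
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in BANNED_CALLS:
                hits.append(f"{func.id}()")
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and (func.value.id, func.attr) in BANNED_ATTR_CALLS
            ):
                hits.append(f"{func.value.id}.{func.attr}()")
    return hits


sample = "import requests\nimport os\nos.system('ls')\nx = eval('1+1')\n"
found = find_prohibited(sample)
```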
All package dependencies are scanned for known vulnerabilities before installation:
Scanning Tools:
Submissions with dependencies containing HIGH or CRITICAL severity vulnerabilities will be rejected. If you believe your submission was incorrectly rejected, contact the administrators.
Standard output and error streams from your code are captured:
- print() (Python) or print()/cat() (R) for debugging output

Consider logging progress updates, timing information, and intermediate results to aid troubleshooting.
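Progress and timing messages could be emitted with a small context manager such as the one sketched here. The name timed and the message format are arbitrary choices; flush=True is used so that messages are written promptly even if the process is later terminated.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step: str):
    """Print a start message, then the elapsed time when the block finishes."""
    start = time.perf_counter()
    print(f"[start] {step}", flush=True)
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"[done] {step} in {elapsed:.2f}s", flush=True)
```

Usage would be along the lines of wrapping each stage of main() in `with timed("fit model"):` so the captured output shows where time was spent.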
For reproducible results:
- Set random seeds (e.g., np.random.seed(42) or set.seed(42))

While not strictly enforced, reproducible code helps with debugging and validation.
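In Python, seeding can be collected in one helper called at the top of main(). This sketch seeds the standard library and NumPy global generators; other libraries with their own random state would need to be seeded separately, and set_seeds is a hypothetical name.

```python
import random

import numpy as np


def set_seeds(seed: int = 42) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


set_seeds(42)
first_draw = np.random.rand(3)
set_seeds(42)
second_draw = np.random.rand(3)  # identical to first_draw after re-seeding
```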
main() function

Note: These rules are designed to ensure the academic integrity, security, and real-world applicability of submitted models. Adherence to these guidelines is essential for meaningful comparative analysis within the Common Task Framework. Rules and resource specifications may be updated; please check this page regularly for the most current requirements.