Global Factor Data

Common Task Framework Rules

For step-by-step instructions on how to submit a model, please refer to the Python or R guide, as applicable.

Rule 1: Temporal Integrity

Portfolio weights at time t cannot be based on information available after time t. This ensures that every submitted portfolio could have been constructed and traded in real time.

Example:

You cannot use returns from June 2025 to create a portfolio in May 2025. All information used for portfolio construction at time t must be available at or before time t.
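A common way to respect this rule is to separate, at each month t, the rows whose target return is already known from the rows that only need a weight. The sketch below assumes a hypothetical integer month index, and assumes each row pairs characteristics observed in month s with the return realized over month s + 1. The names Obs, training_rows, and prediction_rows are illustrative and do not come from the CTF guides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    month: int       # hypothetical integer index of the month the characteristics are observed
    stock_id: str
    ret_next: float  # return over month `month + 1`, so it is only known at month + 1


def training_rows(obs, t):
    """Rows whose features AND target return are both known at time t."""
    # A row from month s uses ret_next, which is realized at s + 1, so s + 1 <= t.
    return [o for o in obs if o.month + 1 <= t]


def prediction_rows(obs, t):
    """Rows that need a weight at time t; only their features may be used, never ret_next."""
    return [o for o in obs if o.month == t]
```

With rows for months 1 to 5, training_rows(obs, 3) returns only months 1 and 2, because the return attached to month 3 is not realized until month 4.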

Rule 2: Feature Selection Constraints

Features cannot be manually selected. Feature selection must be algorithmic or rule-based. This prevents look-ahead bias from entering historical backtests, where it would artificially inflate model performance metrics.

Prohibited Example:

Creating a model that exclusively uses 12-month return momentum and book-to-market equity to build a portfolio. Such selections could introduce look-ahead bias if the feature choices were influenced by their known historical performance.

Permitted Example:

Building a portfolio based on the ten best-performing characteristics at time t, where the selection is determined algorithmically using only information available at that point in time.
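The permitted example can be sketched as a rule that ranks characteristics by their average past performance, using only months strictly before t. This is a minimal illustration under assumptions: factor_returns is a hypothetical precomputed mapping from characteristic name to its monthly performance (for example, a long-short return), and "best-performing" is taken to mean highest historical mean. Other scoring rules are equally valid as long as they only see past data.

```python
from statistics import mean


def select_top_characteristics(factor_returns, t, k=10):
    """Pick the k characteristics with the highest mean return over months < t.

    factor_returns: hypothetical dict {characteristic name: {month: return}}.
    Months at or after t are ignored, so the choice cannot use future performance.
    """
    scores = {}
    for name, series in factor_returns.items():
        past = [r for month, r in series.items() if month < t]
        if past:  # characteristics with no history yet are skipped
            scores[name] = mean(past)
    # Sort by score descending; break ties by name so the rule is deterministic.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k]
```

Re-running the selection every month lets the chosen set change over time, which is what keeps it algorithmic rather than a fixed, hindsight-informed list.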

Rule 3: Data Sources

Users can only use the provided CTF dataset from WRDS. No external data sources are permitted. This ensures a level playing field and consistent comparisons across all submitted models.

Prohibited Example:

Using external macroeconomic data, alternative datasets, or web-scraped information to enhance your model.

Permitted Example:

Creating new features by transforming or combining existing characteristics in the provided dataset (e.g., ratios, moving averages, or interaction terms).
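As an illustration of permitted feature engineering, the sketch below builds a trailing moving average of one characteristic for a single stock and an interaction term between two characteristics. The inputs are hypothetical plain lists ordered by month; in practice the same transformations would be applied per stock to columns of the provided dataset. Note that the moving average is trailing, so each value uses only the current and earlier months, which keeps it consistent with Rule 1.

```python
def trailing_mean(values, window):
    """Trailing moving average: entry i uses values[i - window + 1 .. i] only."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i - window + 1 : i + 1]) / window)
    return out


def interaction(x, y):
    """Element-wise product of two characteristics; missing inputs stay missing."""
    return [a * b if a is not None and b is not None else None for a, b in zip(x, y)]
```

For example, trailing_mean([1, 2, 3, 4], 2) gives [None, 1.5, 2.5, 3.5].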

Rule 4: Reproducibility Requirements

Code must be completely self-contained and fully reproducible. Submissions should include a dependency specification file (for example, requirements.txt or pyproject.toml for Python, and DESCRIPTION or renv.lock for R). All package dependencies must be installable from the Python Package Index (PyPI) or the Comprehensive R Archive Network (CRAN). This ensures that all submissions can be reliably tested in our HPC environment.

Recommendation:

Consider using modern dependency management tools (e.g., uv for Python or renv for R) to ensure precise version control and reproducibility of your environment.
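Before submitting, it can help to check a requirements.txt for dependencies that are not pinned to an exact version or that point outside PyPI (Git URLs, local paths, direct URL references). The helper below is a hypothetical convenience, not part of any official CTF tooling, and its checks are deliberately simple: it does not follow -r includes or interpret every pip option.

```python
def audit_requirements(text):
    """Return (unpinned, non_pypi) lists of requirement lines from requirements.txt content."""
    unpinned, non_pypi = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        # Editable installs, local paths, VCS and direct URL references do not come from PyPI.
        if line.startswith(("-e", "--editable", ".", "/", "git+")) or "://" in line or " @ " in line:
            non_pypi.append(line)
        elif "==" not in line:
            unpinned.append(line)  # a range such as >=2.0 is not an exact pin
    return unpinned, non_pypi
```

Tools such as uv can generate fully pinned lock files, which makes a check like this pass by construction.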

Rule 5: Technical Implementation

Submissions must include the following components:

A self-contained R or Python script that, given training data, assigns portfolio weights to each stock-month in a generic test set, which must be out-of-sample. See the Python and R guides for more details.
A CSV file with portfolio weights for all observations from chars.parquet where the column “ctff_test” is True.
A PDF document describing the methodology. The document can be a full research paper or a concise step-by-step document. The only requirement is that the document contains enough information for people to understand the methodology.
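The CSV requirement can be checked with a small writer that emits exactly one weight per test observation and fails loudly if any is missing. This sketch assumes rows have already been read from chars.parquet (for example with pandas) into dictionaries, and the column names id, eom, and w are hypothetical; the Python and R guides define the actual required format.

```python
import csv
import io


def write_weights_csv(rows, weights, out):
    """Write one weight per observation with ctff_test == True; return the row count.

    rows: iterable of dicts with hypothetical keys 'id', 'eom', 'ctff_test'.
    weights: dict mapping (id, eom) -> float.
    out: a writable text stream, e.g. an open file or io.StringIO.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "eom", "w"])  # hypothetical header
    n = 0
    for row in rows:
        if not row["ctff_test"]:
            continue  # only test observations belong in the submission
        key = (row["id"], row["eom"])
        if key not in weights:
            raise KeyError(f"missing weight for {key}")
        writer.writerow([row["id"], row["eom"], weights[key]])
        n += 1
    return n
```

Raising on a missing weight, rather than silently skipping it, surfaces gaps before the file is submitted.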

Rule 6: Portfolio Construction

Portfolios should be rebalanced monthly, as reflected in the required CSV output format. This aligns with standard academic practice and ensures consistency across submissions.

The framework currently imposes no portfolio constraints. You may implement shorting, leverage, position limits, or turnover constraints as you see fit for your strategy.
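Because shorting is allowed, one simple way to turn monthly predictions into weights is a rank-based long-short portfolio, rebuilt from scratch every month. This is only one possible scheme, not a prescribed one: the sketch assumes a hypothetical dict of predictions keyed by (stock id, month), demeans cross-sectional ranks so the portfolio is dollar-neutral, and scales it so the long and short legs each have a gross weight of 1. Ties are broken by stock id rather than averaged.

```python
from collections import defaultdict


def rank_weights(preds):
    """Map {(stock_id, month): prediction} to {(stock_id, month): weight}.

    Each month is handled independently (monthly rebalancing): weights are
    proportional to demeaned ranks, sum to zero, and have gross exposure 2.
    """
    by_month = defaultdict(list)
    for (sid, month), p in preds.items():
        by_month[month].append((sid, p))
    weights = {}
    for month, items in by_month.items():
        items.sort(key=lambda x: (x[1], x[0]))  # ascending prediction, ties by id
        n = len(items)
        ranks = [i - (n - 1) / 2 for i in range(n)]  # demeaned ranks sum to zero
        gross = sum(abs(r) for r in ranks)
        for (sid, _), r in zip(items, ranks):
            weights[(sid, month)] = 0.0 if gross == 0 else 2 * r / gross
    return weights
```

Leverage, position limits, or turnover penalties could be layered on top by rescaling or clipping these weights within each month.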

Rule 7: Academic Integrity

Users are permitted and encouraged to submit copies of their prior work. The goal is to assemble a comprehensive collection of models proposed in the academic literature.

While there are no explicit requirements for citing prior research, standard academic practices are encouraged when building upon existing methodologies.

Multiple submissions are allowed. You may submit various models or iterations to explore different approaches to portfolio construction.

Note: These rules are designed to ensure the academic integrity and real-world applicability of submitted models. Adherence to these guidelines is essential for meaningful comparative analysis within the Common Task Framework.