While the Train-Synthetic-Test-Real (TSTR) approach directly measures whether synthetic data can replace real data for training, the Train-Real-Test-Synthetic (TRTS) methodology offers a complementary perspective. Instead of assessing the synthetic data's training utility, TRTS evaluates how well the synthetic data resembles the real data from the viewpoint of a model trained on the original dataset.
The TRTS workflow is essentially the inverse of TSTR:
- Train on Real Data: Train a chosen machine learning model (let's call it Model_R) using the original, real training dataset (Real_train).
- Test on Synthetic Data: Evaluate the performance of Model_R using the synthetically generated dataset (Synthetic_gen) as the test set.
- (Optional) Test on Real Hold-out Data: For comparison, evaluate Model_R on a held-out portion of the real data (Real_test).
Here's a diagram illustrating the flow:
The TRTS evaluation process: A model is trained exclusively on real data and subsequently tested on the synthetic dataset. Performance on a real test set often serves as a baseline.
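In code, this workflow can be sketched as follows. The snippet assumes the real and synthetic data are already loaded as pandas DataFrames named real_train, real_test, and synthetic_gen sharing a binary target column, and it uses a random forest purely as a stand-in for Model_R; these names, the model, and the metric are illustrative choices, not part of the TRTS definition.

```python
# TRTS sketch: train on real data, test on synthetic data (plus a real hold-out baseline).
# Assumes pandas DataFrames real_train, real_test, synthetic_gen with a shared "target"
# column and a binary label; the model and metric are placeholder choices.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

TARGET = "target"  # hypothetical label column present in all three DataFrames

def split_xy(df):
    """Separate features from the label column."""
    return df.drop(columns=[TARGET]), df[TARGET]

X_real_train, y_real_train = split_xy(real_train)
X_real_test, y_real_test = split_xy(real_test)
X_synth, y_synth = split_xy(synthetic_gen)

# 1. Train Model_R exclusively on the real training data.
model_r = RandomForestClassifier(n_estimators=200, random_state=0)
model_r.fit(X_real_train, y_real_train)

# 2. TRTS score: evaluate the real-trained model on the synthetic dataset.
trts_score = roc_auc_score(y_synth, model_r.predict_proba(X_synth)[:, 1])

# 3. Baseline: evaluate the same model on the real hold-out set.
real_test_score = roc_auc_score(y_real_test, model_r.predict_proba(X_real_test)[:, 1])

print(f"TRTS AUC (real-trained model on synthetic data): {trts_score:.3f}")
print(f"Baseline AUC (real-trained model on real test):  {real_test_score:.3f}")
```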
Interpreting TRTS Results
The performance metrics obtained from testing Model_R on Synthetic_gen (e.g., accuracy, AUC, F1-score) provide insights into the synthetic data's characteristics:
- High Performance on Synthetic Data: If Model_R performs well on Synthetic_gen, it suggests that the patterns, relationships, and decision boundaries learned from Real_train are also present and recognizable in the synthetic data. The synthetic data effectively mimics the distribution that Model_R learned.
- Low Performance on Synthetic Data: Conversely, poor performance indicates that Model_R, despite being proficient on the real data it was trained on, struggles to generalize to the synthetic data. This implies a distributional mismatch; the synthetic data lacks or misrepresents the patterns Model_R identified as important in the real data.
Comparing TRTS with TSTR and Baselines
Comparing the TRTS score (the performance of Model_R on Synthetic_gen) with the performance of the same model on the real test set (Real_test) is informative (a short comparison sketch follows the list below):
- TRTS Score ≈ Real Test Score: This is often a desirable outcome. It suggests the synthetic data mirrors the real data's characteristics well enough that a model trained on real data performs similarly on both. The synthetic data appears representative.
- TRTS Score > Real Test Score: This scenario might seem positive at first, but it requires careful scrutiny. It could mean the synthetic data is too similar to the specific Real_train dataset used to train Model_R. This might happen if the generative model overfits or memorizes aspects of its training data. While fidelity to the training set is high, the synthetic data might lack the diversity or generalizability present in unseen real data (Real_test).
- TRTS Score < Real Test Score: This is a common result, indicating that the synthetic data doesn't capture all the nuances or the exact distribution of the real data, causing the real-data-trained model to perform worse on it.
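As a rough illustration, the three scenarios above can be summarized by comparing the two scores from the earlier sketch. The tolerance used here is an arbitrary illustrative value, not an established threshold; in practice, the acceptable gap depends on the task and metric.

```python
# Rough interpretation of the TRTS score against the real-test baseline.
# trts_score and real_test_score come from the earlier sketch; the 0.02 tolerance
# is an arbitrary illustrative choice, not a standard cutoff.
TOLERANCE = 0.02

gap = trts_score - real_test_score
if abs(gap) <= TOLERANCE:
    print("TRTS ≈ real baseline: the synthetic data looks broadly representative.")
elif gap > 0:
    print("TRTS > real baseline: inspect the generator for overfitting or memorization of Real_train.")
else:
    print("TRTS < real baseline: the synthetic data misses patterns the model learned from the real data.")
```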
TRTS complements TSTR by answering a different question.
- TSTR asks: "Can I train a useful model using only synthetic data?" (Focus: Replaceability)
- TRTS asks: "Does the synthetic data look statistically similar to the real data, from the perspective of a model trained on real data?" (Focus: Representativeness)
A high TSTR score indicates practical utility for model training. A high TRTS score suggests the generator successfully learned patterns from the real training set, but it doesn't guarantee TSTR utility, especially if the high score is due to overfitting during generation. Ideally, you seek synthetic data that performs well in both TSTR and TRTS evaluations relative to real-data baselines, indicating a good balance of utility and fidelity without simple memorization.
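Putting the two directions side by side, a combined check might look like the sketch below. It reuses the feature/label splits from the first sketch and trains a second model on the synthetic data for the TSTR direction; again, the model and metric are placeholder choices.

```python
# Combined TSTR + TRTS check, reusing the feature/label splits from the first sketch
# (X_real_train, y_real_train, X_real_test, y_real_test, X_synth, y_synth).
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def auc(model, X, y):
    """Binary-classification AUC for a fitted probabilistic model (placeholder metric)."""
    return roc_auc_score(y, model.predict_proba(X)[:, 1])

# TSTR direction: train on synthetic data, test on the real hold-out (replaceability).
model_s = RandomForestClassifier(n_estimators=200, random_state=0)
model_s.fit(X_synth, y_synth)
tstr_score = auc(model_s, X_real_test, y_real_test)

# TRTS direction: train on real data, test on the synthetic data (representativeness).
model_r = RandomForestClassifier(n_estimators=200, random_state=0)
model_r.fit(X_real_train, y_real_train)
trts_score = auc(model_r, X_synth, y_synth)

# Real-on-real baseline for both comparisons.
baseline = auc(model_r, X_real_test, y_real_test)

print(f"TSTR AUC: {tstr_score:.3f} | TRTS AUC: {trts_score:.3f} | Real baseline: {baseline:.3f}")
```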
As with TSTR, the choice of downstream model (Model_R) and evaluation metric can influence TRTS results. It's often beneficial to run TRTS using the same model architecture planned for the actual downstream task.