Error During Retraining with New Labels #7187

Open
@willysoft

Issue Description

I encountered an issue while retraining a model with ML.NET. Retraining works when the new data contains only labels that were present in the original training data, but it fails when the new data introduces labels that were not. The failing call is:

// Retrain model
var retrainedModel = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
    new LbfgsMaximumEntropyMulticlassTrainer.Options() { 
        L1Regularization = 0.1195667F, 
        L2Regularization = 0.03125F, 
        LabelColumnName = @"col1", 
        FeatureColumnName = @"Features" 
    }).Fit(transformedNewData, originalModelParameters);

Error Message

System.InvalidOperationException: 'No valid training instances found, all instances have missing features.'

Steps to Reproduce

  1. Train an initial model using a dataset with a specific set of labels.
  2. Attempt to retrain the model using a new dataset that includes labels not present in the original dataset.
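
Before step 2, it can help to confirm that the new dataset really does contain labels the original model never saw. The following is a minimal sketch of such a check, not part of the original repro: `LabelCheck` and `FindUnseenLabels` are hypothetical names, and it assumes the label values (column `col1`) can be read as strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelCheck
{
    // Hypothetical helper: returns the distinct labels in newLabels
    // that never occur in originalLabels.
    // Assumption: labels are comparable as plain strings.
    public static List<string> FindUnseenLabels(
        IEnumerable<string> originalLabels,
        IEnumerable<string> newLabels)
    {
        var known = new HashSet<string>(originalLabels);
        return newLabels
            .Where(label => !known.Contains(label))
            .Distinct()
            .ToList();
    }
}
```

If the returned list is non-empty, the retraining data falls into the failing case described in this issue.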

Expected Behavior

The model should be able to retrain successfully even when new labels are introduced in the retraining dataset.

Actual Behavior

The retraining process fails with an InvalidOperationException, stating that there are no valid training instances because all instances have missing features.

Environment

  • ML.NET version: 3.0.1
  • .NET version: net8.0
  • Operating System: Windows 10

Code Sample

public static void ReTrain(string outputModelPath, IEnumerable<ModelInput> newDatas)
{
    var mlContext = new MLContext();

    // Define DataViewSchema of data prep pipeline and trained model
    DataViewSchema dataPrepPipelineSchema, modelSchema;

    // Load data preparation pipeline and trained model
    var dataPrepPipeline = mlContext.Model.Load("data_preparation_pipeline.zip", out dataPrepPipelineSchema);
    var trainedModel = mlContext.Model.Load("ogd_model.zip", out modelSchema);

    // Extract trained model parameters from the first matching prediction transformer in the chain
    var transformers = (IEnumerable<ITransformer>)trainedModel;
    var originalModelParameters = transformers
        .OfType<MulticlassPredictionTransformer<MaximumEntropyModelParameters>>()
        .FirstOrDefault()?.Model;

    // Load New Data
    var newDataView = mlContext.Data.LoadFromEnumerable(newDatas);

    // Preprocess Data
    var transformedNewData = dataPrepPipeline.Transform(newDataView);

    // Retrain model
    var retrainedModel = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
        new LbfgsMaximumEntropyMulticlassTrainer.Options() { 
            L1Regularization = 0.1195667F, 
            L2Regularization = 0.03125F, 
            LabelColumnName = @"col1", 
            FeatureColumnName = @"Features" 
        }).Fit(transformedNewData, originalModelParameters);
}

Metadata

Labels

  • classification: Bugs related classification tasks
  • enhancement: New feature or request
  • untriaged: New issue has not been triaged