Ignore Columns ColumnInference #6777

Open
@luisquintanilla

When using `InferColumns`, there is no way to specify which columns should be excluded from the inference result.

```csharp
// Define data path
var dataPath = Path.GetFullPath(@"../Data/issues_train.tsv");

// Infer column information
ColumnInferenceResults columnInference =
    ctx.Auto().InferColumns(dataPath, separatorChar: '\t', labelColumnName: "Area", groupColumns: false);
```

As a result, the unwanted columns have to be removed manually, from both `TextLoaderOptions` and `ColumnInformation`:

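```csharp
// Requires: using System.Linq;
var columnsToExclude = new[] { "ID", "Title" };

// Drop the excluded columns from the loader schema.
// With groupColumns: false, each input column has its own loader column,
// so filtering by name works.
columnInference.TextLoaderOptions.Columns =
    columnInference.TextLoaderOptions.Columns
        .Where(col => !columnsToExclude.Contains(col.Name))
        .ToArray();

// Remove them from the column-purpose collections used by the featurizer
columnInference.ColumnInformation.NumericColumnNames.Remove("ID");
columnInference.ColumnInformation.TextColumnNames.Remove("Title");
```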
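The workaround above hard-codes which collection each excluded column was inferred into (`ID` as numeric, `Title` as text). A slightly more defensive sketch removes every excluded name from all of the `ColumnInformation` collections, assuming the ML.NET AutoML `ColumnInformation` exposes `NumericColumnNames`, `TextColumnNames`, `CategoricalColumnNames` and `IgnoredColumnNames` as `ICollection<string>`. `ColumnExclusion.ExcludeColumns` is a hypothetical helper, not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper (not an ML.NET API): strips columns from the results of InferColumns.
public static class ColumnExclusion
{
    public static void ExcludeColumns(
        TextLoader.Options loaderOptions,
        ColumnInformation columnInformation,
        IEnumerable<string> columnsToExclude)
    {
        // ML.NET column names are case-sensitive, so compare ordinally.
        var excluded = new HashSet<string>(columnsToExclude, StringComparer.Ordinal);

        // Keep only the loader columns whose names are not excluded.
        // Assumes groupColumns: false, so each source column is a separate loader column.
        loaderOptions.Columns = loaderOptions.Columns
            .Where(col => !excluded.Contains(col.Name))
            .ToArray();

        // The inferred purpose of each column is not known up front,
        // so remove every excluded name from every purpose collection.
        var collections = new ICollection<string>[]
        {
            columnInformation.NumericColumnNames,
            columnInformation.TextColumnNames,
            columnInformation.CategoricalColumnNames,
            columnInformation.IgnoredColumnNames,
        };
        foreach (var names in collections)
        {
            foreach (var name in excluded)
            {
                names.Remove(name);
            }
        }
    }
}
```

It would be called right after `InferColumns` and before the text loader is created, e.g. `ColumnExclusion.ExcludeColumns(columnInference.TextLoaderOptions, columnInference.ColumnInformation, new[] { "ID", "Title" });`.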

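Both changes are needed because `TextLoaderOptions` and `ColumnInformation` are each used downstream: the former to create the text loader, the latter to configure featurization: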

```csharp
// Create text loader
TextLoader loader = ctx.Data.CreateTextLoader(columnInference.TextLoaderOptions);

// Load data into IDataView
IDataView data = loader.Load(dataPath);

SweepablePipeline pipeline =
    ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)
        .Append(ctx.Transforms.Conversion.MapValueToKey(columnInference.ColumnInformation.LabelColumnName))
        .Append(ctx.Auto().MultiClassification(labelColumnName: columnInference.ColumnInformation.LabelColumnName))
        .Append(ctx.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
```

The `InferColumns` API might be the best place to specify the columns to exclude, so that both `TextLoaderOptions` and `ColumnInformation` are produced without them.

Labels: AutoML.NET, enhancement
