Description
When using InferColumns, there is no way to specify which columns to exclude.
// Define data path
var dataPath = Path.GetFullPath(@"../Data/issues_train.tsv");
// Infer column information
ColumnInferenceResults columnInference =
ctx.Auto().InferColumns(dataPath, separatorChar: '\t', labelColumnName: "Area", groupColumns: false);
That means you have to exclude them manually:
var columnsToExclude = new[]{"ID","Title"};
columnInference.TextLoaderOptions.Columns =
columnInference.TextLoaderOptions.Columns
.Where(col => !columnsToExclude.Contains(col.Name)).ToArray();
columnInference.ColumnInformation.NumericColumnNames.Remove("ID");
columnInference.ColumnInformation.TextColumnNames.Remove("Title");
Both have to be updated, since the TextLoaderOptions and ColumnInformation are used downstream:
// Create text loader
TextLoader loader = ctx.Data.CreateTextLoader(columnInference.TextLoaderOptions);
// Load data into IDataView
IDataView data = loader.Load(dataPath);
SweepablePipeline pipeline =
ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)
.Append(ctx.Transforms.Conversion.MapValueToKey(columnInference.ColumnInformation.LabelColumnName))
.Append(ctx.Auto().MultiClassification(labelColumnName: columnInference.ColumnInformation.LabelColumnName))
.Append(ctx.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
The InferColumns API might be the best place to specify the columns to exclude.
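As a stopgap, the manual steps above could be wrapped in a small helper. The following is only a sketch: `ColumnInferenceExtensions` and `ExcludeColumns` are hypothetical names and not part of ML.NET, and it assumes a version of Microsoft.ML.AutoML whose `ColumnInformation` exposes `ImagePathColumnNames`. It drops the named columns from the loader options and from every feature-name collection, so the caller does not need to know which type was inferred for each column.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper; not part of the ML.NET API.
public static class ColumnInferenceExtensions
{
    public static void ExcludeColumns(
        TextLoader.Options loaderOptions,
        ColumnInformation columnInformation,
        params string[] columnsToExclude)
    {
        var excluded = new HashSet<string>(columnsToExclude);

        // Stop the text loader from reading the excluded columns.
        loaderOptions.Columns = loaderOptions.Columns
            .Where(col => !excluded.Contains(col.Name))
            .ToArray();

        // Remove the names from every feature collection; Remove is a
        // no-op when the name is not present in a given collection.
        foreach (var name in excluded)
        {
            columnInformation.CategoricalColumnNames.Remove(name);
            columnInformation.NumericColumnNames.Remove(name);
            columnInformation.TextColumnNames.Remove(name);
            columnInformation.ImagePathColumnNames.Remove(name);
        }
    }
}
```

With the inference result from the description, the call would look like `ColumnInferenceExtensions.ExcludeColumns(columnInference.TextLoaderOptions, columnInference.ColumnInformation, "ID", "Title");`.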