Description
System Information:
- OS & Version: Windows 11
- ML.NET Version: latest from main
- .NET Version: .NET 7.0
This issue is mainly meant to point out a pain point. I have a temporary workaround; there is probably a better way, but I could not find it.
`ColumnInformation` is required by `ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)`.
The samples obtain it from `ctx.Auto().InferColumns`, but that method only accepts a path to a CSV (delimited text) file.
How do I get `ColumnInformation` if the data is read from a binary IDV file, from SQL Server, or from custom data objects?
For example:

```csharp
var data = ctx.Data.LoadFromBinary(source);
// ...
SweepablePipeline pipeline = ctx.Transforms.SelectColumns(columnsToKeep)
    // <-- How do I get ColumnInformation from `data`?
    .Append(ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation))
    .Append(ctx.Auto().BinaryClassification(labelColumnName: columnInference.ColumnInformation.LabelColumnName));
```
Temporary workaround:
I save the column inference results for the original CSV as JSON next to the IDV file, then deserialize them to get the matching column information when I load the IDV file.
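A minimal sketch of this workaround, assuming `System.Text.Json` is available. `SavedColumnInformation` and `ColumnInformationStore` are hypothetical names. The sketch copies the column name lists into a plain record instead of serializing `ColumnInformation` directly, so it does not depend on how the serializer treats that class's collection properties. It only round-trips the label, categorical, numeric, text and ignored columns; any other `ColumnInformation` properties you rely on would need to be added the same way.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using Microsoft.ML.AutoML;

// Hypothetical plain DTO holding the parts of ColumnInformation this sketch persists.
public record SavedColumnInformation(
    string LabelColumnName,
    List<string> CategoricalColumnNames,
    List<string> NumericColumnNames,
    List<string> TextColumnNames,
    List<string> IgnoredColumnNames);

public static class ColumnInformationStore
{
    // Copies the name lists out of ColumnInformation and writes them as JSON.
    public static void Save(ColumnInformation info, string path)
    {
        var dto = new SavedColumnInformation(
            info.LabelColumnName,
            info.CategoricalColumnNames.ToList(),
            info.NumericColumnNames.ToList(),
            info.TextColumnNames.ToList(),
            info.IgnoredColumnNames.ToList());
        File.WriteAllText(path, JsonSerializer.Serialize(dto));
    }

    // Reads the JSON back and refills a fresh ColumnInformation instance.
    public static ColumnInformation Load(string path)
    {
        var dto = JsonSerializer.Deserialize<SavedColumnInformation>(File.ReadAllText(path))
                  ?? throw new InvalidDataException($"Could not read column information from {path}");

        var info = new ColumnInformation { LabelColumnName = dto.LabelColumnName };
        foreach (var name in dto.CategoricalColumnNames) info.CategoricalColumnNames.Add(name);
        foreach (var name in dto.NumericColumnNames) info.NumericColumnNames.Add(name);
        foreach (var name in dto.TextColumnNames) info.TextColumnNames.Add(name);
        foreach (var name in dto.IgnoredColumnNames) info.IgnoredColumnNames.Add(name);
        return info;
    }
}
```

`Save` would be called once with `columnInference.ColumnInformation` right after running `InferColumns` on the CSV, and `Load` next to `LoadFromBinary` when the IDV file is read later.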
Thoughts:
`IDataView` already exposes a `Schema`, just not as a `ColumnInformation` object. Is there a method to convert one into the other?
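Assuming there is no built-in conversion, one possible workaround is to build `ColumnInformation` by hand from the schema. The helper `SchemaColumnInfo.FromSchema` below is hypothetical: it walks a `DataViewSchema` and sorts columns into `ColumnInformation` buckets by type. A schema carries types but no value statistics, so unlike `InferColumns`, which looks at the actual values, it cannot tell a categorical string column from free text. In this sketch the caller names categorical columns explicitly, remaining string columns are treated as text, and all other types (booleans, keys, dates, and so on) are ignored as a conservative default.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

public static class SchemaColumnInfo
{
    // Hypothetical helper: maps the column types of a schema onto ColumnInformation buckets.
    public static ColumnInformation FromSchema(
        DataViewSchema schema,
        string labelColumnName,
        IEnumerable<string>? categoricalColumnNames = null)
    {
        var categorical = new HashSet<string>(categoricalColumnNames ?? new string[0]);
        var info = new ColumnInformation { LabelColumnName = labelColumnName };

        foreach (var column in schema)
        {
            // Hidden columns and the label are not features.
            if (column.IsHidden || column.Name == labelColumnName)
                continue;

            // For vector columns, classify by the element type.
            var itemType = column.Type is VectorDataViewType vector ? vector.ItemType : column.Type;

            if (categorical.Contains(column.Name))
                info.CategoricalColumnNames.Add(column.Name);
            else if (itemType is NumberDataViewType)
                info.NumericColumnNames.Add(column.Name);
            else if (itemType is TextDataViewType)
                info.TextColumnNames.Add(column.Name);
            else
                // Booleans, keys, dates, etc. are ignored here; adjust for your data.
                info.IgnoredColumnNames.Add(column.Name);
        }

        return info;
    }
}
```

With the example above, this could be called as `SchemaColumnInfo.FromSchema(data.Schema, "Label", new[] { "City" })` and the result passed to `ctx.Auto().Featurizer(data, columnInformation: ...)`; the column names are placeholders for whatever the IDV file contains.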