AutoML 2.0: Get columnInformation from binary IDV file #6653

Open
@torronen

System Information (please complete the following information):

  • OS & Version: Windows 11
  • ML.NET Version: latest from main
  • .NET Version: .NET 7.0

This is mainly to point out a pain point. I have a temporary workaround; there is probably a better way, but I could not find it.

ColumnInformation is needed by ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)

Samples use ctx.Auto().InferColumns but this method only accepts CSV files.

How can I get ColumnInformation when the data is read from a binary IDV file, from SQL Server, or from custom data objects?
For example:

var data = ctx.Data.LoadFromBinary(source);
// ...
SweepablePipeline pipeline = ctx.Transforms.SelectColumns(columnsToKeep)
    // How do I get ColumnInformation from data here?
    .Append(ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation))
    .Append(ctx.Auto().BinaryClassification(labelColumnName: columnInference.ColumnInformation.LabelColumnName));

Temporary workaround:
I save the column inference results from the original CSV as JSON alongside the IDV file, then deserialize them to get the matching column information when loading the IDV file.
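
A rough sketch of that workaround, in case it helps the discussion. It assumes System.Text.Json and a small DTO, because ColumnInformation exposes its column collections as get-only properties. ColumnInfoDto, ColumnInfoSidecar, Save and Load are hypothetical names made up for this example, not part of ML.NET, and the sketch only covers the label, categorical, numeric, text and ignored columns.

```csharp
using System.IO;
using System.Linq;
using System.Text.Json;
using Microsoft.ML.AutoML;

// Hypothetical DTO holding the parts of ColumnInformation this sketch persists.
public sealed record ColumnInfoDto(
    string LabelColumnName,
    string[] CategoricalColumnNames,
    string[] NumericColumnNames,
    string[] TextColumnNames,
    string[] IgnoredColumnNames);

// Hypothetical helper: writes and reads a JSON "sidecar" file stored next to the IDV file.
public static class ColumnInfoSidecar
{
    public static void Save(ColumnInformation info, string jsonPath)
    {
        var dto = new ColumnInfoDto(
            info.LabelColumnName,
            info.CategoricalColumnNames.ToArray(),
            info.NumericColumnNames.ToArray(),
            info.TextColumnNames.ToArray(),
            info.IgnoredColumnNames.ToArray());
        File.WriteAllText(jsonPath, JsonSerializer.Serialize(dto));
    }

    public static ColumnInformation Load(string jsonPath)
    {
        var dto = JsonSerializer.Deserialize<ColumnInfoDto>(File.ReadAllText(jsonPath))
                  ?? throw new InvalidDataException($"No column information in {jsonPath}");

        // Rebuild ColumnInformation by filling its get-only collections.
        var info = new ColumnInformation { LabelColumnName = dto.LabelColumnName };
        foreach (var name in dto.CategoricalColumnNames) info.CategoricalColumnNames.Add(name);
        foreach (var name in dto.NumericColumnNames) info.NumericColumnNames.Add(name);
        foreach (var name in dto.TextColumnNames) info.TextColumnNames.Add(name);
        foreach (var name in dto.IgnoredColumnNames) info.IgnoredColumnNames.Add(name);
        return info;
    }
}
```

Save would be called once on columnInference.ColumnInformation right after InferColumns on the CSV, and Load whenever the IDV file is read back. The obvious drawback is that the JSON file and the IDV file have to be kept in sync by hand.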

Thoughts:
IDataView already exposes a Schema, just not as a ColumnInformation object. Is there a method to convert one into the other?

Labels: AutoML.NET, enhancement, needs-further-triage, untriaged
