Description
ML.NET supports at least two IDataView serialization formats
out of the box: text and binary files.
So I can use either of the two to prepare my data set for AutoML:
using (var stream = File.Create(textFileName))
mlContext.Data.SaveAsText(data, stream);
using (var stream = File.Create(binFileName))
mlContext.Data.SaveAsBinary(data, stream);
But when I try to use a serialized file as input for AutoML (both the CLI and the GUI version), it is unable to parse it.
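One way to rule out a corrupt file is to read the binary file back with ML.NET itself and print the schema it contains. This is only a minimal sketch, assuming the Microsoft.ML NuGet package is referenced and the file is named data-bin.idv; the class name RoundTripCheck is made up for illustration.

```csharp
using System;
using Microsoft.ML;

class RoundTripCheck
{
    static void Main()
    {
        var mlContext = new MLContext();

        // Assumption: this file was written by mlContext.Data.SaveAsBinary.
        IDataView data = mlContext.Data.LoadFromBinary("data-bin.idv");

        // Print each column's name and type as stored in the file.
        foreach (var column in data.Schema)
            Console.WriteLine($"{column.Name}: {column.Type}");
    }
}
```

If the schema printed here lists IsCS with a Boolean type, the file itself is fine and the problem lies in how AutoML reads it.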
Binary format
Using the binary format:
mlnet auto-train --task binary-classification --dataset "data-bin.idv" --label-column-name IsCS --cache on --max-exploration-time 60 --verbosity diag
I see the following error:
Inferring Columns ...
An Error occured during inferring columns
Unable to split the file provided into multiple, consistent columns.
Microsoft.ML.AutoML.InferenceException: Unable to split the file provided into multiple, consistent columns.
at Microsoft.ML.AutoML.ColumnInferenceApi.InferSplit(MLContext context, TextFileSample sample, Nullable`1 separatorChar, Nullable`1 allowQuotedStrings, Nullable`1 supportSparse)
at Microsoft.ML.AutoML.ColumnInferenceApi.InferColumns(MLContext context, String path, ColumnInformation columnInfo, Nullable`1 separatorChar, Nullable`1 allowQuotedStrings, Nullable`1 supportSparse, Boolean trimWhitespace, Boolean groupColumns)
at Microsoft.ML.CLI.CodeGenerator.AutoMLEngine.InferColumns(MLContext context, ColumnInformation columnInformation)
at Microsoft.ML.CLI.CodeGenerator.CodeGenerationHelper.GenerateCode()
at Microsoft.ML.CLI.Program.<>c__DisplayClass1_0.<Main>b__0(NewCommandSettings options)
Please see the log file for more info.
Exiting ...
Text format
With --verbosity diag, it gets stuck after the following output:
Inferring Columns ...
Creating Data loader ...
Loading data ...
Exploring multiple ML algorithms and settings to find you the best model for ML task: binary-classification
For further learning check: https://aka.ms/mlnet-cli
| Trainer Accuracy AUC AUPRC F1-score Duration #Iteration |
[Source=AutoML, Kind=Trace] Channel started
With the default verbosity:
mlnet auto-train --task binary-classification --dataset "data-txt.tsv" --label-column-name IsCS --cache on --max-exploration-time 60
it returns a type mismatch error:
Exploring multiple ML algorithms and settings to find you the best model for ML task: binary-classification
For further learning check: https://aka.ms/mlnet-cli
──────────────────────────
Waiting for the first iteration to complete ... 00:00:00
Exception occured while exploring pipelines:
Provided label column 'IsCS' was of type Single, but only type Boolean is allowed.
Please see the log file for more info.
But the data file looks correct (it was serialized by ML.NET), and its embedded schema declares IsCS as BL (Boolean), not Single.
Here are the header and the first lines of the data set:
#@ TextLoader{
#@ header+
#@ sep=tab
#@ col=IsCS:BL:0
#@ col=Features:R4:1-19
#@ }
IsCS 19 0:""
0 2 0.259255171 0 0 0 1.41421354 0 1.41421354 0 1.41421354 0 1.41421354 0 3 6 0 0 1 1192
0 6 0.259255171 0 0 0 1.41421354 0 1.41421354 0 1.41421354 0 1.41421354 0 3 6 0 0 1 1192