Hello Devz,
Here is a simple tutorial on how to use ML.NET to make a prediction based on regression. ML.NET is an open-source, cross-platform machine learning framework. Regression is a statistical method for finding the relationship between variables.
Let’s use a budget context. We have a list of expenses with a cost and a category. We would like to predict the category based on the cost of an expense.
After you have created your new project in Visual Studio, be sure to install ML.NET from NuGet and set your project to build for x64 (required by ML.NET).
We can start by creating our models. First, a definition of an Expense:
using Microsoft.ML.Runtime.Api;

namespace MLNetDemo
{
    public class ExpenseData
    {
        [Column("0")]
        public float Cost { get; set; }

        [Column("1", name: "Label")]
        public float Category { get; set; }
    }
}
The Column attribute comes from ML.NET and helps the framework understand the mapping between the model and the columns in the CSV file. This ExpenseData model represents the data we have: a comma-separated file that we will call “expenses.csv”, shown here as a table:
Cost | CategoryId |
1.7 | 1 |
2.2 | 1 |
1300 | 2 |
7 | 3 |
9 | 3 |
120 | 4 |
1.7 | 1 |
1300 | 2 |
7.5 | 3 |
2.2 | 1 |
1300 | 2 |
9 | 3 |
130 | 4 |
1.7 | 1 |
3 | 1 |
2.2 | 1 |
1300 | 2 |
110 | 4 |
1.7 | 1 |
2.2 | 1 |
7.5 | 3 |
1.7 | 1 |
1300 | 2 |
2.2 | 1 |
122 | 4 |
7 | 3 |
1300 | 2 |
1.7 | 1 |
2.2 | 1 |
2.2 | 1 |
1.7 | 1 |
1300 | 2 |
7.5 | 3 |
1.7 | 1 |
1.7 | 1 |
1300 | 2 |
7.5 | 3 |
1300 | 2 |
1.7 | 1 |
2.2 | 1 |
1.7 | 1 |
3 | 1 |
1.7 | 1 |
1300 | 2 |
7.5 | 3 |
Where a category is defined like this:
namespace MLNetDemo
{
    public class Category
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
}
And with:
var categoryList = new List<Category>
{
    new Category { Id = 1, Name = "sdw" },
    new Category { Id = 2, Name = "rent" },
    new Category { Id = 3, Name = "cigs" },
    new Category { Id = 4, Name = "resto" }
};
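To make the column mapping more concrete, here is a rough plain C# sketch of what the Column attributes describe: column 0 of a line becomes the cost, column 1 becomes the category Id, which is then resolved to a name. This is only an illustration and not how ML.NET reads the file internally; CsvMappingSketch, ParseLine and CategoryName are hypothetical names, and the comma separator is assumed from the loader settings used later in the program.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CsvMappingSketch
{
    // Hypothetical helper: splits one "cost,categoryId" line into its two columns.
    public static (float Cost, int CategoryId) ParseLine(string line)
    {
        var parts = line.Split(',');
        // Invariant culture so that "1.7" is parsed the same way on every machine
        var cost = float.Parse(parts[0].Trim(), CultureInfo.InvariantCulture);
        var categoryId = int.Parse(parts[1].Trim(), CultureInfo.InvariantCulture);
        return (cost, categoryId);
    }

    // Hypothetical helper: resolves an Id to its name, or "unknown" if it is missing.
    public static string CategoryName(IReadOnlyDictionary<int, string> names, int id)
    {
        return names.TryGetValue(id, out var name) ? name : "unknown";
    }

    public static void Main()
    {
        var names = new Dictionary<int, string>
        {
            { 1, "sdw" }, { 2, "rent" }, { 3, "cigs" }, { 4, "resto" }
        };

        var (cost, categoryId) = ParseLine("1300,2");
        Console.WriteLine($"{cost} -> {CategoryName(names, categoryId)}"); // 1300 -> rent
    }
}
```

A dictionary is used here instead of a list only because it makes the "Id not found" case explicit.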
We will create another CSV file called “expenses_tests.csv” for later use, containing this:
Cost | CategoryId |
1.7 | 1 |
1300 | 2 |
2.2 | 1 |
9 | 3 |
7.5 | 3 |
130 | 4 |
1300 | 2 |
1.7 | 1 |
3 | 1 |
2.2 | 1 |
7.5 | 3 |
9 | 3 |
1300 | 2 |
1.7 | 1 |
1.7 | 1 |
2.2 | 1 |
3 | 1 |
120 | 4 |
1300 | 2 |
This is the data we will use to test our prediction model.
Now, we can create our CategoryPrediction class:
using Microsoft.ML.Runtime.Api;

namespace MLNetDemo
{
    public class CategoryPrediction
    {
        [ColumnName("Score")]
        public float Category { get; set; }
    }
}
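The ColumnName attribute is mandatory: it tells ML.NET which output column to copy into the property. Regression trainers write their prediction to the “Score” column, so that is the name used here.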
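Since the score is a float while the category Ids are integers, the prediction has to be rounded before it can be matched to a category. Here is a minimal sketch of that conversion, assuming the Ids 1 to 4 used in this tutorial. ScoreToCategorySketch and ToCategoryId are hypothetical names, and the clamping step is an addition of this sketch. Note that Math.Round without a second argument rounds midpoints to the nearest even number (2.5 becomes 2), so the sketch asks for AwayFromZero explicitly.

```csharp
using System;

public static class ScoreToCategorySketch
{
    // Hypothetical helper: turns a continuous regression score into a known category Id.
    // minId and maxId are assumptions matching the four categories of this tutorial.
    public static int ToCategoryId(float score, int minId = 1, int maxId = 4)
    {
        // Round to the nearest integer, with .5 going away from zero
        var rounded = (int)Math.Round(score, MidpointRounding.AwayFromZero);

        // Keep the result inside the range of existing Ids
        return Math.Min(maxId, Math.Max(minId, rounded));
    }

    public static void Main()
    {
        Console.WriteLine(ToCategoryId(1.2f)); // 1
        Console.WriteLine(ToCategoryId(2.5f)); // 3
        Console.WriteLine(ToCategoryId(4.8f)); // 4 (5 is clamped to the highest Id)
    }
}
```

Clamping avoids ending up with an Id that does not exist in the category list when the model overshoots.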
Now, let’s see the whole code:
using Microsoft.ML.Legacy;
using Microsoft.ML.Legacy.Data;
using Microsoft.ML.Legacy.Models;
using Microsoft.ML.Legacy.Trainers;
using Microsoft.ML.Legacy.Transforms;
using System;
using System.Collections.Generic;
using System.Linq;

namespace MLNetDemo
{
    class Program
    {
        //Don't forget to build in x64 mode (ML.NET requires it)
        static void Main(string[] args)
        {
            //Create the category list with the mapping of the Id and the Name of the Category
            var categoryList = new List<Category>
            {
                new Category { Id = 1, Name = "sdw" },
                new Category { Id = 2, Name = "rent" },
                new Category { Id = 3, Name = "cigs" },
                new Category { Id = 4, Name = "resto" }
            };

            //2.54 is not a known value but the model should predict the "sdw" category
            //because the model will be trained with 1.7, 2.2 and 3.0 as values for the "sdw" category
            var valueToPredict = 2.54f;
            var predictedCategoryId = PredictCategoryFromCost(valueToPredict);

            //Find the category by its Id (null if the predicted Id matches no category)
            var predictedCategory = categoryList.FirstOrDefault(x => x.Id == predictedCategoryId);
            var predictedName = predictedCategory?.Name ?? "unknown";

            Console.WriteLine($"\r\nThe cost of {valueToPredict}$ gives a Category prediction of {predictedName}.");
            Console.WriteLine("\r\nPress any key...");
            Console.ReadKey();
        }

        private static int PredictCategoryFromCost(float costToPredict)
        {
            var model = CreateModel();
            TestModel(model);

            //Save the model on disk to reuse later (wait for the write to complete)
            model.WriteAsync("model.zip").Wait();

            //Here is how to load this model later instead of re-training it every time:
            //var loadedModel = PredictionModel.ReadAsync<ExpenseData, CategoryPrediction>("model.zip").Result;

            var expenseToPredict = new ExpenseData { Cost = costToPredict };
            var prediction = model.Predict(expenseToPredict);

            //The regression returns a float score: round it to get a category Id
            return (int)Math.Round(prediction.Category);
        }

        private static PredictionModel<ExpenseData, CategoryPrediction> CreateModel()
        {
            var dataFileName = "expenses.csv";

            var pipeline = new LearningPipeline
            {
                new TextLoader(dataFileName).CreateFrom<ExpenseData>(useHeader: true, separator: ','),
                new ColumnConcatenator("Features", "Cost"),
                new GeneralizedAdditiveModelRegressor()
            };

            var model = pipeline.Train<ExpenseData, CategoryPrediction>();
            return model;
        }

        private static void TestModel(PredictionModel model)
        {
            var testData = new TextLoader("expenses_tests.csv").CreateFrom<ExpenseData>(useHeader: true, separator: ',');
            var evaluator = new RegressionEvaluator();
            var metrics = evaluator.Evaluate(model, testData);

            //Let's check the precision of our model
            Console.WriteLine($"RMS: {metrics.Rms}");
            Console.WriteLine($"R^2: {metrics.RSquared}"); //The closer to 1, the better the model has been trained
        }
    }
}
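To get a feeling for the two metrics printed by TestModel, here is a small sketch that computes them by hand using the textbook definitions of the root mean squared error and the coefficient of determination. It is assumed, not verified, that ML.NET's Rms and RSquared follow the same definitions. RegressionMetricsSketch is a hypothetical name, and the predicted values are made up for the illustration.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    // Root mean squared error: square root of the average squared difference.
    // 0 means every prediction matches exactly.
    public static double Rms(double[] actual, double[] predicted)
    {
        var meanSquaredError = actual
            .Zip(predicted, (a, p) => (a - p) * (a - p))
            .Average();
        return Math.Sqrt(meanSquaredError);
    }

    // R^2 = 1 - (sum of squared residuals / total sum of squares around the mean).
    // 1 means a perfect fit; 0 means no better than always predicting the mean.
    public static double RSquared(double[] actual, double[] predicted)
    {
        var mean = actual.Average();
        var residual = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        var total = actual.Sum(a => (a - mean) * (a - mean));
        return 1.0 - residual / total;
    }

    public static void Main()
    {
        // Made-up values: the last expense is predicted as category 3 instead of 4
        var actual = new double[] { 1, 2, 3, 4 };
        var predicted = new double[] { 1, 2, 3, 3 };

        Console.WriteLine($"RMS: {Rms(actual, predicted)}");      // RMS: 0.5
        Console.WriteLine($"R^2: {RSquared(actual, predicted)}"); // about 0.8
    }
}
```

A low RMS together with an R^2 close to 1 on the test file suggests the model generalizes well to expenses it has not seen during training.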
And the prediction seems correct: the cost of 2.54 is assigned to the “sdw” category, as expected.
Happy coding! 😉