Machine Learning Model
For Asset Performance Management In PHP

Hey folks!! Welcome to my personal blog. My name is Prathamesh Varhadpande. I am a 3rd year Computer Science undergraduate student. In this blog I will be discussing an ML model for Asset Performance Management using PHP.

Intro...

Yes! You heard it right the first time… a machine learning model, and that too in PHP. Now you must be wondering why I have chosen PHP and not Python, the language the world is going crazy about for Machine Learning, Data Science, Artificial Intelligence and what not… But wait a minute, can we even implement a machine learning model in PHP? The answer is a big yes.

There are excellent libraries available in PHP for implementing Machine Learning, viz. PHP-ML, Rubix ML, PHP-AI, etc., and some of them are still under active development.

In this blog I will be focusing on the Rubix ML library.

Now coming back to our project, we are going to build an Asset Performance Management ML model. You might wonder why I have chosen this topic. The reason is that I participated in the Smart India Hackathon, which is organized by the Govt. of India, and was assigned a problem statement on Asset Performance Management in the Oil and Gas industry. The cherry on the cake was that my proposed solution was selected as a finalist in this Hackathon.

So, grab a seat and let’s take a deep dive into this ML model. But before getting into the technical details, let us first understand what Asset Performance Management is.

Asset Performance Management

In software terms, Asset Performance Management (APM) refers to systems that improve the reliability and availability of physical assets while minimizing risk and operating costs. APM typically includes predictive maintenance, condition monitoring of assets and reduction of unplanned downtime, often using technologies such as data collection, data visualization and analytics.

Basically, the equipment used for storage, controlling flow, containing chemical reactions, refining and processing is known as Process Plant Equipment. In our case, we are dealing with the Oil and Gas industry, where typical process plant equipment includes heat exchangers, turbines, coolant equipment, gas compressors, pumps, mixers, storage tanks, etc. This equipment is mainly classified into two categories: 1. High Risk Equipment (mostly rotating equipment) and 2. Low Risk Equipment (mostly fixed equipment).

In this blog I will be focusing on only one asset, the Heat Exchanger, which is classified here as High-Risk Equipment.

Okay folks, before moving towards the Dataset part, let us first understand what a Heat Exchanger actually is.

Heat Exchanger

A Heat Exchanger is a piece of process plant equipment specially designed for heat transfer between different media. One medium is the process fluid and the other is a heat-absorbing coolant, a chilled liquid or gas. Heat exchangers work by bringing a cooled fluid into close contact with a hot industrial process or piece of equipment, allowing heat to be exchanged between the two media through thermal conduction.

Now let’s study the parameters of the Heat Exchanger so that we can generate our Dataset.

#Parameters –

  1. Flow Rate of Fluid (Q) – It is the amount of fluid (in liters) flowing per minute through the Heat Exchanger.
  2. Pressure at Cold Inlet (P1) in kg/cm^2.
  3. Pressure at Hot Inlet (P2) in kg/cm^2.
  4. Cold Fluid Pressure Drop (PCD) in mm of Hg.
  5. Hot Fluid Pressure Drop (PHD) in mm of Hg.
  6. Cold Fluid Inlet Temperature (T1) in K.
  7. Cold Fluid Outlet Temperature (T2) in K.
  8. Hot Fluid Inlet Temperature (T3) in K.
  9. Hot Fluid Outlet Temperature (T4) in K.

Based on the above parameters we will generate our analytics.

So now we know what a Heat Exchanger is and how it works. Now let’s do a quick mind-mapping of our ML model for better understanding...

Mind-Mapping

Process...

Now let's go through the process we are going to follow while building our ML model...

So, by now you must have understood the process we are going to follow for our ML model. Now let’s dive into the generation of the Dataset...

Dataset Generation...

Now you might wonder how we are going to generate the Dataset, since all the assets in a process plant are mechanical assets and not software ones.

Here the idea is to connect IoT devices to the machinery; these devices measure the parameters of the equipment and send the generated data to the APM system, which then predicts the behavior and condition of that equipment.

Since we were all in lockdown due to Covid-19, I could not arrange the IoT devices that were supposed to generate the data. But don’t worry folks, there is another way to generate a Dataset for our ML model.

We can use data already generated from the machinery and hardcode it into our ML model. For the Dataset I searched for a few research papers on the internet… and found one paper published on Elsevier.com that talks about performance measurement of a Heat Exchanger. In this paper the authors have already generated and verified a Heat Exchanger dataset, so I decided to use it for our ML model. The link to the research paper is given below; you can refer to it to see how the dataset was generated:

Research Paper from Elsevier for Heat Exchanger Efficiency

Now let’s have a look at our Dataset for asset efficiency…

We can clearly see that all the Heat Exchanger parameters we discussed earlier in this blog are present in this dataset, along with the efficiency of the Heat Exchanger for each data record. Our dataset consists of roughly 2000+ records, and having a large dataset is always a plus point while training an ML model.

But we need two more datasets, i.e. for the Reliability and the Working Status of the Heat Exchanger. Since we are going with the hardcoded approach, we need to generate these datasets manually.

In the Heat Exchanger dataset we collected, the efficiency is given in the last column, so using these values we labeled the dataset for Efficiency, Reliability and Working Status, classified as follows (a small labeling sketch follows the list):

For Efficiency -

                   L_Efficiency - 85-90%      M_Efficiency - 90-95%      H_Efficiency - 95% and above

For Reliability -

                   L_Reliable - 85-90%      M_Reliable - 90-95%      H_Reliable - 95% and above

For Status -

                   L_Stable - 85-95%      H_Stable - 95% and above
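For illustration, here is a small, hypothetical PHP helper (the function names are mine, not from the original project) that mirrors these labeling rules for the Efficiency and Status columns; the Reliability labels follow the same pattern:

<?php

// Hypothetical labeling helpers that mirror the ranges listed above.
function labelEfficiency(float $efficiency): string
{
    if ($efficiency >= 95.0) {
        return 'H_Efficiency';
    }

    if ($efficiency >= 90.0) {
        return 'M_Efficiency';
    }

    return 'L_Efficiency'; // 85-90%
}

function labelStatus(float $efficiency): string
{
    return $efficiency >= 95.0 ? 'H_Stable' : 'L_Stable';
}

echo labelEfficiency(96.2) . PHP_EOL; // H_Efficiency
echo labelStatus(88.4) . PHP_EOL;     // L_Stable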

Now, let’s have a look at all three datasets where I have labeled the output column (marked in red).

Efficiency Dataset –

Reliability Dataset –

Working Status Dataset -

*Note - All the datasets are in Excel format; we need to convert them into CSV (Comma Separated Values) format.
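If you prefer to do the conversion in PHP itself rather than exporting from Excel, one option is the PhpSpreadsheet package (phpoffice/phpspreadsheet, installed via Composer). This is just a minimal sketch with an illustrative file name; saving the sheet as CSV directly from Excel works equally well:

<?php

include __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Writer\Csv;

// Load the Excel workbook and write it back out as a CSV file.
$spreadsheet = IOFactory::load('heat_efficiency.xlsx'); // illustrative file name

$writer = new Csv($spreadsheet);
$writer->save('heat_efficiency.csv');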

So, the next step is to split the Dataset into a Training Dataset and a Testing Dataset. We will do this inside the algorithm code using a split method.

We will use the 70%-30% rule for splitting: 70% of the dataset will be used for training and 30% for testing. So, our training dataset consists of approximately 1400+ records and the testing dataset of 600+ records.

So now we are clear and done with the dataset generation part. Now we will move towards the Machine Learning part of our project.

Machine Learning for Asset Performance Management

For Asset Performance Management I have chosen the Supervised Machine Learning technique.

For those who don’t know what Supervised Machine Learning is… let me explain it in short. In supervised learning the model is trained on a labeled dataset, i.e. a dataset that has both input and output parameters.

In this type of learning both the training and validation datasets are labeled, as shown in the figures below. The red box marks the output parameter, while the remaining columns are considered input parameters.

I have chosen the supervised technique because of the dataset itself: since we are using a labeled dataset, supervised learning is the natural fit for APM.

As discussed earlier, I am going to use the Rubix-ML Library for implementing Machine Learning in PHP. So, let’s have a deep dive into the Rubix-ML library.

# Rubix-ML Library –

Rubix ML is a high-level Machine Learning library for PHP which provides 40+ algorithms for supervised and unsupervised learning. It is open source and free to use commercially. Below is the link to the Rubix ML library:

rubixml.com
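If you want to follow along, the library is installed through Composer with a single command (assuming Composer is already set up on your machine):

composer require rubix/ml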

Since we are using the supervised machine learning technique, we now need a supervised learning algorithm. There are a lot of supervised algorithms available in the Rubix ML library, but the one I found most apt for our dataset is K-Nearest Neighbors (K-NN). For those who don’t know, let me first explain what K-NN is…

# K-NN Algorithm -

The K-NN algorithm measures the similarity between a new case and the available cases and puts the new case into the category it is most similar to.

But why KNN?

So, now that we understand the K-NN algorithm, let’s study the distance kernel we will be using in it.

# Kernel Distance (Hamming Distance) -

The key aspect of the kernel distance is its interpretation as a distance between probability measures. Given two objects A and B and a measure of similarity between them K(A, B), the induced distance between A and B can be defined as the difference between the self-similarities K(A, A) + K(B, B) and the cross-similarity K(A, B). For example, take two strings of the same length:

                1) Roller    2) Wagner

If we compare the two strings character by character, the first four characters do not match, while the remaining two characters match.

Hence, the Hamming distance between the two strings is 4.

Similarly, if we take two strings from our Heat Exchanger Dataset for Efficiency:

                1) L_Efficiency    2) M_Efficiency

If we compare these two strings character by character, they differ only in the first character and match everywhere else. Hence, the Hamming distance is 1. The smaller the Hamming distance, the more similar the two samples are.
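To make this concrete, here is a quick illustrative PHP function (my own sketch, not part of Rubix ML) that counts the positions at which two equal-length strings differ:

<?php

// Count the positions where two equal-length strings differ.
function hammingDistance(string $a, string $b): int
{
    $distance = 0;

    for ($i = 0; $i < strlen($a); $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }

    return $distance;
}

echo hammingDistance('Roller', 'Wagner') . PHP_EOL;             // 4
echo hammingDistance('L_Efficiency', 'M_Efficiency') . PHP_EOL; // 1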

For more information on the kernel distance, refer to the link below:

Research Paper on Kernel Distance

Different distance kernels are available for categorical and non-categorical data. Since we are working with categorical data, we need a distance kernel designed for categorical data.

Since we are using the K-NN algorithm on categorical data, the most appropriate distance kernel is the Hamming distance kernel.

For more information on the Hamming distance metric, refer to the link below:

Research Paper on Hamming Distance Metric

But why use a distance metric at all when we are using K-NN?

The answer is quite simple: K-NN works by comparing features, and the distance metric is what quantifies how similar two data points are. Using it, the nearest-neighbors step can then find the group of data points that best matches the input data.

So now that we are done with the Dataset and the Algorithm part, let’s dive deep into the coding part of our model.

PHP Machine Learning Code:

Hey folks, till now I have discussed all the aspects of our project, but I still have not answered one question: why did I choose PHP over Python?

The reason is that PHP is a server-side, object-oriented programming language that is built for web-based applications, which is exactly how I planned to deploy this model. In my experience its execution in a web stack is fast, and when dealing with a large amount of live data it keeps the processing fast and consistent. PHP also runs on many platforms, and for my use case its built-in database connectivity felt more straightforward than what I had tried with Python.

In my Smart India Hackathon project on Asset Performance Management, I proposed a web-based approach for implementing the ML model. So, for all these reasons, I chose PHP over Python.

I hope you are satisfied with this explanation of why I chose PHP over Python. Again, these are my personal views; the right choice may differ from person to person depending on project requirements.

So, let’s get started writing the ML code in PHP. Writing ML code in PHP is surprisingly easy; the core of the model takes only a few lines of code using the Rubix ML library.

*Note – To use the Rubix ML PHP library we need PHP version 7.4 or later.

To begin with, consider that we need to predict the efficiency of our asset, the heat exchanger.

#Code for Predicting Efficiency of Heat Exchanger –
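For reference, here is the complete script assembled from the step-by-step snippets explained below. The dataset file name comes from my own code later in this post; the print_r() call for displaying the sliced predictions is an extra line added here for convenience:

<?php

include __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Kernels\Distance\Hamming;
use Rubix\ML\CrossValidation\Metrics\Accuracy;

// Step 1: load the labeled Heat Exchanger efficiency dataset from CSV.
$dataset = Labeled::fromIterator(new CSV('heat_efficiency.csv'));

// Step 2: split into 70% training and 30% testing sets.
[$training, $testing] = $dataset->stratifiedSplit(0.7);

// Step 3: K-NN estimator with k = 2, weighted neighbors and the Hamming distance kernel.
$estimator = new KNearestNeighbors(2, true, new Hamming(2));

// Step 4: train the model on the training set.
$estimator->train($training);

// Step 5: predict labels for the testing set.
$predictions = $estimator->predict($testing);

// Step 6: keep and display the first five predictions.
$output = array_slice($predictions, 0, 5);
print_r($output);

// Steps 7 and 8: score the predictions against the ground-truth labels.
$metric = new Accuracy();
$score = $metric->score($predictions, $testing->labels());

echo 'Accuracy of the model is ' . (string) ($score * 100.0) . '%' . PHP_EOL;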

Okay folks, now let me explain the code to you step by step…

Every machine learning project requires some libraries and packages to work with data and implement the algorithms. Likewise, we need to import some packages/classes from the Rubix ML library into our ML model.

*Note - Since PHP is an OOP language, we sometimes refer to these packages as classes.

The libraries are:

a) include __DIR__ . '/vendor/autoload.php'; –

This is not actually a library import but the Composer autoloader. Composer is an application-level package manager for PHP that provides a standard format for managing the dependencies of PHP software and the required libraries; the autoload.php file it generates makes all installed packages available to our script.

b) use Rubix\ML\Datasets\Labeled; -

Next we import from the Datasets package, which lets us load our dataset into the algorithm and manipulate it. It contains a class named Labeled that has predefined methods for working with a labeled dataset.

c) use Rubix\ML\Extractors\CSV; -

The Datasets package can only manipulate the data once it has been brought into our program in a proper format, and that is what the Extractors package is for. Extractors are iterators that let you loop over the records of a dataset in storage and can be used to instantiate a dataset object using the fromIterator() method. Since our dataset is stored as a CSV file, we use the CSV extractor class, which reads the file record by record. CSV files have the advantage that they can be processed line by line, and all CSV data is imported as the categorical type (strings) by default.

d) use Rubix\ML\Classifiers\KNearestNeighbors; -

So far we have imported what we need for the dataset; now we need to import the algorithm we are going to use. Since we are performing classification, we import from the Classifiers package, which contains the KNearestNeighbors class. With this import in place we can apply K-NN to our dataset.

e) use Rubix\ML\Kernels\Distance\Hamming; -

After importing the classifier we can work on our algorithm, but K-NN still needs to know which distance kernel to use when calculating the nearest neighbors. For that we import from the Kernels\Distance namespace, where plenty of distance metrics are available. Since we are working with categorical data, we import the Hamming distance kernel for our ML code.

f) use Rubix\ML\CrossValidation\Metrics\Accuracy; -

Finally, the last import comes from the CrossValidation package. Cross-validation techniques are generally used for evaluating ML models, i.e. measuring how well a model performs on data it was not trained on. In our ML model we are simply going to measure the accuracy of our algorithm on the provided testing dataset.

Now, coming to the actual code, let’s explore it step by step…

*Note – 1) Comments in PHP are written using '//'.

2) To display output in PHP the syntax is:

echo 'write_your_output/sentence_here';

3) In PHP, variable names start with the '$' symbol.
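Putting those three bits of syntax together in one tiny illustrative snippet:

<?php

// This is a comment in PHP.
$message = 'Hello from the APM model!'; // variables start with the $ symbol

echo $message . PHP_EOL; // echo prints the output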

Step 1) Loading Dataset into Algorithm:

Code:

$dataset = Labeled::fromIterator(new CSV('heat_efficiency.csv'));

Here I have declared a variable named dataset which will store our imported Heat Exchanger efficiency dataset. An object of the CSV extractor class is instantiated with the new keyword, and we pass it the path of the required CSV file. But just reading the file is not enough; we also want to work with the records inside it as a dataset object. So we use the Labeled class, whose static method fromIterator() iterates over each record returned by the extractor and builds the labeled dataset. We call fromIterator() on the Labeled class using the scope resolution operator (::).

Step 2) Splitting the dataset into training and testing dataset:

Code:

[$training, $testing] = $dataset->stratifiedSplit(0.7);

Here I have declared two variables, training and testing, which will store the respective subsets after the split. For splitting we use the predefined method stratifiedSplit, which takes one argument, the split ratio, a value between 0 and 1. As discussed earlier, I am going to split the dataset in a 70:30 ratio, so I pass 0.7 to the method. The method returns an array of two dataset objects, and PHP's array destructuring assigns the first (70%) to training and the second (30%) to testing. A stratified split also keeps the proportion of each label roughly the same in both subsets. That’s the power of the Rubix ML library.

Step 3) Applying estimator i.e. the K-NN Algorithm to the ML model:

Code:

$estimator = new KNearestNeighbors(2, true, new Hamming(2));

Hey folks, now coming to the interesting part: applying the machine learning algorithm, or estimator, to our ML model. Here I have declared a variable named estimator which stores the configured algorithm object we are going to use. Using the new keyword, an object of the KNearestNeighbors class is instantiated. The constructor takes a few parameters. The first is the number of nearest neighbors to consider; the right value generally depends on the dataset and on the user. I chose 2 here because we are working with three categories of labels, i.e. 1) L_Efficiency, 2) M_Efficiency and 3) H_Efficiency: whenever a new data point comes in, the algorithm looks for its 2 nearest (most similar) neighbors and assigns the label they suggest. I did not want to choose a much larger neighborhood for such a small number of categories, as that could mix data points from different categories. The second parameter is a Boolean set to true, which tells the algorithm to weight the neighbors by their distances while making predictions. The third, and really important, parameter is the distance kernel. As discussed earlier, we are using the Hamming distance, so we instantiate the Hamming class with the new keyword and pass it as the kernel.

Step 4) Training/Fitting the Algorithm with Training Dataset:

Code:

$estimator->train($training);

Now comes the fun part: our model is ready to be trained. We call the train method on the estimator and pass it the training variable, which holds the training dataset. The estimator object stores the configured algorithm and its parameters, so calling train makes it learn the patterns present in the training data. With that, our ML model is trained on our dataset.

Step 5) Testing/Predicting the Trained Algorithm on Testing Dataset:

Code:

$predictions = $estimator->predict($testing);

So far we have trained our ML model on the training dataset; now it’s time to test the patterns it has learnt. I have declared another variable named predictions which will store the predicted labels for the testing dataset. How does it predict? We call the predict method on the estimator, since the estimator holds the learned patterns and parameters, and pass it the testing dataset. Based on what it has learned, predict returns a label for every sample in the testing set, and the result is stored in the predictions variable.

Step 6) Printing the Predicted values:

Code:

$output = array_slice($predictions, 0, 5);

After prediction, the predicted values are returned as an array with one entry per testing sample. Since the algorithm predicts a value for every record, we restrict the output to a handful of values for a quick look. Here I have used the array_slice function to limit the output. It takes three parameters: the first is the predictions array, the second is the starting offset and the third is the number of elements to take. Since I want the first five values, I pass an offset of 0 and a length of 5. The first 5 predicted values are then stored in the variable named output.
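To actually see those five values on screen, one simple option (this call is an addition here, it is not in the original snippet) is print_r:

print_r($output); // prints the first five predicted labels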

Step 7) Finding the Accuracy of our ML Model:

Code:

$metric = new Accuracy();
$score = $metric->score($predictions, $testing->labels());

Now comes the endgame: finding out how accurately our ML model performed on the testing dataset. For that we use the predefined Accuracy class. I instantiate an object of that class with the new keyword and store it in the variable metric. Accuracy here simply means the number of classifications the model predicts correctly divided by the total number of predictions made. We don’t have to do this calculation ourselves, because the Accuracy class has a predefined score method that does it for us. The score method takes two parameters: the predicted labels and the ground-truth labels they should be compared against. So I pass the predictions variable as the first parameter, and as the second parameter I call the labels method on the testing dataset, which returns the output labels that were provided in the dataset itself. The metric then compares the predicted labels with the true labels and returns the accuracy as a value between 0 and 1, stored in the variable score.

Step 8) Converting the Score into User format:

Code:

echo 'Accuracy of the model is ' . (string) ($score * 100.0) . '%' . PHP_EOL;

This line of code does nothing complex; it just presents the score in a readable format. Since the score from the previous step lies between 0 and 1, we convert it into a percentage by multiplying it by 100, making it easier to understand. Whether you convert it or not is entirely up to you.

And Voila!!! We have completed our ML Model here.

Now, you might be wondering about the remaining datasets, i.e. the Heat Exchanger Reliability and Status. Don’t worry, it’s not a tedious task: you just need to replace the heat_efficiency dataset with the heat_reliability and heat_status datasets, and the rest of the ML model will perform the same task for both.
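If you would rather not copy the script three times, one way (a hypothetical wrapper, not from the original project; the reliability and status CSV file names are assumed) is to factor the pipeline into a function and loop over the dataset files:

<?php

include __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Kernels\Distance\Hamming;
use Rubix\ML\CrossValidation\Metrics\Accuracy;

// Run the same K-NN pipeline on a given dataset and return its accuracy.
function assessAsset(string $csvPath): float
{
    $dataset = Labeled::fromIterator(new CSV($csvPath));

    [$training, $testing] = $dataset->stratifiedSplit(0.7);

    $estimator = new KNearestNeighbors(2, true, new Hamming(2));
    $estimator->train($training);

    $predictions = $estimator->predict($testing);

    $metric = new Accuracy();

    return $metric->score($predictions, $testing->labels());
}

foreach (['heat_efficiency.csv', 'heat_reliability.csv', 'heat_status.csv'] as $file) {
    echo $file . ': ' . (string) (assessAsset($file) * 100.0) . '%' . PHP_EOL;
}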

Up to this point we have only done the machine learning part, where we obtained predictions from the datasets provided. But that alone is a bit dry, since only a single prediction is produced per instance. If we want to know how the heat exchanger data behaves over the complete dataset, we need to explore it. Yes! You guessed it right, we are going to do Data Exploration with Visualization.

Just going off-track a bit: how many of you have watched the movie Drishyam? What does that movie convey? That visual memories are the strongest memories, which is exactly why visuals communicate data faster and with better understanding.

So, let’s have a Drishyam now (I mean Data Visualization)... 😅😅

Data Visualization

For Data Visualization, you can use any of your favorite plotting tools; you just need to import the "dataset.csv" file into it.

Here I have done this work using the JavaScript charting library Canvas, since I developed this entire project from a web-based application perspective.
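Since the project is web based, the PHP side only needs to hand the data points to the browser; the charting library then draws the graphs. Here is a rough, hypothetical bridge script (the column index of the efficiency value is an assumption about the CSV layout) that reads the dataset and emits it as JSON for the front end:

<?php

// Read the efficiency dataset and expose it as JSON points for a charting library.
$lines = file('heat_efficiency.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$rows = array_map('str_getcsv', $lines);

array_shift($rows); // drop the header row, if present

$points = [];

foreach ($rows as $i => $row) {
    $efficiency = (float) $row[9]; // assumed position of the efficiency column

    $points[] = ['x' => $i, 'y' => $efficiency];
}

header('Content-Type: application/json');

echo json_encode($points);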

*Note – The graphs below are interpreted for different conditions based on the dataset I have used.

1) Heat Exchanger Efficiency Line Graph:

I have plotted a line graph of the Heat Exchanger efficiency values over time for the different data records. You can refer to the dataset discussed earlier.

This graph shows the behavior of the Heat Exchanger’s efficiency over the set of data points. We can see that the efficiency of the asset is high in some places and low to medium in others. Overall, the efficiency decreases across the data points, which means the asset needs to be considered for maintenance.

2) Heat Exchanger Reliability Graph:

The graph is plotted between the fluid flow rate of the asset and the efficiency values for better visualization.

Basically, an asset is said to be reliable if it has little unplanned downtime. The above graph clearly shows that over the set of data points the reliability of the asset increases a bit and then takes a steep fall, which means the asset is not reliable over the period.

3) Heat Exchanger Reliability Histogram:

The histogram plots the reliability categories of the Heat Exchanger against different values of fluid flow rate for better data visualization.

We can clearly see that most of the time the Heat Exchanger showed medium or low reliability, with only a few moments where the asset was highly reliable. Thus, we can conclude that the asset needs to be considered for maintenance; otherwise it will cause unplanned downtime and increase operating costs.

*Note - We can plot the above visualizations based on several different parameters.

4) Machine Learning Prediction:

Until now we have seen what the data points tell us through data visualization. But the most important part of our ML model, the predicted output, is still left. I know you are all excited about the predicted result; that’s why I kept it for last.

So, let’s have a look at the below diagram:

After clicking the Assess button, our asset was predicted to be of Medium Efficiency, Low Reliability and Less Stable. As you can see, this agrees well with our data visualization results, which showed the same trends; that’s why I revealed the predicted output last. On checking the accuracy of the model, it showed 91% accuracy, which is quite acceptable for a K-NN model.

I know a lot more can be done with this dataset, and many things are still left to explore in this ML model. This model was based on hardcoded datasets, but when we work with live data from IoT sensors the analysis will be more accurate.

I hope you discovered something new today and enjoyed my work. As you can see, it’s actually quite fun to do machine learning in PHP, and there is a lot more to explore.

If you want to refer to the datasets, code and visualizations used in this blog, head over to my GitHub repository using the link below.

GitHub Repository

I hope you enjoyed the PHP Machine Learning Journey!!

Thank you. Stay safe and happy learning!!!