
MACHINE LEARNING


Machine learning is a branch of artificial intelligence that allows systems to learn on their own. In machine learning, systems are not explicitly programmed; instead, they are trained on data, gain experience, and discover the relevant patterns in the data themselves. This project falls within supervised learning.
​
Implementation of algorithms:
The algorithms follow one general idea: two supervised learning models (logistic regression and linear regression) are implemented in a single class, and all algorithms live in one module. The implemented algorithms include:
- Pre-processing: standardization, normalization, splitting the data into training and test sets, etc.
- Optimization algorithms: gradient descent, normal equation.
- Other algorithms: hypothesis functions, cost functions.
- Error-checking functions: these monitor the algorithms to verify that processing runs correctly.
These algorithms are used in the projects below.
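As a rough illustration of how the optimization algorithms listed above relate to each other, the following sketch fits a linear regression with both gradient descent and the normal equation and checks that they agree. It is not the project's own module: the function names, the learning rate, the iteration count, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def add_bias(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    residual = X @ theta - y
    return (residual @ residual) / (2 * len(y))

def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Repeatedly step against the gradient of the cost (assumed settings)."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iterations):
        theta -= alpha / m * (X.T @ (X @ theta - y))
    return theta

def normal_equation(X, y):
    """Closed-form solution of (X^T X) theta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data standing in for two numeric features and a target.
rng = np.random.default_rng(0)
X_raw = rng.uniform(1.0, 6.0, size=(200, 2))
y = 50.0 + 30.0 * X_raw[:, 0] + 10.0 * X_raw[:, 1] + rng.normal(0.0, 1.0, 200)

X = add_bias(standardize(X_raw))
theta_gd = gradient_descent(X, y)
theta_ne = normal_equation(X, y)
```

On standardized features both methods should reach essentially the same parameters; gradient descent simply needs many passes over the data to get there, while the normal equation solves the problem in one step.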
 
Projects:
1. Estimating and forecasting carbon dioxide emissions of passenger cars
Here we aim to predict the carbon dioxide emissions of passenger cars using a database of passenger cars in Canada. This database contains features such as the number of cylinders, engine size, company name, vehicle model, and so on. By examining the characteristics of different cars, we obtain a model that can predict the carbon dioxide emissions of new cars. Before training the model, we analyze the database briefly. In this example, only a graphical analysis is provided, consisting of the following charts:

1. Histogram chart
2. Bar chart
3. Scatter plots
Finally, using univariate and multivariate linear regression, we train a model that can predict the carbon dioxide emissions of new vehicles.
2. Classify students based on their grades
This is a simple example of using the algorithms listed above for classification. The classification is based on only two scores per student, so it is not very accurate. The purpose of this problem is only to test the algorithms above on a classification task.
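A two-score classifier of this kind can be sketched with logistic regression trained by gradient descent. The example below is only an illustration: the synthetic scores, the labeling rule (a student is in the positive class when the two scores sum to more than 100), the learning rate, and the function names are assumptions, not the project's data or code.

```python
import numpy as np

def sigmoid(z):
    """Logistic hypothesis: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(X, y, theta):
    """Cross-entropy cost; eps guards against log(0)."""
    p = sigmoid(X @ theta)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, alpha=0.5, iterations=5000):
    """Gradient descent on the cross-entropy cost (assumed settings)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta -= alpha / len(y) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Synthetic stand-in: two scores per student, hypothetical labeling rule.
rng = np.random.default_rng(1)
scores = rng.uniform(0, 100, size=(300, 2))
labels = (scores.sum(axis=1) > 100).astype(float)

# Standardize the scores, then add the intercept column.
X_std = (scores - scores.mean(axis=0)) / scores.std(axis=0)
X = np.hstack([np.ones((len(X_std), 1)), X_std])

theta = fit_logistic(X, labels)
predictions = (sigmoid(X @ theta) >= 0.5).astype(float)
accuracy = (predictions == labels).mean()
```

Real grades are unlikely to be separated this cleanly by a straight line, which is one reason a classifier built on only two scores has limited accuracy.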

 3. Analysis and forecasting of bank data
Banks constantly analyze data. They look for behavioral patterns in their customers so that they can offer them the best services. For this reason, banks typically have a data science team that examines customer information and identifies these patterns. In this example, we look at bank deposits. Our goal is to find out how likely a bank customer is to make a deposit in that bank. For this purpose, we used a database containing information on 45,000 customers of a Portuguese bank in 2012. This information includes each customer's age, occupation, education, account balance, etc.

 
 Project steps:
1. Get an overview of the database (number of records, number of features, feature names, variable types, summary statistics of the features such as mean, count, and quartiles, the first and last three records, etc.)

 2. Group the database by a specific feature, then analyze it statistically:
 For example, we group the whole database by occupation and then calculate the average of the other features, such as the average account balance of teachers.
 3. Check the counts and distribution of some features:
 This is done both numerically and graphically.
 4. Histogram chart of some features
 5. Pie chart of the distribution of the values of each feature, with percentages
 6. Check the deposit counts for different categories.
 For example, 4,585 divorced people did not make a deposit and only 622 divorced people did.
 This is also done graphically.
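The numerical parts of the steps above could look roughly like this in pandas. The six-row table and its column names (`job`, `marital`, `balance`, `deposit`) are invented for illustration and do not come from the actual bank database.

```python
import pandas as pd

# Hypothetical miniature of the customer table; values are made up.
customers = pd.DataFrame({
    "job": ["teacher", "teacher", "engineer", "engineer", "retired", "retired"],
    "marital": ["married", "divorced", "single", "divorced", "married", "divorced"],
    "balance": [1200.0, 800.0, 2500.0, 1500.0, 3000.0, 500.0],
    "deposit": ["no", "no", "yes", "no", "yes", "yes"],
})

# Step 1: size, summary statistics (count, mean, quartiles), first/last rows.
n_rows, n_columns = customers.shape
summary = customers.describe()
first_rows, last_rows = customers.head(3), customers.tail(3)

# Step 2: group by occupation and average the numeric features.
mean_by_job = customers.groupby("job")[["balance"]].mean()
teacher_balance = mean_by_job.loc["teacher", "balance"]

# Step 3: how often each value of a feature occurs.
marital_counts = customers["marital"].value_counts()

# Step 6: deposits per category, as a marital-status by deposit table.
deposit_by_marital = pd.crosstab(customers["marital"], customers["deposit"])
```

In this toy table, `deposit_by_marital.loc["divorced"]` plays the role of the divorced-customer counts quoted above, just on a much smaller scale.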
 After reviewing the database, it is time to build and train the machine learning model.
 Note: Because our gradient descent implementation is slow, we trained the model using ready-made machine learning libraries.
 
 Model training steps:
- Preprocessing:
1. Convert non-numeric features to numeric:

 Every stage of training relies on mathematical operations. Since these cannot be applied to non-numeric data, we convert it to numeric data.
 2. Select the 25 most effective features:
Choosing the most effective features speeds up learning. For example, a person's account balance affects deposits far more than the date the account was opened. So we process only the features that have the greatest impact on deposits. Of course, this may slightly reduce accuracy.

 3. Data standardization:
 Because our features lie in different ranges, it is better to bring them all to a standard scale, so that each feature has a mean of zero and a standard deviation of 1. This increases the speed and accuracy of learning.
 4. Divide the data into training sets and test sets
 5. Convert the bank database from an unbalanced database to a balanced one.
 One thing we notice when analyzing the bank data is that the database is very unbalanced. In the chart section, we saw that only 11% of customers were willing to deposit. This is a poor result for a bank. The imbalance also makes learning difficult: the model becomes more accurate at recognizing customers who did not make a deposit, because during training it has mostly seen samples of such customers.
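The preprocessing steps above might be sketched as follows with NumPy and pandas. Several choices here are assumptions, since the text does not specify them: one-hot encoding for the non-numeric features, an 80/20 split, and random oversampling of the minority class (applied to the training set only) as the balancing method. Feature selection is left out because the selection method is not named. The data is randomly generated to mimic an 11% deposit rate.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical stand-in for the bank table; columns and values are made up.
data = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "balance": rng.normal(1500, 800, n),
    "job": rng.choice(["teacher", "engineer", "retired"], n),
    "deposit": rng.choice(["no", "yes"], n, p=[0.89, 0.11]),
})

# 1. Convert non-numeric features to numeric (one-hot encoding, an assumption).
features = pd.get_dummies(data.drop(columns="deposit"), columns=["job"])
X = features.astype(float).to_numpy()
y = (data["deposit"] == "yes").to_numpy().astype(int)

# 3. Standardize: each column gets mean 0 and standard deviation 1.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 4. Shuffle and split into training and test sets (80/20 is an assumption).
order = rng.permutation(n)
cut = int(0.8 * n)
train_idx, test_idx = order[:cut], order[cut:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# 5. Balance the training set by resampling minority rows with replacement.
minority = np.flatnonzero(y_train == 1)
majority = np.flatnonzero(y_train == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
balanced = np.concatenate([majority, minority, extra])
X_bal, y_bal = X_train[balanced], y_train[balanced]
```

Leaving the test set untouched keeps the evaluation representative of the real, unbalanced customer population.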
 - Training:
6. Select the logistic regression model:
Since this is a two-class classification problem, we used logistic regression. Logistic regression can be used for both two-class and multi-class classification.

 7. Train the model on the training set
 8. Predict new data (test data)
 9. Check the accuracy of the predictions
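The training steps above map onto a few library calls. The text does not name the library that was used, so this sketch assumes scikit-learn; the synthetic features and the rule that generates the labels are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the preprocessed bank features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
signal = 2.0 * X[:, 0] - 1.5 * X[:, 1]          # hypothetical true relationship
y = (signal + rng.normal(scale=0.5, size=2000) > 0).astype(int)

# Hold out 20% of the rows as test data (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 6. Select the logistic regression model.
model = LogisticRegression()

# 7. Train the model on the training set.
model.fit(X_train, y_train)

# 8. Predict the held-out test data.
y_pred = model.predict(X_test)

# 9. Check the accuracy of the predictions.
accuracy = accuracy_score(y_test, y_pred)
```

On unbalanced data, accuracy alone can look good even when the minority class is predicted poorly, so it is worth checking per-class results as well.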
​
​
​
