Sunday, March 22, 2015

Testing the scaling of logistic regression (an ML classification algorithm) by partitioning data with a histogram [2]

Information about the important classes:

| Class | Description |
| --- | --- |
| GetMinMax.java | Calculates the minimum and maximum values of each feature. |
| HistogramEnsembler.java | Encapsulates partitioning of the data and the creation of a model for each group. Also implements the model ensembling logic. |
| HistogramHelper.java | Calculates the bin number, converting a given coordinate to a bin number. |
| HistogramTester.java | Analysis of the data partitioned using a histogram. |
| HistogramTree.java | Allows the histogram to be formulated and manipulated. |
| IHistogramHelper.java | Defines the contract of the histogram. |
| LogisticRegresionTester.java | Analysis of logistic regression with the full data set. |
| Metrics.java | Efficiency calculation. |
| RandomPartitionedEnSembler.java | Encapsulates random partitioning of the data and the creation of a model for each group. Also implements the model ensembling logic. |
| RandomPartitionTester.java | Analysis of logistic regression with randomly partitioned data. |
| TestHistogramHelper.java | Unit tests for the HistogramHelper class. Tests the dimensions and the change of axis order. |
| TestHistogramTree.java | Unit tests for the HistogramTree class. Tests the neighbouring bins and checks that the groups are of equal size. |
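
To make the min/max and bin-number steps concrete, here is a minimal sketch of how a coordinate could be mapped to a histogram bin. It is not the code from GetMinMax.java or HistogramHelper.java: the class name `BinningSketch`, its method names, the equal-width bins and the row-major flattening of per-axis indices are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch, not the project's actual classes.
class BinningSketch {

    // Per-feature minimum and maximum; rows[i][j] is feature j of sample i.
    // Returns {min, max}, each an array with one entry per feature.
    static double[][] minMax(double[][] rows) {
        int d = rows[0].length;
        double[] min = new double[d];
        double[] max = new double[d];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        return new double[][] {min, max};
    }

    // Equal-width bin index along one axis (assumed binning scheme).
    // Clamped so that value == max lands in the last bin.
    static int binIndex(double value, double min, double max, int bins) {
        if (max <= min) {
            return 0; // constant feature: everything falls in one bin
        }
        int idx = (int) ((value - min) / (max - min) * bins);
        return Math.max(0, Math.min(bins - 1, idx));
    }

    // Flattens the per-axis bin indices into a single bin number (row-major).
    static int binNumber(double[] point, double[] min, double[] max, int binsPerAxis) {
        int n = 0;
        for (int j = 0; j < point.length; j++) {
            n = n * binsPerAxis + binIndex(point[j], min[j], max[j], binsPerAxis);
        }
        return n;
    }

    public static void main(String[] args) {
        double[][] data = { {0.0, 10.0}, {5.0, 20.0}, {10.0, 30.0} };
        double[][] mm = minMax(data);
        for (double[] p : data) {
            System.out.println(Arrays.toString(p) + " -> bin " + binNumber(p, mm[0], mm[1], 4));
        }
    }
}
```

Under these assumptions, every sample gets one bin number, and samples sharing a bin number (or a set of neighbouring bins) could be grouped into one partition for training.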

Testing and Evaluation



| Method | Technique used | Accuracy (%) | Training time | Prediction time |
| --- | --- | --- | --- | --- |
| Method 01 | Logistic regression with full data set | 74.2720201581548 | 52 s | 12 s |
| Method 02 | Logistic regression with randomly partitioned data set | 74.35145399566024 | 2 min 42 s | 15 s |
| Method 03 | Logistic regression with partitioned data using histogram | 74.36789359939304 | 6 min 07 s | 12 s |

The table above compares the three methods used to run logistic regression.
In terms of accuracy, all three methods perform at a similar level, within about 0.1 percentage points of each other. In training time, Method 01 is the fastest at 52 s, whereas Method 03 is the slowest at 6 min 07 s. In prediction time, however, Method 01 and Method 03 share the best result of 12 s, while Method 02 takes 15 s.
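
For reference, figures like the accuracy and timings in the table can be obtained along the following lines. This is only a sketch, not the contents of Metrics.java: the class name `MetricsSketch`, the method names and the tiny label arrays are hypothetical and chosen for illustration.

```java
// Hypothetical sketch, not the project's Metrics.java.
class MetricsSketch {

    // Percentage of predictions that match the true labels.
    static double accuracyPercent(int[] predicted, int[] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == actual[i]) {
                correct++;
            }
        }
        return 100.0 * correct / actual.length;
    }

    // Wall-clock time of a task in milliseconds, e.g. a training or prediction run.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] predicted = {1, 0, 1, 1}; // made-up labels for illustration
        int[] actual = {1, 0, 0, 1};
        System.out.println("accuracy (%): " + accuracyPercent(predicted, actual));

        // Stand-in workload; a real run would train or apply the model here.
        long ms = elapsedMillis(() -> {
            double s = 0;
            for (int i = 0; i < 1_000_000; i++) {
                s += Math.sqrt(i);
            }
            if (s < 0) {
                System.out.println(s);
            }
        });
        System.out.println("elapsed (ms): " + ms);
    }
}
```

Timing each method's training and prediction phases separately, as in the table, would mean wrapping each phase in its own `elapsedMillis` call.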
 
