{"id":620,"date":"2019-08-08T09:49:20","date_gmt":"2019-08-07T22:49:20","guid":{"rendered":"https:\/\/spectdata.com\/?p=620"},"modified":"2020-12-15T22:22:42","modified_gmt":"2020-12-15T11:22:42","slug":"variable-selection-using-lasso","status":"publish","type":"post","link":"https:\/\/spectdata.com\/index.php\/2019\/08\/08\/variable-selection-using-lasso\/","title":{"rendered":"Variable selection using LASSO"},"content":{"rendered":"\n<p>Data analysts and data scientists use different regression methods for different kinds of analytics problems, from the simplest to the most complex. One of the most talked-about methods is the Lasso. Lasso is often described as one of the most useful linear regression tools, and we are about to find out why.<\/p>\n\n\n\n<p>LASSO stands for \u201cLeast absolute shrinkage and selection operator\u201d, which summarizes how Lasso regression works. Lasso performs regression with shrinkage, \u201cwhere data are shrunk to a certain central point\u201d [<a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/lasso-regression\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">1<\/a>], and performs variable selection by adding a penalty on the absolute size of the coefficients (an L1 penalty), which forces the coefficients of \u201cnot-so-significant\u201d variables to become exactly zero.<\/p>\n\n\n\n<p>To understand more about this powerful tool, we will apply it to a real-world problem.<\/p>\n\n\n\n<p>Our data, a set of breast cancer diagnostic cases, comes from Kaggle.com [<a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.kaggle.com\/uciml\/breast-cancer-wisconsin-data\" target=\"_blank\">2<\/a>]. We use it for the entire demo. The dataset contains characteristics of the cell nuclei present in a digitized image of a fine needle aspirate (FNA) of a breast mass. 
Our goal is to identify which physical characteristics of the breast mass significantly indicate whether it is benign or malignant.<\/p>\n\n\n\n<p><strong>Prepare data<\/strong><\/p>\n\n\n\n<p>We split the data into a training set (70% of the rows) and a test set (the remaining 30%).<\/p>\n\n\n\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97517739\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-data-prep-r\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-r  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"data prep.r content, created by mfmakahiya on 07:07AM on July 30, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"data prep.r\">\n        <tr>\n          <td id=\"file-data-prep-r-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-data-prep-r-LC1\" class=\"blob-code blob-code-inner js-file-line\">library(Matrix)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n          <td id=\"file-data-prep-r-LC2\" class=\"blob-code blob-code-inner js-file-line\">library(glmnet)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L3\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"3\"><\/td>\n          <td 
id=\"file-data-prep-r-LC3\" class=\"blob-code blob-code-inner js-file-line\">library(pROC)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L4\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"4\"><\/td>\n          <td id=\"file-data-prep-r-LC4\" class=\"blob-code blob-code-inner js-file-line\">library(caret)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L5\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"5\"><\/td>\n          <td id=\"file-data-prep-r-LC5\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L6\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"6\"><\/td>\n          <td id=\"file-data-prep-r-LC6\" class=\"blob-code blob-code-inner js-file-line\"># Import dataset<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L7\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"7\"><\/td>\n          <td id=\"file-data-prep-r-LC7\" class=\"blob-code blob-code-inner js-file-line\">data1 = read.csv(file = &quot;.\/data\/input\/breast-cancer.csv&quot;)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L8\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"8\"><\/td>\n          <td id=\"file-data-prep-r-LC8\" class=\"blob-code blob-code-inner js-file-line\">data1$diagnosis&lt;-ifelse(data1$diagnosis==&#39;M&#39;, 1,0)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L9\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"9\"><\/td>\n          <td id=\"file-data-prep-r-LC9\" class=\"blob-code blob-code-inner js-file-line\">data2 = data.matrix(data1)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L10\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"10\"><\/td>\n          <td id=\"file-data-prep-r-LC10\" class=\"blob-code 
blob-code-inner js-file-line\">Matrix(data2, sparse = TRUE)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L11\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"11\"><\/td>\n          <td id=\"file-data-prep-r-LC11\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L12\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"12\"><\/td>\n          <td id=\"file-data-prep-r-LC12\" class=\"blob-code blob-code-inner js-file-line\">set.seed(6789)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L13\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"13\"><\/td>\n          <td id=\"file-data-prep-r-LC13\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L14\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"14\"><\/td>\n          <td id=\"file-data-prep-r-LC14\" class=\"blob-code blob-code-inner js-file-line\"># Split the data to train and test<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L15\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"15\"><\/td>\n          <td id=\"file-data-prep-r-LC15\" class=\"blob-code blob-code-inner js-file-line\">split = sample(nrow(data1), floor(0.7*nrow(data1)))<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L16\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"16\"><\/td>\n          <td id=\"file-data-prep-r-LC16\" class=\"blob-code blob-code-inner js-file-line\">train = data1[split,]<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L17\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"17\"><\/td>\n          <td id=\"file-data-prep-r-LC17\" class=\"blob-code blob-code-inner js-file-line\">test = data1[-split,]<\/td>\n        <\/tr>\n        <tr>\n 
         <td id=\"file-data-prep-r-L18\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"18\"><\/td>\n          <td id=\"file-data-prep-r-LC18\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L19\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"19\"><\/td>\n          <td id=\"file-data-prep-r-LC19\" class=\"blob-code blob-code-inner js-file-line\">train_sparse = sparse.model.matrix(~., train[,3:32])<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-data-prep-r-L20\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"20\"><\/td>\n          <td id=\"file-data-prep-r-LC20\" class=\"blob-code blob-code-inner js-file-line\">test_sparse = sparse.model.matrix(~., test[,3:32])<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/c2065015ca19b0dba5c4038949c85655\/raw\/cf93718cb07b0d78728745f74053991c9dba4735\/data%20prep.r\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/c2065015ca19b0dba5c4038949c85655#file-data-prep-r\" class=\"Link--inTextBlock\">\n          data prep.r\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n\n\n\n<p><strong>Train the model<\/strong><\/p>\n\n\n\n<p>After fitting the Lasso model on the training set, we use cross-validation to determine the best value of lambda, the penalty strength.<\/p>\n\n\n\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97517807\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-train-model-r\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-r  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"train model.r content, created by mfmakahiya on 07:13AM on July 30, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"train model.r\">\n        <tr>\n          <td id=\"file-train-model-r-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-train-model-r-LC1\" class=\"blob-code blob-code-inner js-file-line\"># Train the model<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n          <td id=\"file-train-model-r-LC2\" class=\"blob-code blob-code-inner js-file-line\">glmmod = glmnet(x=train_sparse, y=as.factor(train[,2]), alpha=1, family=&quot;binomial&quot;)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L3\" 
class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"3\"><\/td>\n          <td id=\"file-train-model-r-LC3\" class=\"blob-code blob-code-inner js-file-line\">plot(glmmod, xvar=&quot;lambda&quot;)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L4\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"4\"><\/td>\n          <td id=\"file-train-model-r-LC4\" class=\"blob-code blob-code-inner js-file-line\">glmmod<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L5\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"5\"><\/td>\n          <td id=\"file-train-model-r-LC5\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L6\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"6\"><\/td>\n          <td id=\"file-train-model-r-LC6\" class=\"blob-code blob-code-inner js-file-line\">coef(glmmod)[,100]<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L7\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"7\"><\/td>\n          <td id=\"file-train-model-r-LC7\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L8\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"8\"><\/td>\n          <td id=\"file-train-model-r-LC8\" class=\"blob-code blob-code-inner js-file-line\"># Try cross validation lasso<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L9\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"9\"><\/td>\n          <td id=\"file-train-model-r-LC9\" class=\"blob-code blob-code-inner js-file-line\">cv.glmmod = cv.glmnet(x=train_sparse, y=as.factor(train[,2]), alpha=1, family=&quot;binomial&quot;)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L10\" class=\"blob-num js-line-number 
js-blob-rnum\" data-line-number=\"10\"><\/td>\n          <td id=\"file-train-model-r-LC10\" class=\"blob-code blob-code-inner js-file-line\">plot(cv.glmmod)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L11\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"11\"><\/td>\n          <td id=\"file-train-model-r-LC11\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L12\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"12\"><\/td>\n          <td id=\"file-train-model-r-LC12\" class=\"blob-code blob-code-inner js-file-line\">lambda = cv.glmmod$lambda.1se # lambda.1se: the value coef() and predict() use by default on a cv.glmnet fit<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L13\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"13\"><\/td>\n          <td id=\"file-train-model-r-LC13\" class=\"blob-code blob-code-inner js-file-line\">lambda<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L14\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"14\"><\/td>\n          <td id=\"file-train-model-r-LC14\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L15\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"15\"><\/td>\n          <td id=\"file-train-model-r-LC15\" class=\"blob-code blob-code-inner js-file-line\">coefs = as.matrix(coef(cv.glmmod)) # convert the sparse coefficients to a dense one-column matrix<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L16\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"16\"><\/td>\n          <td id=\"file-train-model-r-LC16\" class=\"blob-code blob-code-inner js-file-line\">ix = which(abs(coefs[,1]) &gt; 0) # indices of the non-zero coefficients, i.e. the selected variables<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L17\" class=\"blob-num js-line-number js-blob-rnum\" 
data-line-number=\"17\"><\/td>\n          <td id=\"file-train-model-r-LC17\" class=\"blob-code blob-code-inner js-file-line\">length(ix)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L18\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"18\"><\/td>\n          <td id=\"file-train-model-r-LC18\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L19\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"19\"><\/td>\n          <td id=\"file-train-model-r-LC19\" class=\"blob-code blob-code-inner js-file-line\">coefs[ix,1, drop=FALSE]<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L20\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"20\"><\/td>\n          <td id=\"file-train-model-r-LC20\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L21\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"21\"><\/td>\n          <td id=\"file-train-model-r-LC21\" class=\"blob-code blob-code-inner js-file-line\">test$cv.glmmod &lt;- predict(cv.glmmod,newx=test_sparse,type=&#39;response&#39;)[,1]<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L22\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"22\"><\/td>\n          <td id=\"file-train-model-r-LC22\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L23\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"23\"><\/td>\n          <td id=\"file-train-model-r-LC23\" class=\"blob-code blob-code-inner js-file-line\">########################<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L24\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"24\"><\/td>\n          <td 
id=\"file-train-model-r-LC24\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L25\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"25\"><\/td>\n          <td id=\"file-train-model-r-LC25\" class=\"blob-code blob-code-inner js-file-line\"># Get optimal lambda<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L26\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"26\"><\/td>\n          <td id=\"file-train-model-r-LC26\" class=\"blob-code blob-code-inner js-file-line\">best.lambda &lt;- cv.glmmod$lambda.min<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model-r-L27\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"27\"><\/td>\n          <td id=\"file-train-model-r-LC27\" class=\"blob-code blob-code-inner js-file-line\">best.lambda<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/7cbf7c74ecdd9b16eafa4ef83683eda4\/raw\/dfb2563a918948d2968516ba59cf59b1fd4a97e8\/train%20model.r\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/7cbf7c74ecdd9b16eafa4ef83683eda4#file-train-model-r\" class=\"Link--inTextBlock\">\n          train model.r\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n\n\n\n<p><strong>Predict<\/strong><\/p>\n\n\n\n<p>We predict the response variable for the test set, then look at the confusion matrix.<\/p>\n\n\n\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97517834\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-predict-r\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-r  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"predict.r content, created by mfmakahiya on 07:15AM on July 30, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"predict.r\">\n        <tr>\n          <td id=\"file-predict-r-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-predict-r-LC1\" class=\"blob-code blob-code-inner js-file-line\"># Predict the test set using the model<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n          <td id=\"file-predict-r-LC2\" class=\"blob-code blob-code-inner js-file-line\">pred_lasso = predict(glmmod, test_sparse, type=&quot;response&quot;, s=best.lambda)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L3\" 
class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"3\"><\/td>\n          <td id=\"file-predict-r-LC3\" class=\"blob-code blob-code-inner js-file-line\">pred_lasso<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L4\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"4\"><\/td>\n          <td id=\"file-predict-r-LC4\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L5\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"5\"><\/td>\n          <td id=\"file-predict-r-LC5\" class=\"blob-code blob-code-inner js-file-line\"># Apply a threshold<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L6\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"6\"><\/td>\n          <td id=\"file-predict-r-LC6\" class=\"blob-code blob-code-inner js-file-line\">new_pred_lasso = ifelse(pred_lasso &gt;= 0.5, 1, 0)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L7\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"7\"><\/td>\n          <td id=\"file-predict-r-LC7\" class=\"blob-code blob-code-inner js-file-line\">new_pred_lasso = data.frame(new_pred_lasso)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L8\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"8\"><\/td>\n          <td id=\"file-predict-r-LC8\" class=\"blob-code blob-code-inner js-file-line\">data_lasso = cbind(test[,2], new_pred_lasso)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L9\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"9\"><\/td>\n          <td id=\"file-predict-r-LC9\" class=\"blob-code blob-code-inner js-file-line\">names(data_lasso) = c(&quot;actual&quot;, &quot;pred&quot;)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L10\" class=\"blob-num js-line-number js-blob-rnum\" 
data-line-number=\"10\"><\/td>\n          <td id=\"file-predict-r-LC10\" class=\"blob-code blob-code-inner js-file-line\">xtab_lasso = table(data_lasso$pred, data_lasso$actual) # rows = predicted, columns = actual, as confusionMatrix() expects<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L11\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"11\"><\/td>\n          <td id=\"file-predict-r-LC11\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict-r-L12\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"12\"><\/td>\n          <td id=\"file-predict-r-LC12\" class=\"blob-code blob-code-inner js-file-line\">cm_lasso = confusionMatrix(xtab_lasso)<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/ed14cba5bf11a4ec6f43c1256d281786\/raw\/460dcdbaf20463e91b1627a63d0a10c649f71d42\/predict.r\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/ed14cba5bf11a4ec6f43c1256d281786#file-predict-r\" class=\"Link--inTextBlock\">\n          predict.r\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n\n\n\n<p><strong>Check performance<\/strong><\/p>\n\n\n\n<p>We compare the actual values of the response variable in the test set with the predicted values.<\/p>\n\n\n\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97517854\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-perf-r\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-r  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"perf.r content, created by mfmakahiya on 07:17AM on July 30, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"perf.r\">\n        <tr>\n          <td id=\"file-perf-r-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-perf-r-LC1\" class=\"blob-code blob-code-inner js-file-line\"># Get performance measures<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-perf-r-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n          <td id=\"file-perf-r-LC2\" class=\"blob-code blob-code-inner js-file-line\">overall_accuracy_lasso = cm_lasso$overall[&#39;Accuracy&#39;]<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n     
   <a href=\"https:\/\/gist.github.com\/mfmakahiya\/f3b9222c54256b6531a8696c9b76ca1d\/raw\/5d11ab07353f41e8e91edd52d517fce3a7c5dd85\/perf.r\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/f3b9222c54256b6531a8696c9b76ca1d#file-perf-r\" class=\"Link--inTextBlock\">\n          perf.r\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n\n\n\n<p>To compare, we will also solve the same problem using the Ordinary Least Squares method and then compare their results.<\/p>\n\n\n\n<p><strong>Train the model<\/strong><\/p>\n\n\n\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97518422\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-train-model2-r\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-r  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"train model2.r content, created by mfmakahiya on 07:53AM on July 30, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <template class=\"js-file-alert-template\">\n  <div data-view-component=\"true\" class=\"flash flash-warn flash-full d-flex flex-items-center\">\n  <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 
0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n    <span>\n      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.\n      <a class=\"Link--inTextBlock\" href=\"https:\/\/github.co\/hiddenchars\" target=\"_blank\">Learn more about bidirectional Unicode characters<\/a>\n    <\/span>\n\n\n  <div data-view-component=\"true\" class=\"flash-action\">        <a href=\"{{ revealButtonHref }}\" data-view-component=\"true\" class=\"btn-sm btn\">    Show hidden characters\n<\/a>\n<\/div>\n<\/div><\/template>\n<template class=\"js-line-alert-template\">\n  <span aria-label=\"This line has hidden Unicode characters\" data-view-component=\"true\" class=\"line-alert tooltipped tooltipped-e\">\n    <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n<\/span><\/template>\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"train model2.r\">\n        <tr>\n          <td id=\"file-train-model2-r-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-train-model2-r-LC1\" class=\"blob-code blob-code-inner js-file-line\"># Train the model (ordinary least squares, fitted with lm)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model2-r-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n     
     <td id=\"file-train-model2-r-LC2\" class=\"blob-code blob-code-inner js-file-line\">lmmod = lm(diagnosis ~ . , data = train[,2:32])<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model2-r-L3\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"3\"><\/td>\n          <td id=\"file-train-model2-r-LC3\" class=\"blob-code blob-code-inner js-file-line\">summary(lmmod)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model2-r-L4\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"4\"><\/td>\n          <td id=\"file-train-model2-r-LC4\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-train-model2-r-L5\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"5\"><\/td>\n          <td id=\"file-train-model2-r-LC5\" class=\"blob-code blob-code-inner js-file-line\">coeftest(lmmod, vcov. = vcovHC, type = &quot;HC1&quot;)<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/732dadb51bb686e0611611149c27ca74\/raw\/bbaf7ebe378e10a07bf903590ba1eba0c7a10deb\/train%20model2.r\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/732dadb51bb686e0611611149c27ca74#file-train-model2-r\" class=\"Link--inTextBlock\">\n          train model2.r\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n\n\n\n<p><strong>Predict<\/strong><\/p>\n\n\n\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97518444\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div 
class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-predict2-r\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-r  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"predict2.r content, created by mfmakahiya on 07:54AM on July 30, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <template class=\"js-file-alert-template\">\n  <div data-view-component=\"true\" class=\"flash flash-warn flash-full d-flex flex-items-center\">\n  <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n    <span>\n      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. 
To review, open the file in an editor that reveals hidden Unicode characters.\n      <a class=\"Link--inTextBlock\" href=\"https:\/\/github.co\/hiddenchars\" target=\"_blank\">Learn more about bidirectional Unicode characters<\/a>\n    <\/span>\n\n\n  <div data-view-component=\"true\" class=\"flash-action\">        <a href=\"{{ revealButtonHref }}\" data-view-component=\"true\" class=\"btn-sm btn\">    Show hidden characters\n<\/a>\n<\/div>\n<\/div><\/template>\n<template class=\"js-line-alert-template\">\n  <span aria-label=\"This line has hidden Unicode characters\" data-view-component=\"true\" class=\"line-alert tooltipped tooltipped-e\">\n    <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n<\/span><\/template>\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"predict2.r\">\n        <tr>\n          <td id=\"file-predict2-r-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-predict2-r-LC1\" class=\"blob-code blob-code-inner js-file-line\"># Predict the test set using the model<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n          <td id=\"file-predict2-r-LC2\" class=\"blob-code blob-code-inner js-file-line\">pred_ols = predict(lmmod, test[,3:32], type=&quot;response&quot;)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L3\" class=\"blob-num 
js-line-number js-blob-rnum\" data-line-number=\"3\"><\/td>\n          <td id=\"file-predict2-r-LC3\" class=\"blob-code blob-code-inner js-file-line\">pred_ols<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L4\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"4\"><\/td>\n          <td id=\"file-predict2-r-LC4\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L5\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"5\"><\/td>\n          <td id=\"file-predict2-r-LC5\" class=\"blob-code blob-code-inner js-file-line\"># Apply a threshold<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L6\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"6\"><\/td>\n          <td id=\"file-predict2-r-LC6\" class=\"blob-code blob-code-inner js-file-line\">new_pred_ols = ifelse(pred_ols &gt;= 0.5, 1, 0)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L7\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"7\"><\/td>\n          <td id=\"file-predict2-r-LC7\" class=\"blob-code blob-code-inner js-file-line\">new_pred_ols = data.frame(new_pred_ols)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L8\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"8\"><\/td>\n          <td id=\"file-predict2-r-LC8\" class=\"blob-code blob-code-inner js-file-line\">data_ols = cbind(test[,2], new_pred_ols)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L9\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"9\"><\/td>\n          <td id=\"file-predict2-r-LC9\" class=\"blob-code blob-code-inner js-file-line\">names(data_ols) = c(&quot;actual&quot;, &quot;pred&quot;)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L10\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"10\"><\/td>\n   
       <td id=\"file-predict2-r-LC10\" class=\"blob-code blob-code-inner js-file-line\">xtab_ols = table(data_ols$actual, data_ols$pred)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L11\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"11\"><\/td>\n          <td id=\"file-predict2-r-LC11\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-predict2-r-L12\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"12\"><\/td>\n          <td id=\"file-predict2-r-LC12\" class=\"blob-code blob-code-inner js-file-line\">cm_ols = confusionMatrix(xtab_ols)<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/d176a80a9e618bfe329043d91a2cfb6a\/raw\/a6d4367c529652850027545d2bfec68014edb54b\/predict2.r\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/d176a80a9e618bfe329043d91a2cfb6a#file-predict2-r\" class=\"Link--inTextBlock\">\n          predict2.r\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n\n\n\n<p><strong>Check performance<\/strong><\/p>\n\n\n\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97518458\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-perf2-r\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-r  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"perf2.r content, created by mfmakahiya on 
07:56AM on July 30, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <template class=\"js-file-alert-template\">\n  <div data-view-component=\"true\" class=\"flash flash-warn flash-full d-flex flex-items-center\">\n  <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n    <span>\n      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.\n      <a class=\"Link--inTextBlock\" href=\"https:\/\/github.co\/hiddenchars\" target=\"_blank\">Learn more about bidirectional Unicode characters<\/a>\n    <\/span>\n\n\n  <div data-view-component=\"true\" class=\"flash-action\">        <a href=\"{{ revealButtonHref }}\" data-view-component=\"true\" class=\"btn-sm btn\">    Show hidden characters\n<\/a>\n<\/div>\n<\/div><\/template>\n<template class=\"js-line-alert-template\">\n  <span aria-label=\"This line has hidden Unicode characters\" data-view-component=\"true\" class=\"line-alert tooltipped tooltipped-e\">\n    <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 
0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n<\/span><\/template>\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"perf2.r\">\n        <tr>\n          <td id=\"file-perf2-r-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-perf2-r-LC1\" class=\"blob-code blob-code-inner js-file-line\"># Get performance measures<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-perf2-r-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n          <td id=\"file-perf2-r-LC2\" class=\"blob-code blob-code-inner js-file-line\">overall_accuracy_ols = cm_ols$overall[&#39;Accuracy&#39;]<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/2b799ca47f59157ffd5a2f0642bca432\/raw\/ec0b6e54a6f4642dfd5f7e69d630dd79c41bf53f\/perf2.r\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/mfmakahiya\/2b799ca47f59157ffd5a2f0642bca432#file-perf2-r\" class=\"Link--inTextBlock\">\n          perf2.r\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n\n\n\n<p><\/p>\n\n\n\n<p>Comparing the accuracy of the two methods, Lasso classified 166\/171 test cases correctly, an accuracy of 97.08%, while ordinary least squares classified 162\/171 correctly, an accuracy of 94.74%. However, since high accuracy is to be expected given the distribution of benign and malignant cases, let us also look at the F1 score of both models. 
F1 puts equal importance on false positives (non-malignant cases classified as malignant) and false negatives (malignant cases classified as non-malignant), as both matter in our cancer problem. We want to minimize misclassifications as much as possible, because the classification determines what specific care or health measure should be provided to the patient. Looking at F1, Lasso gave us 97.70%, while ordinary least squares gave us 95.90%. Again, Lasso outperformed the least-squares method.<\/p>\n\n\n\n<p>It might seem that the two have almost the same performance and that we can use either of them for this specific problem. However, if we dig into what the models look like and how they were formulated, we can easily see the significant difference between the two methods.<\/p>\n\n\n\n<p>The OLS model considers all the input variables in the dataset. Please refer to the image below for the coefficients.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"678\" height=\"717\" src=\"https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/OLS.jpg\" alt=\"\" class=\"wp-image-625\" srcset=\"https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/OLS.jpg 678w, https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/OLS-284x300.jpg 284w, https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/OLS-648x685.jpg 648w, https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/OLS-182x192.jpg 182w\" sizes=\"auto, (max-width: 678px) 100vw, 678px\" \/><figcaption>ordinary least squares model<\/figcaption><\/figure>\n\n\n\n<p>Looking at the Lasso model, we notice that only a few variables are taken into account (only 11 of the 30 independent variables). The rest are ignored, that is, treated by the model as not significant to the outcome of the dependent variable. 
Yet the accuracy of the model is around 97%, even exceeding that of the model which takes into account all the independent variables! Refer to the image below for the model.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"381\" height=\"267\" src=\"https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/Lasso.jpg\" alt=\"\" class=\"wp-image-626\" srcset=\"https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/Lasso.jpg 381w, https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/Lasso-300x210.jpg 300w, https:\/\/spectdata.com\/wp-content\/uploads\/2019\/07\/Lasso-274x192.jpg 274w\" sizes=\"auto, (max-width: 381px) 100vw, 381px\" \/><figcaption>Lasso regression<\/figcaption><\/figure>\n\n\n\n<p>We found that mean texture, mean concave points, mean fractal dimension, standard error in radius, standard error in fractal dimension, worst radius, worst texture, worst smoothness, worst concavity, worst concave points, and worst symmetry together strongly identify whether the cell nuclei in a breast mass are benign or malignant.<\/p>\n\n\n\n<p>What the above tells us is that it is sometimes necessary to let go of variables that make the model unstable, because these noisy or irrelevant variables encourage the model to fit to noise, a problem known as overfitting.<\/p>\n\n\n\n<p>Let\u2019s look at the features of LASSO that explain why it worked better than OLS in this specific case. As mentioned at the beginning, one important feature of LASSO is variable selection. Lasso selects only the significant variables in the model. 
If we take a closer look at the data, we notice that there are many predictors and that some of the independent variables are related to one another or can be grouped. This already gives us a hint that it might be necessary to remove some of the variables.<\/p>\n\n\n\n<p>Getting predictions is therefore also easier, as we need to prepare fewer features during inference, unlike in OLS, where we have to input all the values from the dataset in order to obtain the response value.<\/p>\n\n\n\n<p>Lastly, let us summarise the important characteristics of Lasso in general. Lasso is a supervised algorithm that identifies the variables that are strongly associated with the response variable. This is called variable selection. Lasso also forces the coefficients of the variables towards zero. This is the process of shrinkage, which makes the model less sensitive to new data. Because fewer input variables are selected, the resulting model is also easier for a human to interpret.<\/p>\n\n\n\n<p>If you would like to learn more about Lasso regression, I recommend taking a course on Coursera [<a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.coursera.org\/lecture\/machine-learning-data-analysis\/what-is-lasso-regression-0KIy7\" target=\"_blank\">3<\/a>] or just reading through this article [<a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/towardsdatascience.com\/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b\" target=\"_blank\">4<\/a>].<\/p>\n\n\n\n<p>That&#8217;s all for the post. We&#8217;d love to hear your thoughts on these articles and anything else data-related. SpectData is a boutique Data Science Consultancy with a niche in Artificial Intelligence and Natural Language Processing. 
This article is written by our Data Scientist, Marriane M.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data analysts and data scientists use different regression methods for different kinds of analytics problems. From the simplest ones to the most complex ones. One of the most talked-about methods is the Lasso. Lasso was often described as one of the most useful linear regression tools and we are about to find out why. LASSO &hellip; <a href=\"https:\/\/spectdata.com\/index.php\/2019\/08\/08\/variable-selection-using-lasso\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Variable selection using LASSO<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":650,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[5],"tags":[],"class_list":["post-620","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/spectdata.com\/wp-content\/uploads\/2019\/08\/rope-620529_640.jpg","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9PSm6-a0","_links":{"self":[{"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/posts\/620","targetHints":{"allow":["GET"]}}
],"collection":[{"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/comments?post=620"}],"version-history":[{"count":22,"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/posts\/620\/revisions"}],"predecessor-version":[{"id":655,"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/posts\/620\/revisions\/655"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/media\/650"}],"wp:attachment":[{"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/media?parent=620"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/categories?post=620"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/spectdata.com\/index.php\/wp-json\/wp\/v2\/tags?post=620"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}