Galois evaluation #1

Open. Wants to merge 4 commits into base: master.
2 changes: 1 addition & 1 deletion Makefile
@@ -19,7 +19,7 @@ src/naive_bayes_simp.hk:

src/NaiveBayes.hs: src/naive_bayes_simp.hk
# compile src/naive_bayes_simp.hk -o src/NaiveBayes.hs -M NaiveBayes
summary --logfloat-prelude src/naive_bayes_simp.hk -o src/NaiveBayes.hs -M NaiveBayes
# summary --logfloat-prelude src/naive_bayes_simp.hk -o src/NaiveBayes.hs -M NaiveBayes

build: src/NaiveBayes.hs data
stack build
5 changes: 4 additions & 1 deletion app/TCP.hs
@@ -1,5 +1,6 @@
module Main where

import System.Environment (getArgs)
import qualified Data.ByteString.Char8 as B
import News (getNews)
import qualified System.Random.MWC as MWC
@@ -13,7 +14,9 @@ import Data.List (sort)
import Data.Number.LogFloat

main = do
(words, docs, topics) <- getNews (Just 10) [0..]
args <- getArgs
let n = (read (head args)) :: Int
(words, docs, topics) <- getNews (Just n) [0..]
g <- MWC.create
let
zPrior = onesFrom topics
6 changes: 4 additions & 2 deletions confusion.R
@@ -1,10 +1,12 @@
args <- commandArgs(TRUE)
arg <- args[1]
# Make sure we have all the packages we need
deps <- c("ggplot2", "reshape2")
new.packages <- deps[!(deps %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
for(dep in deps) do.call("library",list(dep))

conf <- table(read.csv("./nb-confusion.csv"))
conf <- table(read.csv(paste(arg, "csv", sep=".")))

g <- ggplot(as.data.frame(conf ), aes(y=true, x=predicted, fill=Freq))+geom_tile()
ggsave("nb-confusion.pdf", plot=g)
ggsave(paste(arg, "png", sep="."), plot=g, width=4, height=3, dpi=300)
20 changes: 20 additions & 0 deletions eval/README.md
@@ -0,0 +1,20 @@
# Evaluation results
Initial evaluation performed by Max Orhai at Galois, May 2017.

## Method
On a MacBook Pro, I ran and timed the submitted solution 100 times, with training sets ranging from 1 to 100 documents per topic.
The captured output was [scored](score.py) for accuracy.
The scores and timing are summarized in a [CSV](scores.csv).
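
For readers who want to reproduce a single score by hand, here is a minimal sketch of one way to compute accuracy from a `true, predicted` CSV like those under `eval/csv/`. It is not the contents of `score.py`, which is not shown here; the function name `accuracy` is hypothetical, and the sample rows are copied from `eval/csv/nb-confusion-001.csv`.

```python
import csv
import io


def accuracy(csv_text: str) -> float:
    """Return the fraction of rows whose predicted label equals the true label."""
    # skipinitialspace=True drops the space after each comma, so the header
    # "true, predicted" yields the field names "true" and "predicted".
    rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
    if not rows:
        raise ValueError("no data rows found")
    correct = sum(1 for row in rows if int(row["true"]) == int(row["predicted"]))
    return correct / len(rows)


# Rows copied from eval/csv/nb-confusion-001.csv (one document per topic).
SAMPLE = """true, predicted
0, 6
1, 6
2, 14
3, 15
4, 3
5, 5
6, 11
7, 18
8, 17
9, 10
10, 9
11, 6
12, 15
13, 9
14, 12
15, 12
16, 15
17, 19
18, 15
19, 17
"""

if __name__ == "__main__":
    # Only the row "5, 5" matches, so this prints 0.05.
    print(accuracy(SAMPLE))
```

To score a file on disk instead, the same function can be called with `open(path).read()`.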

## Accuracy
The classifier performs well, attaining 60% accuracy after only 20 sampled documents per topic, and approaching 80% when given more training data.

The animation below shows the improvement as the size of the training set ranges from 1 to 100 documents for each of the 20 newsgroups.

![animated confusion matrix](nb-confusion.gif)

## Performance
As expected, run times are quadratic in the input size.
The plot below shows the (dimensionless) proportion of correctly classified documents alongside the number of seconds *per document*; because the time is divided by the document count, the quadratic growth appears linear.

![performance plot](performance.png)
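
The linear appearance follows from a short calculation. As a sketch, assume the total run time for $n$ documents is roughly $T(n) \approx c\,n^2$, where $c$ is an unspecified constant that was not measured here:

$$
\frac{T(n)}{n} \approx \frac{c\,n^2}{n} = c\,n
$$

Under that assumption, seconds per document grow in proportion to $n$, which is a straight line on the plot.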
21 changes: 21 additions & 0 deletions eval/csv/nb-confusion-001.csv
@@ -0,0 +1,21 @@
true, predicted
0, 6
1, 6
2, 14
3, 15
4, 3
5, 5
6, 11
7, 18
8, 17
9, 10
10, 9
11, 6
12, 15
13, 9
14, 12
15, 12
16, 15
17, 19
18, 15
19, 17
41 changes: 41 additions & 0 deletions eval/csv/nb-confusion-002.csv
@@ -0,0 +1,41 @@
true, predicted
0, 6
0, 19
1, 6
1, 17
2, 15
2, 9
3, 2
3, 8
4, 17
4, 0
5, 5
5, 5
6, 8
6, 1
7, 13
7, 7
8, 13
8, 6
9, 9
9, 10
10, 9
10, 10
11, 11
11, 11
12, 15
12, 6
13, 2
13, 18
14, 18
14, 3
15, 18
15, 0
16, 16
16, 16
17, 18
17, 18
18, 10
18, 17
19, 0
19, 7
61 changes: 61 additions & 0 deletions eval/csv/nb-confusion-003.csv
@@ -0,0 +1,61 @@
true, predicted
0, 0
0, 0
0, 18
1, 6
1, 17
1, 15
2, 15
2, 9
2, 3
3, 2
3, 8
3, 19
4, 3
4, 0
4, 7
5, 5
5, 6
5, 2
6, 8
6, 13
6, 15
7, 4
7, 7
7, 9
8, 17
8, 6
8, 19
9, 9
9, 10
9, 9
10, 9
10, 10
10, 10
11, 11
11, 11
11, 11
12, 12
12, 6
12, 15
13, 2
13, 18
13, 6
14, 12
14, 14
14, 14
15, 18
15, 0
15, 0
16, 16
16, 16
16, 16
17, 18
17, 17
17, 17
18, 0
18, 2
18, 10
19, 0
19, 0
19, 0
81 changes: 81 additions & 0 deletions eval/csv/nb-confusion-004.csv
@@ -0,0 +1,81 @@
true, predicted
0, 0
0, 0
0, 9
0, 0
1, 6
1, 17
1, 3
1, 3
2, 4
2, 9
2, 3
2, 3
3, 2
3, 8
3, 19
3, 1
4, 3
4, 13
4, 7
4, 2
5, 5
5, 6
5, 1
5, 6
6, 8
6, 13
6, 15
6, 18
7, 10
7, 4
7, 16
7, 7
8, 4
8, 8
8, 12
8, 8
9, 9
9, 10
9, 9
9, 9
10, 9
10, 10
10, 10
10, 10
11, 11
11, 11
11, 11
11, 11
12, 12
12, 6
12, 15
12, 10
13, 14
13, 18
13, 6
13, 15
14, 12
14, 2
14, 14
14, 14
15, 13
15, 0
15, 0
15, 19
16, 16
16, 16
16, 16
16, 16
17, 18
17, 17
17, 17
17, 17
18, 0
18, 13
18, 18
18, 18
19, 0
19, 0
19, 0
19, 9
101 changes: 101 additions & 0 deletions eval/csv/nb-confusion-005.csv
@@ -0,0 +1,101 @@
true, predicted
0, 0
0, 0
0, 9
0, 0
0, 17
1, 6
1, 17
1, 3
1, 3
1, 11
2, 1
2, 3
2, 3
2, 3
2, 3
3, 2
3, 8
3, 19
3, 2
3, 4
4, 3
4, 0
4, 7
4, 2
4, 3
5, 5
5, 6
5, 1
5, 2
5, 1
6, 8
6, 13
6, 15
6, 18
6, 8
7, 7
7, 7
7, 16
7, 7
7, 7
8, 12
8, 6
8, 7
8, 8
8, 9
9, 9
9, 10
9, 9
9, 9
9, 9
10, 9
10, 10
10, 10
10, 10
10, 9
11, 11
11, 11
11, 11
11, 11
11, 8
12, 12
12, 6
12, 15
12, 10
12, 6
13, 7
13, 0
13, 6
13, 15
13, 3
14, 0
14, 15
14, 14
14, 14
14, 14
15, 13
15, 0
15, 0
15, 19
15, 19
16, 16
16, 16
16, 16
16, 16
16, 15
17, 18
17, 17
17, 17
17, 17
17, 17
18, 0
18, 2
18, 18
18, 18
18, 18
19, 0
19, 0
19, 0
19, 9
19, 0