-
Notifications
You must be signed in to change notification settings - Fork 3
/
NBA.tpl
230 lines (200 loc) · 7.7 KB
/
NBA.tpl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
<!--head
meta:
title: NBA players in 2012-13
description: MDS and k-means clustering of player sport-statistics in chosen NBA team.
author: Rapporter Team (@rapporter)
packages:
- cluster
- directlabels
- fpc
- ggplot2
inputs:
- required: yes
class: character
name: teamname
label: Team Name
standalone: yes
value: Miami Heat
length:
min: 1.0
max: 1.0
description: Name of the Team
matchable: yes
options:
- Atlanta Hawks
- Boston Celtics
- Brooklyn Nets
- Charlotte Hornets
- Chicago Bulls
- Cleveland Cavaliers
- Dallas Mavericks
- Denver Nuggets
- Detroit Pistons
- Golden State Warriors
- Houston Rockets
- Indiana Pacers
- Los Angeles Clippers
- Los Angeles Lakers
- Memphis Grizzles
- Miami Heat
- Milwaukee Bucks
- Minnesota Timberwolves
- New Orleans Hornets
- New York Knicks
- Oklahoma City Thunder
- Orlando Magic
- Philadelphia 76ers
- Phoenix Suns
- Portland TrailBlazers
- Sacramento Kings
- San Antonio Spurs
- Toronto Raptors
- Utah Jazz
- Washington Wizards
allow_multiple: no
- required: yes
class: character
name: variables
label: Variables
standalone: yes
value: ~
length:
min: 2.0
max: 40.0
description: Calculations are based on these variables
matchable: yes
options:
- Age
- Games Played
- Games Started
- Minutes Played
- Field Goals
- Field Goal Attempts
- Three-Point Field Goals
- Three-Point Field Goal Attempts
- Free Throws
- Free Throw Attempts
- Offensive Rebounds
- Defensive Rebounds
- Total Rebounds
- Assists
- Steals
- Blocks
- Turnovers
- Personal Faults
- Points
- Field Goal Percentage
- Three-Point Field Goal Percentage
- Free Throw Percentage
allow_multiple: yes
- required: no
class: integer
name: clust_num
label: Number of Clusters
standalone: yes
value: ~
length:
min: 1.0
max: 1.0
description: One can set the Number of the Clusters for the K-Mean Clustering
limit:
min: 1.0
max: 5.0
head-->
Below you are able to check how the <%=teamname%> build up. Later you can answer the question: are there any statistical base of grouping the players in this team?
It can be interesting to see if there were cliques for the best players, mostly the players differ according to their positions or there are other effects which could be detected.
Rapporter will present you two ways to observe that:
- [Multidimensional-scaling (MDS)](http://en.wikipedia.org/wiki/Multidimensional_scaling)
- [K-means Clustering](http://en.wikipedia.org/wiki/K-means_clustering)
## MDS
In this part one can check that how <%=teamname%> looks in a two-dimensional view, so which players are similar to each other and which aren't.
The distances were computed based on the standardized form of the variables: <%= variables%>
<%=
nba <- rp.data
player_name <- nba[nba$Team == teamname,1]
Position <- nba[nba$Team == teamname,2]
nba <- nba[nba$Team == teamname, variables]
rownames(nba) <- player_name
nba <- scale(na.omit(nba))
d <- dist(nba)
fit <- cmdscale(d, eig=TRUE, k = 2)
x <- fit$points[, 1]
y <- fit$points[, 2]
plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", xlim=c((min(x)*1.15),(max(x)*1.15)),ylim=c((min(y)*1.15),(max(y)*1.15)), main="Metric MDS", type="n")
+text(x, y, labels = row.names(nba), cex=1.1, col=panderOptions("graph.colors")[as.numeric(Position)])
%>
You can see the players colored by their positions:
* <span style="color:<%=panderOptions("graph.colors")[3] %> ;"> Point Guard </span>
* <span style="color:<%=panderOptions("graph.colors")[5] %> ;"> Shooting Guard </span>
* <span style="color:<%=panderOptions("graph.colors")[4] %> ;"> Small Forward </span>
* <span style="color:<%=panderOptions("graph.colors")[2] %> ;"> Power Forward </span>
* <span style="color:<%=panderOptions("graph.colors")[1] %> ;"> Center </span>
<%=
distance <- as.matrix(d)
maxind <- which(distance == max(distance), arr.ind = TRUE)
minind <- which(distance == min(distance[distance!=min(distance)]), arr.ind = TRUE)
%>
<%=
furthest <- colnames(distance)[which(colSums(distance) == max(colSums(distance)))]
nearest <- colnames(distance)[which(colSums(distance) == min(colSums(distance)))]%>
<%=furthest %> differ<%= ifelse(length(furthest) > 1, "", "s")%> the most from the others, and <%=nearest%> seem<%= (ifelse (length(nearest) > 1, "", "s"))%> to be the most "common" player<%= (ifelse (length(nearest) > 1, "s", ""))%>, which <%=ifelse(length(nearest) > 1, "lies", "lie") %> nearest to all other players.
<%=
distance[upper.tri(distance, diag = T)] <- NA
h <- NULL
notneeded <- apply(data.frame(unique(as.vector(sort(distance[lower.tri(distance)], decreasing = T))[1:5])), 1, function(i) h <<- rbind(h, which(distance == i, arr.ind = T)))
j <- NULL
notneeded <- apply(data.frame(unique(as.vector(sort(distance[lower.tri(distance)], decreasing = F))[1:5])), 1, function(i) j <<- rbind(j, which(distance == i, arr.ind = T)))
%>
#### Outsider Pairs
<%=paste0(p(c(rownames(distance)[h[1, 1]], colnames(distance)[h[1, 2]])), ' (', round(distance[h[1, 1], h[1, 2]], 2), ')')%> are the "furthest", <%=paste0(p(c(rownames(distance)[j[1, 1]], colnames(distance)[j[1, 2]])), ' (', round(distance[j[1, 1], j[1, 2]], 2), ')') %> are the "nearest" to each other.
#### In General
Now let's see which players can be said statistically far/similar to each other in general. The 5 pairs with the biggest differences and the 5 pairs with the smallest differences will be presented. In the brackets you can see the amount of the distances between two players.
According to the used variables the 5 furthest pairs are:
<%=
paste(pander.return(lapply(1:5, function(i) paste0(p(c(rownames(distance)[h[i, 1]], colnames(distance)[h[i, 2]])), ' (', round(distance[h[i, 1], h[i, 2]], 2), ')'))), collapse = '\n')%>
According to the used variables the 5 nearest pairs are:
<%=
paste(pander.return(lapply(1:5, function(i) paste0(p(c(rownames(distance)[j[i, 1]], colnames(distance)[j[i, 2]])), ' (', round(distance[j[i, 1], j[i, 2]], 2), ')'))), collapse = '\n') %>
## K-Means Clustering
Step onto an other field, let's see how a data-mining method would group these players. The K-Means Clustering process gives us an estimation which players form a group.
<%=
if (exists('clust_num') && !is.null(clust_num) && clust_num > 0) {
cn <- tryCatch(pam(nba, clust_num), error = function(e) e)
fit <- kmeans(scale(nba), clust_num)
} else {
cn <- tryCatch(pamk(nba), error = function(e) e)
if (cn$nc > 5) {
clust_num <- 5
cn <- tryCatch(pam(nba, clust_num), error = function(e) e)
fit <- kmeans(scale(nba), clust_num)
} else {
fit <- kmeans(scale(nba), cn$nc)
cn <- cn$pamobject
}
}
%>
The size of these clusters are: <%= fit$size %> .
In the first group we can find <%=rownames(nba)[fit$cluster == 1]%>.
<%
if (max(fit$cluster) > 1) { %>
In the second there are: <%=rownames(nba)[fit$cluster == 2]%>.
<% } %>
<%
if (max(fit$cluster) > 2) { %>
The third goup contains: <%=rownames(nba)[fit$cluster == 3]%>.
<% } %>
<%
if (max(fit$cluster) > 3) { %>
The following players are in the fourth group: <%=rownames(nba)[fit$cluster == 4]%>.
<% } %>
<%
if (max(fit$cluster) > 4) { %>
The players who are in the fifth group: <%=rownames(nba)[fit$cluster == 5]
%>.
<% }
%>
Check these groups on a map (the purple lines join the centers of the groups):
<%=
clusplot(cn, fit$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 1, main = '', col.p = 'black', col.clus = panderOptions('graph.colors'))
%>
Are the position and grouping of the players fit your prior expectations? :)