Flume Developer Notes
=====================
Jonathan Hsieh <[email protected]>
6/22/11
// This is in asciidoc markup
== Introduction
This is meant to be a guide to issues that occur when building,
debugging, and setting up Flume as a developer.
== High level directory and file structure.
Flume uses the Maven build system and has a Maven project object model
(pom) that has many components broken down into Maven modules. Below
we describe the contents of different directories.
----
./bin/                     Flume startup scripts
./conf/                    Flume configuration file samples
./contrib/flogger          Flume logger: a Flume client implemented in C
./docs/man                 Flume man pages
./flume-config-web         Flume master configuration servlet module
./flume-core               Flume core module
./flume-distribution       Flume distribution package module
./flume-docs               Flume documentation generation module
./flume-log4j-appender     Flume log4j-avro appender module
./flume-microbenchmarks    Flume performance microbenchmark test suite
./flume-node-web           Flume node status servlet module
./flume-windows-dist       Flume node Windows distribution package module
./plugins/                 Flume plugin modules (hello world skeleton and hbase)
./src/javaperf             Flume performance tests (out of date)
./src/javatest-torture     Flume reliability tests (out of date)
----
The files excluded by `.gitignore` are autogenerated by either Maven or Eclipse.
== Building and Testing Flume
=== Prerequisites
There are several tools required to do a full build of Flume but only
the Thrift compiler is required for development and testing builds.
To build documentation, you will need to have asciidoc installed.
To build Windows installers, you will need to have makensis installed.
==== Building Thrift
The Thrift compiler is required to build Flume and currently does not
have binary packages available for Linux-based platforms. (A Windows
binary is available.) There are several requirements necessary to
build it. Here's a link to the requirements:
http://wiki.apache.org/thrift/ThriftRequirements
This page also contains links explaining how to install the
requirements for various platforms.
=== Using Maven
We are using Maven v2.x.x. The Maven build system steps through
several phases to create build artefacts. At the highest level, the
phases that are relevant to most devs are "compile" -> "test" ->
"package" -> "install".
There are several options and "profiles" available in the Flume build.
The default profile is a "dev" profile. Below are examples of common
build command lines for building the different profiles.
A development build that runs unit tests and installs to the local
Maven repo. This builds and tests all plugins, but excludes modules
that aren't needed during development (e.g. Windows installer,
documentation).
----
mvn install
----
A development build that skips the execution of unit tests.
----
mvn install -DskipTests
----
A development build that runs unit tests. (no package generation)
----
mvn test
----
A development build that runs only specific unit tests (where
<ClassRegex> is a regex of a class name, without .java or .class
or path).
----
mvn test -Dtest=<ClassRegex>
----
Windows node build, skipping unit tests (requires makensis).
NOTE: makensis is available on Linux and through Homebrew on Mac OS X,
so this can be built while running on these operating systems.
----
mvn install -Pwindows -DskipTests
----
Full build, skipping unit tests (requires asciidoc). This does not build the Windows packages.
----
mvn install -Pfull-build -DskipTests
----
Full build that makes both the docs and the Windows packages.
----
mvn install -Pfull-build,windows
----
==== Pointing the Maven build at the proper Thrift executable
Flume has, over time, upgraded to newer versions of Thrift. The Maven
build requires a pointer to the proper Thrift compiler.
If you install Thrift in a non-standard location (not
/usr/local/thrift/bin), you will need to provide the build some extra
information. This may be the case if you overrode the standard Thrift
install (+make install+ 's default target) or are running Thrift from
a home directory.
One way to provide this is via the Maven command line by setting the
thrift.executable variable (this assumes that we made different dirs
for different versions of Thrift):
----
mvn install -Dthrift.executable=/usr/local/thrift-0.6.0/bin/thrift
----
Another way to provide this information to your Maven build is to
modify your Maven profile by adding or modifying your
~/.m2/settings.xml file and overriding the default thrift.executable
setting to point to your Thrift compiler executable. In the example
below, we install different versions of the Thrift compiler in
different directories and thus need to change the setting.
----
<settings>
<profiles>
<profile>
<id>flume</id>
<properties>
<thrift.executable>/usr/local/thrift-0.6.0/bin/thrift</thrift.executable>
</properties>
</profile>
</profiles>
<activeProfiles>
<activeProfile>flume</activeProfile>
</activeProfiles>
</settings>
----
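It can help to check the path before handing it to Maven. The following is a minimal shell sketch under stated assumptions, not part of the Flume build: `thrift_flag` is a hypothetical helper name, and it only checks that the path is an executable file, not that it is the right Thrift version.

```shell
# Hypothetical helper (not part of Flume): print the Maven property flag
# for a Thrift compiler path, but only if that path is an executable file.
thrift_flag() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    printf '%s%s\n' '-Dthrift.executable=' "$1"
  else
    echo "not an executable file: $1" >&2
    return 1
  fi
}
```

With the example path used above, this could be run as `flag=$(thrift_flag /usr/local/thrift-0.6.0/bin/thrift) && mvn install "$flag"`, so that Maven is only started when the check succeeds.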
==== Including or excluding specific sets of tests.
We've added hooks to the Maven build that will enable you to exclude
or include specific tests on a test run. This is useful for excluding
flaky tests or making a build that focuses solely upon flaky tests.
To do this we created two variables:
* test.include.pattern
* test.exclude.pattern
These variables take file patterns (Ant-style globs such as
`**/TestFlaky*.java`) of the files to be included or excluded.
For the next set of examples, let's say you have two flaky tests called
TestFlaky1 and TestFlaky2.
You can execute tests that skip TestFlaky1 and TestFlaky2 by using the
following command line:
----
mvn test -Dtest.exclude.pattern=**/TestFlaky*.java
----
Alternately, you could be more explicit
----
mvn test -Dtest.exclude.pattern=**/TestFlaky1.java,**/TestFlaky2.java
----
Conversely, you could execute only the flaky tests by using:
----
mvn test -Dtest.include.pattern=**/TestFlaky*.java
----
You can also combine includes and excludes. This runs
TestFlaky* but skips over TestFlaky2:
----
mvn test -Dtest.include.pattern=**/TestFlaky*.java -Dtest.exclude.pattern=**/TestFlaky2.java
----
NOTE: Both test.exclude.pattern and test.include.pattern get
overridden if the test parameter is used. Consider:
----
mvn test -Dtest.exclude.pattern=**/TestFlaky*.java -Dtest=TestFlaky1
----
In this case, TestFlaky1 will be run despite being in the
test.exclude.pattern.
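When several tests need to be listed explicitly, the comma-separated pattern list can be generated instead of typed. This is a minimal shell sketch, not part of the Flume build; `flaky_patterns` is a hypothetical helper name. Quoting the resulting argument is a safe habit, since some shells (zsh, for example) treat an unmatched `*` as an error by default.

```shell
# Hypothetical helper (not part of Flume): build a comma-separated list of
# file patterns, one per test class name given as an argument.
flaky_patterns() (
  out=""
  for name in "$@"; do
    # Append a comma only when the list already has an entry.
    out="${out:+$out,}**/${name}.java"
  done
  printf '%s\n' "$out"
)
```

It could then be used as `mvn test "-Dtest.exclude.pattern=$(flaky_patterns TestFlaky1 TestFlaky2)"`, which expands to the same argument as the explicit example above.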
=== Running the most recent build
To run the most recent build of Flume, first build the distribution
packages.
----
mvn install -DskipTests
----
You can then traverse into
./flume-distribution/target/flume-distribution-<version>-bin/flume-distribution-<version>.
This directory is set up exactly as the tarball installation of Flume
would be.
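To avoid typing the version twice, the directory can be located with a glob. This is a minimal shell sketch, assuming the directory layout described above; `flume_dist_dir` is a hypothetical helper name, and if several versions have been built it simply prints the first match.

```shell
# Hypothetical helper (not part of Flume): print the unpacked distribution
# directory under the given source root, without spelling out <version>.
flume_dist_dir() {
  for d in "$1"/flume-distribution/target/flume-distribution-*-bin/flume-distribution-*; do
    if [ -d "$d" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  echo "no built distribution under $1; run the build first" >&2
  return 1
}
```

From the root of the source tree, `cd "$(flume_dist_dir .)"` would then change into the freshly built distribution.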
=== Running Performance Microbenchmarks.
The suite of source and sink microbenchmark tests (located in
./flume-microbenchmarks/javaperf) can be run by using `mvn test -Pperf`.
Just like with the normal test cases, you can use
`-Dtest=<TestClass>` to select specific tests. For example:
----
mvn test -Pperf -Dtest=PerfThriftSinks
----
The logs should output lines that are formatted similarly to these
lines:
----
[junit] nullsink,ubuntu,begin,10998597,552872,disk_loaded,2895851957,301662152,receiver_started,156786445,305698624,sink_started,105303802,305704456,thrift sink to thrift source done,39520160510,320377056,MB/s,4.579940971898899,23094932,320379168
[junit] [ 0us, 547,544 b mem] Starting (after gc)
[junit] [ 10,998,597ns d 10,998,597ns 552,872 b mem] begin
[junit] [ 2,914,443,637ns d 2,895,851,957ns 301,662,152 b mem] disk_loaded
[junit] [ 3,514,297,391ns d 156,786,445ns 305,698,624 b mem] receiver_started
[junit] [ 4,082,661,503ns d 105,303,802ns 305,704,456 b mem] sink_started
[junit] [ 44,235,264,972ns d 39,520,160,510ns 320,377,056 b mem] thrift sink to thrift source done
[junit] [ 44,878,445,315ns d 23,094,932ns 320,379,168 b mem] MB/s,4.579940971898899
----
The first line is a summary of all the information in CSV format. The
other lines are in a tabular, more human-readable form. The left
column is cumulative time in ns and the middle is the delta from the
previous line in ns. The last column of numbers is the amount of
memory in heap, followed by some comments or labels.
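The CSV summary line is convenient for scripting. The following is a minimal shell sketch, assuming the summary format shown in the sample output above (benchmark name in the first field, and the throughput in the field right after a literal `MB/s` field); `perf_throughput` is a hypothetical helper name and not part of the benchmark suite.

```shell
# Hypothetical helper (not part of Flume): read junit log lines on stdin and
# print "<benchmark> <MB/s>" for each CSV summary line.
perf_throughput() {
  # Strip the "[junit] " prefix, then look for a field that is exactly "MB/s".
  # The human-readable table lines never have such a field, so they are skipped.
  sed 's/^ *\[junit\] *//' | awk -F, '
    { for (i = 1; i < NF; i++) if ($i == "MB/s") { print $1, $(i + 1); break } }'
}
```

For instance, `mvn test -Pperf -Dtest=PerfThriftSinks | perf_throughput` would reduce a run to one line per benchmark.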
=== Building on Windows platforms
Building Flume on Windows is possible. One can generate packages and
an installer executable on Windows. This build assumes a Cygwin
environment, but may not require it.
This build requires
* Maven for Windows
* makensis (for Windows installer build)
* Java 1.6+
You should be able to run the normal mvn commands.
The current Windows installer executable does not handle all error
situations and does not check whether it is being run as
administrator.
=== Building documentation
Documentation for Flume is written in asciidoc. It relies on several
libraries to generate images.
* asciidoc v8.5.2
* graphviz (dot) v2.26.3
* xmlto
Documents can be built by running 'mvn install -Pfull-build'.
== Integrated Development Environments for Flume
Currently most Flume developers use the Eclipse IDE. We have included
some instructions for getting started with Eclipse.
=== Setting up Flume Eclipse projects from the Maven POMs.
If you use Eclipse we suggest you use the m2eclipse plugin available
here to properly create an environment for dev and testing in Eclipse.
http://m2eclipse.sonatype.org/
After installing it in Eclipse you will want to "Import" the Flume
pom.xml project.
This can be done by going to the Eclipse applications menu, navigating
to File > Import... > Existing Maven Projects. From there, browse to
and select the directory that contains the root of the Flume project.
The build requires the location of the Thrift compiler executable --
see the instructions about .m2/settings.xml files in the building
Flume section for more details.
The flume-core project will have errors -- these can be resolved by manually adding these dirs to your build source dirs:
* ./flume-core/target/generated-sources/antlr3
* ./flume-core/target/generated-sources/avro
* ./flume-core/target/generated-sources/thrift
* ./flume-core/target/generated-sources/version
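Since these directories live under `target`, they only exist after a build has generated them. The following is a minimal shell sketch for checking that before adding them in Eclipse; `missing_generated_dirs` is a hypothetical helper name, and the directory names are taken from the list above.

```shell
# Hypothetical helper (not part of Flume): list which of the generated-source
# directories named above are missing under the given source root.
missing_generated_dirs() {
  for sub in antlr3 avro thrift version; do
    dir="$1/flume-core/target/generated-sources/$sub"
    [ -d "$dir" ] || printf '%s\n' "$dir"
  done
}
```

Running `missing_generated_dirs .` from the root of the source tree prints nothing when all four directories are present. If it prints any, running one of the development builds described earlier first is presumably what creates them.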
== Debugging Flume
=== Flume's web applications
The default setup for Flume is to run its servlets from .WAR files
that include precompiled jsps.
One can have the node or master start specific servlet WARs by
setting the following properties in the system's flume-site.conf
file, as shown below.
----
<property>
<name>flume.master.webapps.root</name>
<value>webapps/flumemaster.war</value>
<description>
Path where Flume master war lives. If a file it will load the
war, if a dir it will load all *.war in that dir.
</description>
</property>
<property>
<name>flume.node.webapps.root</name>
<value>webapps/flumemaster.war</value>
<description>
Path where Flume node war lives. If a file it will load the
war, if a dir it will load all *.war in that dir.
</description>
</property>
----
// TODO document how to debug JSPs while in Eclipse
== Rules of the Repository
We have a few basic rules for code in the repository.
The master/trunk pointer:
* MUST always build.
* SHOULD always pass all unit tests
When committing code, we tag pushes with JIRA numbers and their short descriptions.
Generally these are in the following format:
----
FLUME-42: Description from the jira
----
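The format can be checked mechanically. This is a minimal shell sketch and not an existing hook in the repository; `valid_commit_subject` is a hypothetical helper name, and the pattern is only inferred from the example above (the `FLUME-` prefix, a number, a colon, a space, and a non-empty description).

```shell
# Hypothetical check (not part of Flume): succeed only if the subject line
# looks like "FLUME-<number>: <description>".
valid_commit_subject() {
  printf '%s\n' "$1" | grep -Eq '^FLUME-[0-9]+: .+'
}
```

One possible use is in a local git `commit-msg` hook, which receives the path of the message file as its first argument: `valid_commit_subject "$(head -n 1 "$1")" || exit 1`.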
All source files must include the following header (or a variant
depending on comment characters):
----
/**
 * Licensed to Cloudera, Inc. under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. Cloudera, Inc. licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
----
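A quick way to spot files without the header is to search for its opening phrase. This is a minimal shell sketch, not a tool that ships with Flume; `missing_license_header` is a hypothetical helper name, and looking only at the first 20 lines for the phrase "Licensed to Cloudera, Inc." is an assumption that may not hold for every header variant.

```shell
# Hypothetical check (not part of Flume): print every given file whose first
# 20 lines lack the opening phrase of the license header.
missing_license_header() {
  for f in "$@"; do
    head -n 20 "$f" | grep -q 'Licensed to Cloudera, Inc\.' || printf '%s\n' "$f"
  done
}
```

For example, `missing_license_header flume-core/src/main/java/SomeFile.java` (a placeholder path) prints the file name only if the header phrase is absent.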
No build-generated files should be checked in. Here are some examples
of generated files that should not be checked in:
* html documentation
* thrift-generated source
* avro-generated source
* antlr-generated source
* auto-generated versioning annotations