How to Configure a Flume Connection

Step 1:
- Download a set of sample data from
- Download the free files

Step 2:
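- Create a folder inside /home/biadmin using the mkdir command. The configuration later in this tutorial uses /home/biadmin/flumeingestion/ as the source directory.

- Copy the downloaded files to your BigInsights server, into the folder created above.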
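
As a minimal sketch of this step, assuming the folder is named flumeingestion (the name the later configuration uses) and that you are logged in as biadmin, so that $HOME is /home/biadmin. DOWNLOAD_DIR is a hypothetical location for the sample files once they are on the server; adjust it to wherever you put them.

```shell
# Source folder for Flume; defaults to $HOME/flumeingestion
# (that is, /home/biadmin/flumeingestion when logged in as biadmin).
INGEST_DIR="${INGEST_DIR:-$HOME/flumeingestion}"
# Hypothetical location of the downloaded sample files on the server.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-/tmp/flume-sample-data}"

mkdir -p "$INGEST_DIR"

# Copy regular files only, and only if the download folder exists.
if [ -d "$DOWNLOAD_DIR" ]; then
    find "$DOWNLOAD_DIR" -maxdepth 1 -type f -exec cp {} "$INGEST_DIR"/ \;
fi

ls -l "$INGEST_DIR"
```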

Step 3:
In the next steps, we configure Flume's properties so that it ingests the files from the folder above into HDFS.

- Go into /opt/ibm/biginsights/flume/conf
- Create a copy of and name it flume.conf

cp flume.conf

- Create a copy of and name it

Step 4:
- Edit the file, uncomment the JAVA_HOME line and set its value (the file is read as a shell script, so do not put spaces around the = sign):
JAVA_HOME=/opt/ibm/biginsights/jdk/
- Uncomment the JAVA_OPTS line, so that it looks like:
JAVA_OPTS="-Xms100m -Xmx200m"
- Leave the rest of the file as it is.
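
The flume-ng launcher reads the environment file as a shell script, so one hedged way to check the two settings is to source a file containing the same lines and print the values. The sketch below uses a temporary stand-in file, not the real file under /opt/ibm/biginsights/flume/conf.

```shell
# Stand-in for the environment file edited in this step
# (temporary path, for illustration only).
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
JAVA_HOME=/opt/ibm/biginsights/jdk/
JAVA_OPTS="-Xms100m -Xmx200m"
EOF

# Source the file the way a shell script would, then print what it set.
. "$ENV_FILE"
echo "JAVA_HOME=$JAVA_HOME"
echo "JAVA_OPTS=$JAVA_OPTS"

rm -f "$ENV_FILE"
```

A line written with spaces around the = sign would fail at the sourcing step with a "command not found" message, which makes that mistake easy to spot.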

Step 5:
- Edit flume.conf so that it points to a source directory (flumeingestion), a sink directory (an HDFS directory), and the channel used to transfer the data between them. Remove all existing content from the file and paste in the configuration below. Before copying, complete the hdfs path with your server's IP address.

agent.sources = flumesource
agent.channels = memoryChannel
agent.sinks = flumeHDFS
# For each one of the sources, the type is defined
agent.sources.flumesource.type = spooldir
agent.sources.flumesource.spoolDir = /home/biadmin/flumeingestion/
agent.sources.flumesource.bufferMaxLineLength = 80000
# The channel can be defined as follows.
agent.sources.flumesource.channels = memoryChannel
# Each sink's type must be defined
agent.sinks.flumeHDFS.type = hdfs
# Complete this path with your server's IP address before use
agent.sinks.flumeHDFS.hdfs.path = hdfs://
agent.sinks.flumeHDFS.hdfs.fileType = DataStream
# Format to be written
agent.sinks.flumeHDFS.hdfs.writeFormat = Text
agent.sinks.flumeHDFS.hdfs.maxOpenFiles = 10
# Roll over the file based on a maximum size of 10 MB
agent.sinks.flumeHDFS.hdfs.rollSize = 10485760
# Never roll over based on the number of events
agent.sinks.flumeHDFS.hdfs.rollCount = 0
# Roll over the file based on a maximum time of 1 minute
agent.sinks.flumeHDFS.hdfs.rollInterval = 60
# Specify the channel the sink should use
agent.sinks.flumeHDFS.channel = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel (sink or source)
# can be defined as well.
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 200000
agent.channels.memoryChannel.transactionCapacity = 160000
agent.channels.memoryChannel.byteCapacity = 0
agent.channels.memoryChannel.memoryCapacity = 0

# The source and the sink are both connected through memoryChannel
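
A frequent cause of an agent that starts but moves no data is a component that is declared but never bound to a channel: a source needs a channels property and a sink needs a channel property. The sketch below is an illustration, not part of Flume; check_bindings is a hypothetical helper name, and the sample file is a trimmed stand-in for flume.conf.

```shell
# check_bindings FILE AGENT SOURCE SINK
# Succeeds only if the source has a "channels" line and the sink a "channel" line.
check_bindings() {
    conf="$1"; agent="$2"; src="$3"; sink="$4"
    grep -Eq "^[[:space:]]*$agent\.sources\.$src\.channels[[:space:]]*=" "$conf" || {
        echo "missing: $agent.sources.$src.channels"; return 1; }
    grep -Eq "^[[:space:]]*$agent\.sinks\.$sink\.channel[[:space:]]*=" "$conf" || {
        echo "missing: $agent.sinks.$sink.channel"; return 1; }
    echo "bindings OK"
}

# Trimmed stand-in for flume.conf, written to a temporary file.
SAMPLE_CONF="$(mktemp)"
cat > "$SAMPLE_CONF" <<'EOF'
agent.sources.flumesource.channels = memoryChannel
agent.sinks.flumeHDFS.channel = memoryChannel
EOF

check_bindings "$SAMPLE_CONF" agent flumesource flumeHDFS
rm -f "$SAMPLE_CONF"
```

Against the real file, the call would be check_bindings /opt/ibm/biginsights/flume/conf/flume.conf agent flumesource flumeHDFS.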

Step 6:
- Create a folder in your HDFS file system to serve as the sink directory. The hdfs path in the configuration above points to a flume directory under /user/biadmin, so create a directory under /user/biadmin and name it flume.
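
A minimal sketch of creating the sink directory, assuming the hadoop command-line client is on the PATH of the biadmin user. It uses only the standard hadoop fs -mkdir and hadoop fs -ls commands and does nothing if the client is not installed.

```shell
# Sink directory described in the step above.
HDFS_SINK_DIR="/user/biadmin/flume"

if command -v hadoop >/dev/null 2>&1; then
    # Create the directory, then list its parent to confirm it exists.
    hadoop fs -mkdir "$HDFS_SINK_DIR"
    hadoop fs -ls /user/biadmin
else
    echo "hadoop client not found on PATH; run this on the BigInsights server"
fi
```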

Step 7:
- Start the Flume instance.
Go to /opt/ibm/biginsights/flume/bin and run:

./flume-ng agent -c /opt/ibm/biginsights/flume/conf -f /opt/ibm/biginsights/flume/conf/flume.conf -n agent

Step 8:
- Your Flume instance is now running and has started copying data from the flumeingestion directory. Check the /user/biadmin/flume folder in HDFS to see the data.
- If you check the server's file system, you should see the text files renamed with a completion suffix (by default the spooling directory source appends .COMPLETED), which indicates that they have been processed.
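
To check progress from the command line, the sketch below counts the files that have already been renamed. It assumes the spooling directory source's default suffix of .COMPLETED; pass a different suffix as the second argument if your setup renames files differently. count_processed is a hypothetical helper name.

```shell
# count_processed DIR [SUFFIX]
# Prints how many files in DIR end with SUFFIX (default: .COMPLETED).
count_processed() {
    dir="$1"; suffix="${2:-.COMPLETED}"
    find "$dir" -maxdepth 1 -type f -name "*$suffix" | wc -l | tr -d ' '
}

# Example against the ingestion folder, if it exists on this machine.
INGEST_DIR="${INGEST_DIR:-/home/biadmin/flumeingestion}"
if [ -d "$INGEST_DIR" ]; then
    echo "processed files: $(count_processed "$INGEST_DIR")"
fi

# List what has arrived in HDFS, if the hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls /user/biadmin/flume
fi
```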

Additional Step:
Now try putting some files into the flumeingestion directory while the agent is running; you should see them in the specified HDFS location within 1 or 2 seconds.
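
The spooling directory source expects a file to be complete, unchanged, and uniquely named once it appears in the folder, so a cautious way to add files while the agent runs is to write them elsewhere first and then move them in. The sketch below assumes a staging folder on the same filesystem as flumeingestion, so that mv is a simple rename; drop_into_spool is a hypothetical helper name.

```shell
# drop_into_spool FILE SPOOL_DIR
# Moves an already-complete FILE into SPOOL_DIR; refuses to reuse a name,
# since the spooling directory source expects unique file names.
drop_into_spool() {
    file="$1"; spool="$2"
    name="$(basename "$file")"
    if [ -e "$spool/$name" ] || [ -e "$spool/$name.COMPLETED" ]; then
        echo "refusing: $name was already placed in $spool"
        return 1
    fi
    mv "$file" "$spool/$name"
}
```

For example, drop_into_spool /home/biadmin/staging/new-data.txt /home/biadmin/flumeingestion, where the staging path and file name are placeholders.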

The logs are in /opt/ibm/biginsights/flume/bin/logs.
If you don't see the expected result, check the logs.
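
A small sketch for scanning the log for problems. The file name flume.log is an assumption based on Flume's default logging setup; adjust LOG_FILE if your log folder contains a differently named file. recent_errors is a hypothetical helper name.

```shell
# recent_errors FILE
# Prints the last 20 lines of FILE that mention an error or exception,
# each prefixed with its line number.
recent_errors() {
    grep -E -i -n "error|exception" "$1" | tail -n 20
}

# Assumed log file name inside the log folder given above.
LOG_FILE="${LOG_FILE:-/opt/ibm/biginsights/flume/bin/logs/flume.log}"
if [ -f "$LOG_FILE" ]; then
    recent_errors "$LOG_FILE"
else
    echo "no log file at $LOG_FILE"
fi
```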