[ Bigdata ] 03. Building a WAS on Hadoop After Installing Sqoop and Flume

kim.svadoz 2020. 8. 10. 13:34

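20-03-13 (Fri)

< Flume Installation and Configuration >

A program used to extract (collect) data.

It loads unstructured data such as system logs, web server logs, click logs, and security logs into HDFS.

When log data is generated at large scale, a tool like this manages collecting and storing it efficiently.

Comparable tools: Flume, Chukwa, Scribe, Fluentd, Splunk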

apache.org - Flume

A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes. A Flume agent is a (JVM) process that hosts the components through which events flow from an external source to the next destination (hop).

In order to flow the data across multiple agents or hops, the sink of the previous agent and source of the current hop need to be avro type with the sink pointing to the hostname (or IP address) and port of the source.

A very common scenario in log collection is a large number of log producing clients sending data to a few consumer agents that are attached to the storage subsystem. For example, logs collected from hundreds of web servers sent to a dozen of agents that write to HDFS cluster.

This can be achieved in Flume by configuring a number of first tier agents with an avro sink, all pointing to an avro source of single agent (Again you could use the thrift sources/sinks/clients in such a scenario). This source on the second tier agent consolidates the received events into a single channel which is consumed by a sink to its final destination.

[ Setup ]

Download and extract the archive
```
tar -zxvf flume~~~~
```
Register the Flume settings in .bashrc

Copy flume-env.sh.template to flume-env.sh and register the following in it
- JDK home directory
- Hadoop home directory

```
source .bashrc
cd apache-flume-1.6.0-bin/conf/
cp flume-env.sh.template flume-env.sh
```
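As a rough sketch of what flume-env.sh might end up containing: the two exports below correspond to the JDK and Hadoop home directories listed above. Both paths and the `CONF_DIR` variable are placeholders invented for this illustration, so substitute the locations from your own install.

```shell
# Sketch only: the paths are placeholders, not values from the original post.
# CONF_DIR defaults to a scratch folder so nothing real is modified;
# on a real install, point it at apache-flume-1.6.0-bin/conf.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Append the two home directories to flume-env.sh
cat >> "$CONF_DIR/flume-env.sh" <<'EOF'
export JAVA_HOME=/usr/local/jdk        # placeholder: JDK home directory
export HADOOP_HOME=/usr/local/hadoop   # placeholder: Hadoop home directory
EOF

# Show the lines that were added
grep '^export' "$CONF_DIR/flume-env.sh"
```

Appending with `>>` leaves the rest of the copied template untouched.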



![image-20200313114023215](https://user-images.githubusercontent.com/58545240/89753328-cd640580-db12-11ea-9da0-0f11f741ea74.png)

Register the Flume configuration
- Copy flume-conf.properties.template under a new name, "XXXX.properties"
- Register the source, channel, and sink of the Flume agent in that file

```
cp flume-conf.properties.template console.properties
```

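[ Flume Components ]

A running Flume process is called an agent; it consists of a source, a channel, and a sink.

1. source

Where data enters the agent (the type states how the data arrives)

agentName.sources.sourceName.type = value

type
- netcat : input data typed into a terminal over telnet
  
  (bind : IP to listen on, port : port to listen on)
- spooldir : files saved in a specific folder
  
  (spoolDir : folder name)

2. channel

Where data is held (a queue between the source and the sink)

3. sink

Where data leaves the agent (the type states how it is sent out)

type
- logger : events are printed to the Flume server console
  - add -Dflume.root.logger=INFO,console when starting Flume
- file_roll : events are written to files on the local filesystem
  - sink.directory : the output folder where the files are stored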
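
The three components above might be wired together as in the following sketch of console.properties, which connects a netcat source to a logger sink through a memory channel. The agent name `myConsole` and port 44444 come from the commands in this post; the component names (`netcatSrc`, `memCh`, `logSink`), the channel sizes, and the `CONF_DIR` variable are assumptions made for illustration.

```shell
# Sketch only: CONF_DIR defaults to a scratch folder; on a real install,
# point it at apache-flume-1.6.0-bin/conf.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/console.properties" <<'EOF'
# Name the components of the agent "myConsole" (component names are hypothetical)
myConsole.sources = netcatSrc
myConsole.channels = memCh
myConsole.sinks = logSink

# Source: lines typed into a telnet session on localhost:44444
myConsole.sources.netcatSrc.type = netcat
myConsole.sources.netcatSrc.bind = localhost
myConsole.sources.netcatSrc.port = 44444

# Channel: in-memory queue between the source and the sink (sizes are assumptions)
myConsole.channels.memCh.type = memory
myConsole.channels.memCh.capacity = 1000
myConsole.channels.memCh.transactionCapacity = 100

# Sink: write each event to the Flume log
myConsole.sinks.logSink.type = logger

# Wiring: a source lists its channels (plural), a sink has exactly one channel
myConsole.sources.netcatSrc.channels = memCh
myConsole.sinks.logSink.channel = memCh
EOF

echo "wrote $CONF_DIR/console.properties"
```

Note the asymmetry in the last two lines: a source can feed several channels, while a sink drains only one.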

[ Running Flume ]

Command : ./bin/flume-ng agent
Options :
--conf : folder that holds the configuration files (-c)
--conf-file : configuration file name (-f)
--name : agent name (-n)
-Dflume.root.logger=INFO,console : print Flume's log to the console

When the source is data typed in over telnet:

[hadoop@hadoop01 apache-flume-1.6.0-bin]$ ./bin/flume-ng agent --conf conf --conf-file ./conf/console.properties --name myConsole -Dflume.root.logger=INFO,console

Install telnet as root, switch to the hadoop user, and connect to port 44444 to send input:

[root@hadoop01 ~]# yum install telnet
su hadoop
telnet localhost 44444

Moving files from one folder to another

cp ./conf/console.properties ./conf/myfolder.properties

```
[hadoop@hadoop01 apache-flume-1.6.0-bin]$ ./bin/flume-ng agent -c conf -f ./conf/myfolder.properties -n myConsole
```
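
A folder-to-folder setup could look like the sketch below: a `spooldir` source picks up files dropped into one folder and a `file_roll` sink writes the events into another. The agent name `myConsole` matches the command above; both folder paths, the component names, and the `CONF_DIR` variable are placeholders, and both folders should exist before the agent starts.

```shell
# Sketch only: CONF_DIR defaults to a scratch folder; on a real install,
# point it at apache-flume-1.6.0-bin/conf.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/myfolder.properties" <<'EOF'
myConsole.sources = dirSrc
myConsole.channels = memCh
myConsole.sinks = fileSink

# Source: watch a folder for new files (placeholder path)
myConsole.sources.dirSrc.type = spooldir
myConsole.sources.dirSrc.spoolDir = /home/hadoop/flume_input

# Channel: in-memory queue
myConsole.channels.memCh.type = memory

# Sink: write events to rolling files in an output folder (placeholder path)
myConsole.sinks.fileSink.type = file_roll
myConsole.sinks.fileSink.sink.directory = /home/hadoop/flume_output

myConsole.sources.dirSrc.channels = memCh
myConsole.sinks.fileSink.channel = memCh
EOF

echo "wrote $CONF_DIR/myfolder.properties"
```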

Copy the configuration under the name hdfs.properties

[hadoop@hadoop01 apache-flume-1.6.0-bin]$ cp ./conf/console.properties ./conf/hdfs.properties

[hadoop@hadoop01 apache-flume-1.6.0-bin]$ ./bin/flume-ng agent -c conf -f ./conf/hdfs.properties -n myhdfs
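
The post does not show the contents of hdfs.properties, so the following is only a guess at its shape: the same folder source as before, with the sink switched to `hdfs`. The agent name `myhdfs` comes from the command above; the input folder, the HDFS path `/flume/test`, the component names, and `CONF_DIR` are assumptions for illustration.

```shell
# Sketch only: CONF_DIR defaults to a scratch folder; on a real install,
# point it at apache-flume-1.6.0-bin/conf.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/hdfs.properties" <<'EOF'
myhdfs.sources = dirSrc
myhdfs.channels = memCh
myhdfs.sinks = hdfsSink

# Source: watch a local folder (placeholder path)
myhdfs.sources.dirSrc.type = spooldir
myhdfs.sources.dirSrc.spoolDir = /home/hadoop/flume_input

myhdfs.channels.memCh.type = memory

# Sink: write events into HDFS as plain text (placeholder HDFS path)
myhdfs.sinks.hdfsSink.type = hdfs
myhdfs.sinks.hdfsSink.hdfs.path = /flume/test
myhdfs.sinks.hdfsSink.hdfs.fileType = DataStream
myhdfs.sinks.hdfsSink.hdfs.writeFormat = Text

myhdfs.sources.dirSrc.channels = memCh
myhdfs.sinks.hdfsSink.channel = memCh
EOF

echo "wrote $CONF_DIR/hdfs.properties"
```

Setting `hdfs.fileType = DataStream` keeps the output as readable text instead of the default SequenceFile format.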

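20-03-14 (Sat)

AVRO? Flume? WAS?
A task that pulls data out of the WAS?
avro : one of the type names used for network communication between agents (available as both a sink type and a source type)
Source types
- netcat : not used much
- spooldir : reads from a folder, so it is used a lot
Sink types
- file_roll : for saving locally
- logger : not used much, since the data has to be received and analyzed
- hdfs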

hdfs2.properties

[hadoop@hadoop01 apache-flume-1.6.0-bin]$ cp ./conf/hdfs.properties ./conf/hdfs2.properties

Run the agent against Hadoop

[hadoop@hadoop01 apache-flume-1.6.0-bin]$ ./bin/flume-ng agent -c conf -f ./conf/hdfs2.properties -n myhdfs

hdfs3.properties

[hadoop@hadoop01 apache-flume-1.6.0-bin]$ cp ./conf/hdfs2.properties ./conf/hdfs3.properties

Run the agent against Hadoop

[hadoop@hadoop01 apache-flume-1.6.0-bin]$ ./bin/flume-ng agent -c conf -f ./conf/hdfs3.properties -n myhdfs

< Running Tomcat on the hadoop02 machine >

Download Tomcat and extract it

[hadoop@hadoop02 ~]$ wget (paste the download URL here)
[hadoop@hadoop02 ~]$ tar -zxvf apache-tomcat-9.0.31.tar.gz

Copy the .bashrc from hadoop01 to hadoop02

[hadoop@hadoop01 ~]$ scp .bashrc hadoop@hadoop02:/home/hadoop

Copy hdfs2.properties into the Flume conf folder on hadoop03

scp hdfs2.properties hadoop@hadoop03:/home/hadoop/apache-flume-1.6.0-bin/conf

Start Tomcat

[hadoop@hadoop02 ~]$ source .bashrc
[hadoop@hadoop02 ~]$ cd apache-tomcat-9.0.31/
[hadoop@hadoop02 apache-tomcat-9.0.31]$ cd bin/

[hadoop@hadoop02 bin]$ ./startup.sh 
[hadoop@hadoop02 bin]$ ./shutdown.sh 
[hadoop@hadoop02 bin]$ netstat -anp | grep 8080
[hadoop@hadoop02 bin]$ ./startup.sh

Check that the server is up
Grant manager permissions

-   127.0.0.1:8080/manager

![image-20200314114941784](https://user-images.githubusercontent.com/58545240/89753430-33e92380-db13-11ea-84d8-6f57ffb0cc85.png)

Lift the IP restriction
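
The post shows these two steps only as a screenshot, so the following is a hedged sketch rather than the author's exact change. Manager access is normally granted by giving a user the `manager-gui` role in conf/tomcat-users.xml. In a stock Tomcat 9 the manager app is also limited to localhost by a `RemoteAddrValve` in webapps/manager/META-INF/context.xml; commenting that valve out or widening its `allow` pattern is the usual way to lift the restriction. The user name, password, and `TOMCAT_CONF` variable below are placeholders.

```shell
# Sketch only: the user name and password are placeholders.
# TOMCAT_CONF defaults to a scratch folder; on a real install it would be
# apache-tomcat-9.0.31/conf (back up the existing tomcat-users.xml first).
TOMCAT_CONF="${TOMCAT_CONF:-$(mktemp -d)}"

cat > "$TOMCAT_CONF/tomcat-users.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<tomcat-users xmlns="http://tomcat.apache.org/xml"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://tomcat.apache.org/xml tomcat-users.xsd"
              version="1.0">
  <!-- manager-gui grants access to the HTML manager at /manager -->
  <role rolename="manager-gui"/>
  <user username="admin" password="change-me" roles="manager-gui"/>
</tomcat-users>
EOF

echo "wrote $TOMCAT_CONF/tomcat-users.xml"
```

Tomcat reads this file at startup, so a restart with ./shutdown.sh and ./startup.sh is needed after editing it.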

In STS, edit bigdataShop > META-INF > context.xml

<Resource name="jdbc/myspring" auth="Container"
              type="javax.sql.DataSource" 
              driverClassName="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@70.12.115.65:1521:xe"
              username="shop" password="shop"
              maxTotal="20" maxIdle="10"
              maxWaitMillis="-1"/>
// set the IP in the url to your own PC's IP

Export the project: Export > Web > WAR file
In Chrome, open http://192.168.111.129:8080/manager and deploy the exported WAR file
In Chrome, the app is now reachable at http://192.168.111.129:8080/bigdataShop/index.do. Done.

=> hadoop02 has now become my server!!

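< Mission: build a WAS on hadoop03 >

Build a WAS on machine 3
Deploy bigdataShop to the WAS
Install Flume on hadoop03
Store Tomcat's access log in HDFS
- avro communication
- hdfs : /flume/tomcatlog
Submit by email
- a capture of the deployed-application list on machine 3's WAS manager screen
- a capture of the access log stored in HDFS
- the Flume configuration file of each machine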

./bin/flume-ng agent -c conf -f ./conf/hdfs2.properties -n myavro


[hadoop@hadoop03 ~]$ mkdir flume_input
[hadoop@hadoop03 ~]$ cp /home/hadoop/apache-tomcat-9.0.31/logs/localhost_access_log.2020-03-15.txt /home/hadoop/flume_input

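The sink holds the details of the machine to send to (enter the details of machine 01)
To test: start Flume on the Hadoop machine, start Flume on the WAS machine, then copy a log file into the flume_input folder
Both machine 1 and machine 3 use machine 1's IP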
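
The two-machine flow described in these notes could be sketched as a pair of configuration files: on hadoop03 a folder source feeds an avro sink that points at hadoop01, and on hadoop01 an avro source feeds the HDFS sink. The `flume_input` folder, the `/flume/tomcatlog` path, and the agent names `myavro` and `myhdfs` appear in this post; the file names, component names, port 41414, and `CONF_DIR` are assumptions made for illustration.

```shell
# Sketch only: CONF_DIR defaults to a scratch folder; on a real install each
# file would go into apache-flume-1.6.0-bin/conf on its own machine.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# --- hadoop03 (WAS machine): folder -> avro sink pointing at hadoop01 ---
cat > "$CONF_DIR/was-avro.properties" <<'EOF'
myavro.sources = dirSrc
myavro.channels = memCh
myavro.sinks = avroSink

myavro.sources.dirSrc.type = spooldir
myavro.sources.dirSrc.spoolDir = /home/hadoop/flume_input

myavro.channels.memCh.type = memory

# The sink carries the destination machine's host and port (port is hypothetical)
myavro.sinks.avroSink.type = avro
myavro.sinks.avroSink.hostname = hadoop01
myavro.sinks.avroSink.port = 41414

myavro.sources.dirSrc.channels = memCh
myavro.sinks.avroSink.channel = memCh
EOF

# --- hadoop01 (Hadoop machine): avro source -> HDFS ---
cat > "$CONF_DIR/hadoop-avro.properties" <<'EOF'
myhdfs.sources = avroSrc
myhdfs.channels = memCh
myhdfs.sinks = hdfsSink

# Listens on the same host and port that the WAS-side sink sends to
myhdfs.sources.avroSrc.type = avro
myhdfs.sources.avroSrc.bind = hadoop01
myhdfs.sources.avroSrc.port = 41414

myhdfs.channels.memCh.type = memory

myhdfs.sinks.hdfsSink.type = hdfs
myhdfs.sinks.hdfsSink.hdfs.path = /flume/tomcatlog
myhdfs.sinks.hdfsSink.hdfs.fileType = DataStream
myhdfs.sinks.hdfsSink.hdfs.writeFormat = Text

myhdfs.sources.avroSrc.channels = memCh
myhdfs.sinks.hdfsSink.channel = memCh
EOF

echo "wrote $CONF_DIR/was-avro.properties and $CONF_DIR/hadoop-avro.properties"
```

Starting the hadoop01 agent first means the avro source is already listening when the hadoop03 sink tries to connect.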

Attribution, NonCommercial, NoDerivatives
