Big Data Notes
Tools
- kettle spoon (pdi-ce-7.0.0.0-25.zip Last Updated )
- Elasticsearch (elasticsearch-5.1.1.zip, release date: December 08, 2016)
- Kibana for mac (kibana-5.1.1-darwin-x86_64.tar.gz, release date: December 08, 2016)
- Logstash (logstash-5.1.1.tar.gz, release date: December 08, 2016)
Installing ELK
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.zip
unzip elasticsearch-5.1.1.zip
wget https://artifacts.elastic.co/downloads/kibana/kibana-5.1.1-darwin-x86_64.tar.gz
tar zxvf kibana-5.1.1-darwin-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/logstash/logstash-5.1.1.tar.gz
tar zxvf logstash-5.1.1.tar.gz
Installing X-Pack
Installing X-Pack into Elasticsearch
In the Elasticsearch directory, run:
./bin/elasticsearch-plugin install x-pack
When the installer warns that the "plugin requires additional permissions", answer y to continue.
Start Elasticsearch:
./bin/elasticsearch
Installing X-Pack into Kibana
In the Kibana directory, run:
./bin/kibana-plugin install x-pack
Start Kibana:
./bin/kibana
Open http://localhost:5601/ and log in with the default X-Pack account:
Username: elastic
Password: changeme
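With X-Pack installed, Elasticsearch also rejects anonymous HTTP requests, so every client besides Kibana (Logstash included, which is why its output section carries user and password) has to send these credentials. A minimal Python sketch of building such a request, assuming the default elastic/changeme account above and a node on localhost:9200; the helper name `build_es_request` is hypothetical. It only builds the request; actually sending it needs a running node.

```python
import base64
import urllib.request


def build_es_request(path: str, user: str = "elastic", password: str = "changeme",
                     host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a GET request carrying HTTP Basic credentials.

    Hypothetical helper: X-Pack security refuses anonymous requests, so the
    user:password pair is base64-encoded into an Authorization header.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(host + path,
                                  headers={"Authorization": "Basic " + token})


if __name__ == "__main__":
    req = build_es_request("/_cluster/health")
    print(req.full_url)
    print(req.get_header("Authorization"))
    # Sending it requires a running node:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```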
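File encoding
Convert all files to UTF-8. First determine each file's current encoding; two tools can do this, file and enca. Use both, because either one alone sometimes guesses wrong.
Install enca: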
brew install enca
Run both tools on the same file:
file csdn.com.txt && enca csdn.com.txt
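The two tools only report the encoding; the conversion still has to be done. A minimal Python sketch of the whole step, assuming the source files are either UTF-8 or a GBK-family Chinese encoding (GB18030 is a superset of GBK). Note that it swaps in a different technique from file/enca: it tries strict decoding with each candidate in order and keeps the first that succeeds, so a file that happens to decode under the wrong codec can still be mislabeled. The names `detect_and_decode` and `convert_to_utf8` are hypothetical.

```python
from pathlib import Path
from typing import Tuple

# Candidate encodings, tried in order. Assumption: the dumps are either UTF-8
# or GBK-family Chinese; extend the tuple for other sources.
CANDIDATES = ("utf-8", "gb18030")


def detect_and_decode(data: bytes, candidates=CANDIDATES) -> Tuple[str, str]:
    """Return (encoding, text) for the first candidate that decodes strictly."""
    for enc in candidates:
        try:
            return enc, data.decode(enc)  # strict mode raises on invalid bytes
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


def convert_to_utf8(src: Path, dst: Path) -> str:
    """Rewrite src as UTF-8 at dst and return the encoding that was detected."""
    enc, text = detect_and_decode(src.read_bytes())
    # Writing bytes keeps line endings untouched (e.g. the \r the grok pattern expects).
    dst.write_bytes(text.encode("utf-8"))
    return enc
```

Trying UTF-8 first matters: plain ASCII and valid UTF-8 files are left as they are, and only files that fail strict UTF-8 decoding fall through to GB18030.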
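Configuring Logstash
Save the following as bigdata.conf in the Logstash directory (it is the file passed to -f when running Logstash):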
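input {
  file {
    # path of the files to watch
    path => "/Users/sword/bigdata/form_src/*.txt"
    # several targets can be watched by passing an array
    #path => ["E:/software/logstash-1.5.4/logstash-1.5.4/data/*","F:/test.txt"]
    # where to start reading a file seen for the first time; the default is end
    start_position => "beginning"
    # keep the sincedb file outside the watched glob: a .txt file inside
    # form_src would itself match *.txt and be imported as data
    sincedb_path => "/Users/sword/bigdata/sincedb_log.txt"
  }
}
filter {
  grok {
    match => {
      "message" => "'%{DATA:email}',\s'%{DATA:password}',\s'%{DATA:username}','%{DATA:from}'\r"
    }
  }
  mutate {
    # drop bookkeeping fields and the raw line once it has been parsed
    # (removing tags also hides any _grokparsefailure tag)
    remove_field => ["host","path","tags","message"]
  }
  fingerprint {
    # concatenate all source fields into one string before hashing
    concatenate_sources => true
    source => ["username","email","password"]
    target => "[@metadata][generated_id]"
    # this version of the filter raises an error if no key is set
    key => "znmfLov5KlNUeh2z"
    # MD5 of the combined fields: identical records get identical IDs
    method => 'MD5'
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "base_data"
    #workers => 5
    document_id => "%{[@metadata][generated_id]}"
    # identifies the data source
    document_type => "csdn.com"
    user => "elastic"
    password => "changeme"
  }
  # uncomment to print each event, including @metadata, for debugging
  #stdout {
  #  codec => rubydebug {
  #    metadata => true
  #  }
  #}
}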
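To see what the grok filter extracts, here is a rough Python equivalent of its match pattern. It assumes each line looks like `'email', 'password', 'username','source'` followed by a carriage return (a CRLF file read line by line keeps the `\r`), and relies on grok's DATA pattern being the lazy `.*?`. It only illustrates the matching; grok itself runs on the Oniguruma engine, and `parse_line` is a hypothetical name.

```python
import re

# DATA in grok is .*? (lazy "anything"), so each %{DATA:name} becomes a
# named lazy group; \s matches the single space after a comma.
LINE_RE = re.compile(
    r"'(?P<email>.*?)',\s'(?P<password>.*?)',\s'(?P<username>.*?)','(?P<from>.*?)'\r"
)


def parse_line(line: str):
    """Return the extracted fields, or None when the line does not match
    (in Logstash that case adds the _grokparsefailure tag)."""
    m = LINE_RE.search(line)  # grok matches unanchored, like re.search
    return m.groupdict() if m else None


print(parse_line("'alice@example.com', 'secret', 'alice','csdn.com'\r"))
# → {'email': 'alice@example.com', 'password': 'secret', 'username': 'alice', 'from': 'csdn.com'}
```

Because the pattern ends in `\r`, a file with plain LF line endings would fail to match on every line, which is one more reason to normalize files before importing.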
Run Logstash to import the data:
./bin/logstash -f ./bigdata.conf
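Because document_id is the fingerprint of username, email and password, re-running the import over the same files overwrites existing documents instead of adding duplicates. A minimal Python sketch of that idea, under an assumption: that the fingerprint filter, with concatenate_sources enabled and a key set, computes a keyed HMAC-MD5 over the fields joined as `|field|value|field|value|` in sorted field order. This mirrors how the plugin source appears to build the string in the 5.x era, but treat the exact layout as an assumption; these IDs are not guaranteed to equal the ones Logstash produces. `make_doc_id` and `bulk_body` are hypothetical helpers; `bulk_body` only builds an Elasticsearch 5.x bulk request body and sends nothing.

```python
import hashlib
import hmac
import json

KEY = b"znmfLov5KlNUeh2z"  # same key as in bigdata.conf
SOURCE_FIELDS = ["username", "email", "password"]


def make_doc_id(event: dict, key: bytes = KEY) -> str:
    """Keyed MD5 (HMAC-MD5) over the concatenated source fields.

    Assumption: fields are joined as |name|value| in sorted name order,
    approximating the fingerprint filter's concatenate_sources mode.
    """
    joined = "".join(f"|{name}|{event.get(name, '')}" for name in sorted(SOURCE_FIELDS)) + "|"
    return hmac.new(key, joined.encode("utf-8"), hashlib.md5).hexdigest()


def bulk_body(events, index="base_data", doc_type="csdn.com") -> str:
    """Build a _bulk body; events with equal IDs overwrite rather than duplicate."""
    lines = []
    for event in events:
        action = {"index": {"_index": index, "_type": doc_type, "_id": make_doc_id(event)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```

Only the three source fields feed the ID, so two records that differ only in `from` get the same ID; within one index and document type the later one replaces the earlier.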
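Field plan
- username (login name)
- email (email address)
- password (password)
- salt (password salt)
- nickname (nickname)
- qq (QQ number)
- mobile (mobile phone number)
- telno (landline phone number)
- idno (national ID card number)
- realname (real name)
- address (home address)
- ip (IP address)
- date (when the record was generated)
- other (any other data)
- from (data source)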
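One possible way to turn this field plan into an explicit Elasticsearch 5.x index mapping, sketched in Python as the JSON body for creating the index. The type choices are assumptions, not part of the original plan: exact-match identifiers as keyword, free-text fields (nickname, realname, address, other) as text, ip as ip and date as date. The helper `planned_mapping` is hypothetical. The `_default_` mapping makes every document type in the index (one per source, such as csdn.com) inherit these fields.

```python
import json

# Field plan from above, grouped by an assumed Elasticsearch 5.x field type.
KEYWORD_FIELDS = ["username", "email", "password", "salt", "qq",
                  "mobile", "telno", "idno", "from"]
TEXT_FIELDS = ["nickname", "realname", "address", "other"]


def planned_mapping() -> dict:
    """Body for PUT /base_data; _default_ applies to every new document type."""
    props = {name: {"type": "keyword"} for name in KEYWORD_FIELDS}
    props.update({name: {"type": "text"} for name in TEXT_FIELDS})
    props["ip"] = {"type": "ip"}
    props["date"] = {"type": "date"}
    return {"mappings": {"_default_": {"properties": props}}}


print(json.dumps(planned_mapping(), indent=2))
```

Such a body would have to be sent before the first import, since the type of a field cannot be changed once documents have been indexed with it.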