A customer has a number of different devices on the network: Cisco switches, Vyatta routers, Linux servers, VMware hosts and guests, .... All these devices generate various kinds of log files. The goal is to get all these logs transmitted to one platform for review and analysis. Splunk is the best known, but it can have onerous licensing fees. For smaller organizations, something like Fluentd might be more value-oriented.
From an initial view, I think I'll need two or more blog entries to describe my experience with getting Fluentd up and running. The first entry covers what I did to get Fluentd running on a single server, with logs originating from that same server. The second entry will describe what it takes to get Fluentd working in a more distributed manner.
For this first article, I found that the installation mechanism for the most recent versions of Fluentd deviates somewhat from what is generally documented. I had to search through a number of sites in order to piece together what I needed for a working configuration. The platform used is Ubuntu, based upon the ubuntu-14.04-server-amd64.iso image.
First, because time consistency is important, ensure that ntp is installed, running, and synced:
apt-get install ntp
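A quick check that ntp is actually syncing against its peers (give the daemon a minute or two first):
ntpq -p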
Install the web server (some use nginx, but I'm staying vanilla here):
apt-get install apache2
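A quick check that the web server answers locally (using wget, since curl isn't installed until later in this walkthrough):
wget -qO- http://localhost/ | head -n 3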
In reviewing the documentation, there appear to be two ways of getting fluentd functionality: a) running fluentd at the command line, and b) using a td-agent service. This article discusses using fluentd at the command line. td-agent doesn't appear to have instructions for installing on the latest release of Ubuntu, so I need to do more digging into that before writing about td-agent. General Questions has some comparisons between fluentd and td-agent.
I used the fluentd docs as a starting point.
Java is required for elasticsearch, the search and indexing engine that Fluentd will feed records into. I used:
apt-get install default-jre
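A quick check that the JRE is in place:
java -version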
Go to the elasticsearch download page to find the latest version, then:
cd /usr/src
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.2.deb
dpkg -i elasticsearch-1.2.2.deb
Note the following once the installation completes:
### NOT starting elasticsearch by default on bootup, please execute
sudo update-rc.d elasticsearch defaults 95 10
### In order to start elasticsearch, execute
sudo /etc/init.d/elasticsearch start
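Once started, a quick sanity check that elasticsearch is listening on its default HTTP port (9200) and returning its JSON status block:
wget -qO- http://localhost:9200/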
A web front end to the search engine is required, which in this case is kibana. The latest download is found on the kibana installation page.
wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.0.tar.gz
tar -zxvf kibana-3.1.0.tar.gz
mv kibana-3.1.0 /var/www/html/kibana
The privileges on /var/www/html/kibana may need adjusting (an example follows below). The web page can be accessed via http://<address>/kibana. Use the pre-built 'Logstash Dashboard'; kibana has an option to save it as the default dashboard. At this point, no entries are visible, as Fluentd has not yet been installed or configured.
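For example, handing the directory to the apache user (www-data on Ubuntu; adjust to taste):
chown -R www-data:www-data /var/www/html/kibana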
Further things to explore with kibana: the elasticsearch API, Logstash conventions, Lucene query syntax.
On the website grigio.org, I found further clarification for the configuration:
apt-get install ruby ruby-dev git build-essential curl
apt-get install libcurl4-gnutls-dev
gem install fluentd fluent-plugin-elasticsearch --no-rdoc --no-ri
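To confirm the gems installed cleanly and fluentd is now on the path:
fluentd --version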
The following creates the default configuration file /etc/fluent/fluent.conf, which will require editing to get everything working. Once created, make a copy for archival and roll-back purposes (example below).
fluentd --setup /etc/fluent
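For example:
cp /etc/fluent/fluent.conf /etc/fluent/fluent.conf.orig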
Below is a sample configuration for getting same-server logs running through fluentd. Most log types take a <source> section as well as a <match> section. The <source> section is the equivalent of an 'input': it defines where things come from. The two <source> sections I needed to add were: a) one for apache access logs, and b) one for rsyslog logs. I should add something for the apache error log as well. Auth logs and various other odds and ends can be added and tagged similarly (a sketch follows the next paragraph).
The key to understanding is that <source> takes the input and tags it. The <match> section uses the tag for identification and then processes the log entry accordingly. My match statements currently have two <store> sections. The first is used for debugging: Fluentd will output records to stdout as they are processed. The second <store> section is where the magic happens: the record is sent to elasticsearch for indexing and viewing.
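As a sketch of what tagging another input could look like, here is the auth log mentioned above, following the same pattern as the apache source further down. The path, tag, and pos_file names are my own choices, and I haven't exercised this particular pair:
<source>
type tail
format syslog
path /var/log/auth.log
pos_file /var/log/fluent/tmp/auth.pos
tag auth
</source>
<match auth>
type copy
<store>
type stdout
</store>
<store>
type elasticsearch
logstash_format true
flush_interval 10s
port 9200
</store>
</match>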
What hasn't been discussed is how these sections can be changed a bit in order to install collector clients on various machines and forward entries to a central machine for processing. A subject for another article.
Append the following to /etc/rsyslog.conf. The rsyslog process will then need to be restarted; /etc/init.d/rsyslog restart doesn't always appear to do the job, in which case either reboot the machine or kill the process and start it again (see below).
*.* @127.0.0.1:42185
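If the init script doesn't take effect, one way (a sketch, not the only approach) is:
kill $(pidof rsyslogd)
service rsyslog start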
With these in place, fluentd can be started:
fluentd -vv
The 'v's are used for troubleshooting and are optional.
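By default fluentd reads /etc/fluent/fluent.conf; the configuration file can also be named explicitly with -c, which is a handy way to be sure the edited file is the one being used:
fluentd -c /etc/fluent/fluent.conf -vv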
To simulate log entries:
logger -t test foobar
Reloading pages in the web browser will also generate entries through the apache access log collector.
Entries should now be appearing in the kibana web interface.
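As a further check that records are actually reaching elasticsearch, the indices can be listed directly (with logstash_format true, index names look like logstash-YYYY.MM.DD; curl is available by this point):
curl 'http://localhost:9200/_cat/indices?v'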
Some of the configuration file entries:
## File input
## read apache logs with tag=apache.access
<source>
type tail
format apache2
# path /var/log/httpd-access.log
path /var/log/apache2/access.log
pos_file /var/log/fluent/tmp/apache.access.pos
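# note: the directory for pos_file must already exist, and fluentd needs read access to the apache log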
tag apache.access
</source>
# rsyslog stuff
<source>
type syslog
port 42185
# tag system
tag syslog
format syslog
# bind 127.0.0.1
</source>
<match syslog.**>
#<match syslog>
type copy
<store>
type stdout
</store>
<store>
type elasticsearch
logstash_format true
flush_interval 10s # for testing
port 9200
</store>
</match>
# match tag=apache.access and write to file
<match apache.access>
# type file
type copy
<store>
type stdout
</store>
<store>
type elasticsearch
logstash_format true
flush_interval 10s # for testing
port 9200
</store>
# path /var/log/fluent/access
</match>