This document describes how to push Nginx logs to an S3 bucket via Fluentd. There are two ways to do this:

  1. Run a local Fluentd process on your system and tail the corresponding log files.
  2. Run a Fluentd container on your system and send logs via Rsyslog.

Local Fluentd to ship logs

Setup Fluentd

  1. Install Fluentd by following these instructions.

  2. Depending on your Linux distribution, ensure that td-agent is running:

    # This command works on Amazon Linux and Red Hat based systems.
    sudo service td-agent status
    
    ● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
       Loaded: loaded (/usr/lib/systemd/system/td-agent.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2021-03-26 05:05:28 UTC; 2s ago
         Docs: <https://docs.treasuredata.com/articles/td-agent>
      Process: 5879 ExecStop=/bin/kill -TERM ${MAINPID} (code=exited, status=0/SUCCESS)
      Process: 5891 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=0/SUCCESS)
     Main PID: 5996 (fluentd)
        Tasks: 8
       Memory: 126.9M
       CGroup: /system.slice/td-agent.service
               ├─5996 /opt/td-agent/bin/ruby /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
               └─5999 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --under-supervisor
    
    Mar 26 05:05:25 ip-172-31-7-162.ap-south-1.compute.internal systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
    Mar 26 05:05:28 ip-172-31-7-162.ap-south-1.compute.internal systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.
    
  3. Ensure that the process whose logs you want to ship is running. This guide uses Nginx:

    sudo service nginx status
    ● nginx.service - The nginx HTTP and reverse proxy server
       Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
       Active: active (running) since Tue 2021-03-23 09:02:54 UTC; 2 days ago
     Main PID: 25379 (nginx)
        Tasks: 2
       Memory: 4.3M
       CGroup: /system.slice/nginx.service
               ├─25379 nginx: master process /usr/sbin/nginx
               └─25382 nginx: worker process
    
    Mar 23 09:02:54 ip-172-31-7-162.ap-south-1.compute.internal systemd[1]: Starting The nginx HTTP and reverse proxy server...
    Mar 23 09:02:54 ip-172-31-7-162.ap-south-1.compute.internal nginx[25363]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
    Mar 23 09:02:54 ip-172-31-7-162.ap-south-1.compute.internal nginx[25363]: nginx: configuration file /etc/nginx/nginx.conf test is successful
    Mar 23 09:02:54 ip-172-31-7-162.ap-south-1.compute.internal systemd[1]: Failed to read PID from file /run/nginx.pid: Invalid argument
    Mar 23 09:02:54 ip-172-31-7-162.ap-south-1.compute.internal systemd[1]: Started The nginx HTTP and reverse proxy server.
    
  4. Add the following config to /etc/td-agent/td-agent.conf:

    <source>
      @type tail
      path /var/log/nginx/access.log # ...or wherever your Nginx access log is located
      pos_file /var/log/td-agent/nginx-access.log.pos # This is where you record file position
      tag nginx.access
      format nginx
    </source>
    
    <source>
      @type tail
      path /var/log/nginx/error.log
      pos_file /var/log/td-agent/nginx-error.log.pos
      tag nginx.error
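      # Note: the built-in nginx parser targets the access log format, so error log
      # lines may fail to parse with it. The regexp below is one possible alternative.
      format nginx
      # format /^(?<time>[^ ]+ [^ ]+) \[(?<log_level>.*)\] (?<pid>\d*).(?<tid>[^:]*): (?<message>.*)$/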
    </source>
    
    <match nginx.*>
    
      @type "s3"
      s3_bucket "$S3_BUCKET"
      s3_region "$AWS_REGION"
      path "logs/$INSTANCE_ID/%Y/%m/%d"
      s3_object_key_format "%{path}/%{time_slice}_%{index}.log"
      time_slice_format %Y%m%d%H%M
      <format>
        localtime false
      </format>
      <buffer time>
        @type "file"
        path "/var/tmp/fluentd/buffer/s3"
        timekey $FLUENTD_CONFIG_S3_TIMEKEY
        timekey_wait $FLUENTD_CONFIG_S3_TIMEKEY_WAIT
        timekey_use_utc true
      </buffer>
    </match>
    
  5. Update the following variables in the above config:

    1. $S3_BUCKET - Destination S3 bucket for the logs. The bucket must already exist.
    2. $AWS_REGION - AWS region of the S3 bucket.
    3. $INSTANCE_ID - An EC2 instance ID or another unique identifier; you may choose to omit it. It identifies the system from which the logs are sent.
    4. $FLUENTD_CONFIG_S3_TIMEKEY - Flush interval; it also determines how logs are grouped.
    5. $FLUENTD_CONFIG_S3_TIMEKEY_WAIT - Flush delay.

  6. Restart td-agent. Ensure that Nginx is writing log entries so that there is data to send.

    sudo service td-agent restart
    
  7. Wait for the $FLUENTD_CONFIG_S3_TIMEKEY duration (plus $FLUENTD_CONFIG_S3_TIMEKEY_WAIT), then check the S3 bucket for logs.
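
As a rough aid for the final check, the sketch below derives the key prefix under which objects should appear, based on the path setting logs/$INSTANCE_ID/%Y/%m/%d from the config above. The function name s3_log_prefix and the sample instance ID are hypothetical. The listing step assumes the AWS CLI is installed with credentials that can read the bucket, and is skipped otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the S3 key prefix that the config's
# path setting (logs/$INSTANCE_ID/%Y/%m/%d) produces for a given day.
s3_log_prefix() {
  local instance_id="$1"
  # Day as YYYY-MM-DD; defaults to today in UTC, matching timekey_use_utc true.
  local day="${2:-$(date -u +%Y-%m-%d)}"
  printf 'logs/%s/%s\n' "$instance_id" "$(printf '%s' "$day" | tr '-' '/')"
}

# Example prefix for a made-up instance ID and a fixed date.
# Prints: logs/i-0123456789abcdef0/2021/03/26
s3_log_prefix "i-0123456789abcdef0" "2021-03-26"

# List today's objects only when the variables are set and the AWS CLI exists.
if [ -n "${S3_BUCKET:-}" ] && [ -n "${INSTANCE_ID:-}" ] && command -v aws >/dev/null 2>&1; then
  aws s3 ls "s3://${S3_BUCKET}/$(s3_log_prefix "$INSTANCE_ID")/"
fi
```

If the listing stays empty after the flush interval has passed, /var/log/td-agent/td-agent.log (the log path shown in the td-agent status output above) is a reasonable place to look for upload errors.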

Observations

Using environment variables with Fluentd

  1. You can use environment variables in the Fluentd config by following the format described in the documentation.

  2. To ensure that td-agent picks up the environment variables, add them to /etc/sysconfig/td-agent, e.g.:

    # Append the variable as root; a plain `cat >` redirect would run without sudo and fail.
    echo 'S3_BUCKET=nginx-logs' | sudo tee -a /etc/sysconfig/td-agent
    
    sudo service td-agent restart
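
To tie the two points above together: Fluentd evaluates embedded Ruby inside double-quoted config values, so "#{ENV['S3_BUCKET']}" resolves to the variable's value when the config is loaded. The sketch below prints a <match> fragment written that way, for review before merging it into /etc/td-agent/td-agent.conf. The function name print_s3_match is hypothetical, and the fragment assumes that S3_BUCKET and AWS_REGION are both defined in /etc/sysconfig/td-agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a <match> fragment that reads its settings from
# environment variables through Fluentd's embedded Ruby syntax.
print_s3_match() {
  # The quoted 'EOF' stops the shell from expanding anything in the fragment,
  # so the #{ENV[...]} expressions reach Fluentd unchanged.
  cat <<'EOF'
<match nginx.*>
  @type s3
  s3_bucket "#{ENV['S3_BUCKET']}"
  s3_region "#{ENV['AWS_REGION']}"
  # path, s3_object_key_format, <format>, and <buffer> as in the config above
</match>
EOF
}

print_s3_match
```

The heredoc is quoted on purpose: without the quotes the shell would still leave #{...} alone, but any $VARIABLE placeholders added later would be expanded by the shell instead of being passed through.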