Elastic – Data ingestion, storage, and visualization

Hey there everyone! I hope some of you are here following the talk I gave at YOLOcon. This blog post should be a good review of the topics we discussed, with links to resources that will help you get started experimenting with Elastic.

For those who did not attend the talk, here’s a little background. I recently had the chance to attend Elastic{ON}, the conference for an open-source software suite called Elastic. The conference was absolutely fantastic, between the talks on future enhancements, the real-life use cases, and geeking out with all these people who love the software as much as I do. Since I rarely hear people in our major talking about this suite, I’m taking the opportunity to share it with you all through this post. From my experience with the suite, I would summarize its main goals as data ingestion, storage, and visualization.

When I first started working with Elastic, those three goals were represented in three software components, and it was referred to as the ELK Stack. These components were Elasticsearch, Logstash, and Kibana, and they remain the core of the suite today, with a few new additions. A quick summary of each is below.

Logstash – Your primary data import tool

Elasticsearch – Your one-stop data shop

Kibana – Visualization center

Beats – Logstash’s little helpers

X-Pack – Enterprise upgrade

Cloud – Cloud hosting

Elasticsearch is the data storage component, and really the heart, of Elastic. In short, Elasticsearch is a distributed database that stores indices of JSON documents; each index is split into shards, and those shards are replicated and spread across the nodes in your cluster. It is designed to store both structured and unstructured data. Essentially, if you can format data into JSON documents, you can put it in this database. Ingestion options include REST requests like the ‘PUT’ below, or using Logstash and Beats to add data.

PUT /customer/external/1?pretty
{
  "name": "John Doe"
}
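
Reading the data back out goes through the same REST API. As a quick sketch against the hypothetical customer index from above, the first request fetches the document back by ID, and the second runs a full-text search over the whole index:

GET /customer/external/1?pretty

GET /customer/_search?pretty
{
  "query": {
    "match": { "name": "john" }
  }
}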

Logstash is your primary data collection tool. It has a huge list of supported inputs (53 in total) and facilitates the “any data goes” attitude. After choosing a set of inputs, you build configurations on your Logstash server that parse the incoming data into JSON documents or whatever other output format you like (pick from 55 options). Assuming you go the Elasticsearch route, those documents are then added to your database!
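
To make that concrete, here is a minimal sketch of a Logstash pipeline config. The file path, field, and index name are all hypothetical; it tails an application log, parses each line as JSON, and ships the results to a local Elasticsearch node:

input {
  file {
    path => "/var/log/myapp/*.log"    # hypothetical application log files
  }
}

filter {
  json {
    source => "message"               # parse each log line as a JSON document
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]       # a local single-node cluster
    index => "myapp-logs"             # hypothetical index name
  }
}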

Now, Beats work in conjunction with, and as a replacement for, Logstash. These are small data shippers that live on your local machines and either ship data directly to Elasticsearch or send it to Logstash for additional processing. A few of them are officially supported (listed below, with a sample config sketch after the list), and you can read more about them online. There is also documentation on how to build your own custom beat, and there is a good chance that another community member has already created and shared a beat for your use case!

Filebeat – log files

Metricbeat – metrics

Packetbeat – network data

Winlogbeat – Windows event logs

Heartbeat – uptime monitoring
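
As a quick sketch of what running one of these looks like, here is a minimal Filebeat config. The paths and hosts are hypothetical, and the exact option names vary a bit between Filebeat versions, so check the docs for yours:

filebeat.inputs:
- type: log
  paths:
    - /var/log/myapp/*.log        # hypothetical app logs to watch

output.logstash:
  hosts: ["localhost:5044"]       # forward to Logstash for extra processing

# Or skip Logstash and ship straight to Elasticsearch:
# output.elasticsearch:
#   hosts: ["localhost:9200"]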

X-Pack is the set of tools that takes you from an open-source project to an enterprise-level application. It adds authentication and other security-related functionality, enhanced cluster monitoring, alerting, scheduled reporting, enhanced graphing, and soon-to-be-added machine learning capabilities. All of these features are valuable as you take your project to an enterprise level, but I wouldn’t worry about getting your hands on them until then. You can find many free projects that mimic their capabilities. The integration won’t be as seamless, but they’ll save your pocketbook until you or your company has the funding for the supported versions.

Finally, Elastic has recently added cloud support to their list of features. They have partnered with AWS and Google Cloud to provide managed clusters for a subscription fee, and those subscriptions include access to the X-Pack features. So, if you want to try those out on a budget, this may be the way to go.

The benefit of sticking with the core Elastic products is that you know everything will work together, since the components are updated and released in sync. But if you want to work with any of the X-Pack features and don’t have the budget, or if you need some feature that doesn’t exist in the current release, turn to the community. There are quite a few replacements out there. Apache Kafka can replace Logstash to add buffering and high availability to your data ingestion process. Grafana is a Kibana replacement that has built-in alerting. Or ElastAlert can replace Alerting/Watcher from X-Pack for that alerting need without replacing a core module (a sample rule is sketched below). These are just a few alternatives that I’ve worked with, but many more exist.
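
For a taste of ElastAlert, here is a minimal rule sketch. The rule name, index pattern, field, threshold, and email address are all hypothetical; it fires whenever an index collects too many error documents in a short window:

name: Too many app errors       # hypothetical rule name
type: frequency                 # fire when num_events occur within timeframe
index: myapp-logs-*             # hypothetical index pattern
num_events: 50
timeframe:
  minutes: 5
filter:
- term:
    level: "error"              # hypothetical field on our documents
alert:
- "email"
email:
- "ops@example.com"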

In the end, if you have any interest in playing with “big” data at home, or if you’re thinking about changing things up at work, you should seriously consider checking out Elastic. Their active community and quality documentation make it easy to ramp up with small home projects, while still providing professional services for enterprise implementations. If you’re looking for some inspiration to start a project of your own, look no further than Elastic’s use-cases page.

By – Anna Wendt

 
