The Right Way to Splunk with Cloudian

Amit Rawlani, Director of Solutions & Alliances, Cloudian

In late 2018, Splunk introduced a feature called SmartStore, which offers enhanced storage management functionality. SmartStore allows you to move warm data buckets to S3-compatible object stores such as Cloudian HyperStore, the industry’s most S3-compatible on-prem object storage platform. Moving the data from expensive indexer storage achieves many benefits, including:

  • Decoupling of storage and compute layers for independent scaling of those resources to best serve workload demands
  • Elastic scaling of compute on-demand for search and indexing workloads
  • Growing storage independently to accommodate retention requirements
  • Cost savings with more flexible storage options

However, migrating existing Splunk indexes to SmartStore is a one-time, one-way operation. It is also an intensive activity with no undo button. Based on extensive internal testing and validation efforts, the Cloudian team has developed best practices for ensuring a smooth deployment of Splunk SmartStore with Cloudian HyperStore. These range from provisioning all the way to safeguards to consider during the migration process itself. A summary of the best-practice guidelines is listed below.

Provisioning

For correct operation, the recommended deployment uses HyperStore nodes whose network interfaces are larger than the network interfaces on the Splunk indexers.


Tuning Splunk SmartStore

  • Concurrent upload setting: 8 threads (the default value)
  • Concurrent download setting: 18 threads (maximum tested: 24 threads)
  • For high ingest rates, the bucket size should be set to 10GB (setting = Auto_high)
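As a sketch, these tuning values map to Splunk's server.conf and indexes.conf roughly as follows. The stanza and attribute names below reflect Splunk's documented SmartStore settings ("Auto_high" above corresponds to Splunk's auto_high_volume bucket-size value), but verify them against the documentation for your Splunk version before applying:

```ini
# server.conf on each indexer -- cache manager thread counts
[cachemanager]
max_concurrent_uploads = 8       ; default value
max_concurrent_downloads = 18    ; maximum tested: 24

# indexes.conf -- larger buckets for high ingest rates
[my_smartstore_index]
maxDataSize = auto_high_volume   ; ~10GB buckets
```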

Tuning Cloudian HyperStore

  • Consistency setting should be kept at Quorum; anything else will have a performance impact
  • Link MTU should be set to 1500 to ensure optimal performance

Storage Sizing

Storage sizing guidelines should be followed. These will vary based on compression ratios, metadata size, and indexer volume sizing.

Cache Sizing

Cache sizing is undoubtedly the most important decision for appropriate SmartStore operations. The following parameters need to be considered when sizing the SmartStore cache:

  • Daily Ingestion Rate (I)
  • Search time span for the majority of your searches
  • Cache Retention (C) = 1 day / 10 days / 30 days or more
  • Available disk space (D) on your indexers (assuming homogeneous disk space)
  • Replication Factor (R) = 2
  • Min required cache size: [I*R + (C-1)*I]
  • Min required indexers = min required cache size / D
  • Also factor in ingestion throughput requirements (~300GB/day/indexer) to determine the number of indexers

Based on the above parameters, the table below provides a framework for arriving at the recommended SmartStore cache sizing.

                         1TB/Day    1TB/Day    1TB/Day    10TB/Day   10TB/Day
                         7-Day      10-Day     30-Day     10-Day     30-Day
                         Cache      Cache      Cache      Cache      Cache
Ingest/Day (GB)          1,000      1,000      1,000      10,000     10,000
Storage/Indexer (GB)     2,000      2,000      2,000      2,000      2,000
Cache Retention (days)   7          10         30         10         30
Replication Factor       2          2          2          2          2
Min Required Cache (GB)  8,000      11,000     31,000     110,000    310,000
Min Required #Indexers   4          6          16         55         155
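The sizing arithmetic above can be sketched as a small script. The function name and the 300GB/day/indexer default are illustrative conveniences, not Splunk parameters:

```python
import math

def smartstore_sizing(ingest_gb_per_day, cache_retention_days,
                      disk_per_indexer_gb, replication_factor=2,
                      ingest_per_indexer_gb=300):
    """Estimate minimum SmartStore cache size (GB) and indexer count.

    Implements the guideline formulas:
      min cache    = I*R + (C-1)*I
      min indexers = max(ceil(cache/D), ceil(I / ingest-per-indexer))
    """
    # Today's ingest is held R times; each of the remaining C-1 cached
    # days is held once.
    min_cache_gb = (ingest_gb_per_day * replication_factor
                    + (cache_retention_days - 1) * ingest_gb_per_day)
    indexers_for_cache = math.ceil(min_cache_gb / disk_per_indexer_gb)
    # Also factor in ingest throughput (~300GB/day per indexer).
    indexers_for_ingest = math.ceil(ingest_gb_per_day / ingest_per_indexer_gb)
    return min_cache_gb, max(indexers_for_cache, indexers_for_ingest)

# Reproduces the first column of the table: 1TB/day, 7-day cache
print(smartstore_sizing(1000, 7, 2000))   # (8000, 4)
```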

Migration to SmartStore

As mentioned, the migration to SmartStore is a one-way street. For existing indexes, the following steps are recommended for a smooth migration:

  1. Setup SmartStore target S3 bucket on HyperStore.
  2. Upload a file to the S3 bucket via CMC or S3 client.
  3. Set up the Volumes on Splunk Indexers without setting RemotePath for Indexes.
  4. Push the changes with the new Splunk Volume to the Splunk Index cluster.
  5. Use the Splunk RFS command to validate each Indexer is able to connect to the volume.

/opt/splunk/bin/splunk cmd splunkd rfs -- ls --starts-with volume:NewVolume

  6. Once all indexers report back connectivity, migrations can begin.
  7. Set remotePath for one index at a time. This makes index migrations easier to manage, and it limits the work of each indexer, which would otherwise try to move all warm buckets for every index at once.
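For reference, the volume and per-index settings from steps 3 and 7 look roughly like the indexes.conf sketch below. The bucket name, endpoint URL, and index name are placeholders, not values from the text; check Splunk's SmartStore documentation for the full set of remote.s3.* attributes:

```ini
# indexes.conf -- pushed in step 3: define the volume only,
# with no index pointing at it yet
[volume:NewVolume]
storageType = remote
path = s3://smartstore-bucket
remote.s3.endpoint = https://hyperstore.example.com

# step 7: migrate one index at a time by adding remotePath
[main]
remotePath = volume:NewVolume/main
```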

For more information on how to deploy Splunk SmartStore with Cloudian HyperStore, see the detailed deployment guide here.

Data Availability & Data Protection for the IoT World


New York, “The City That Never Sleeps”. A very fitting moniker for a city that is full of energy and excitement. Servers located in data centers all around the world are constantly crunching numbers and generating analytics in every financial institution in New York. Why are some of these servers located worldwide? Well, for a variety of reasons, but in my humble opinion, it is to ensure that data is always on and always available. After all, we are talking about billions of dollars in capital electronically managed by the New York Stock Exchange alone.

By 2020, it is predicted that there will be more than 20 billion internet-connected devices. As your business grows, so will the amount of data and storage that you will need. We'll obviously need solutions to protect our data on-premises or in the cloud. A company that can make sure customers' data is always on, secure, highly available, and protected rules the IoT world.

But in order to serve and protect your data for the always-on, always-available IoT world, what requirements should we take into account before deploying any data protection or storage solution? If you are a data protection geek, you'll most likely see some of your requirements listed on the right. If you are a data protection solutions provider, you guys definitely rock! Data protection solutions such as Commvault, NetBackup, Rubrik, Veeam, etc. are likely the solutions you have in-house to protect your corporate data centers and your mobile devices. These are software-defined, and they are designed to be highly available for on-premises or in-the-cloud data protection.

What about storage? What would you consider? I am sure there are many well-known storage providers you can easily name. But with the new kids on the block disrupting the storage market, would lowering your operating costs ($0.005/GB per month) and meeting the above-listed requirements pique your interest?

Amazon S3 and Cloudian
Cloudian is a software-defined storage company. The solution is fully S3 compliant, which means that if you are familiar with Amazon S3, you'll love the features that come with it. If you are not, then as a data protection geek with more than 15 years of experience, I invite you to give the Cloudian HyperStore free trial a shot. The features and capabilities of Cloudian HyperStore as a scale-out storage solution with true multi-tenancy are pretty cool in my book. Imagine being able to deploy and grow storage as you need it for your corporate user home directories, backups, archiving, and even object storage for virtualization solutions (e.g., Red Hat OpenStack). The use cases for scale-out storage solutions are vast. There is no more hardware vendor lock-in, as you can easily choose between a Cloudian HyperStore appliance or commodity servers to roll your own scale-out storage with Cloudian HyperStore software.

Imagine that you, as a storage administrator, can easily provide Storage as a Service (STaaS) to all your users. Take a look at the image below. The granular object level management that is available on a per user basis is pretty sweet. I can provide access to my files/objects with read and/or write permissions, with object level ACL and share the object via a public URL access.

To top it all off, I can also limit the maximum number of downloads of that specific object that I want to share. As a service provider, you can also use the analytics inherent in the solution to implement chargeback to your customers on every account that you manage using the Cloudian HyperStore smart storage solution.

Best of all, if you decide that you want to move your data to Amazon, use Cloudian HyperStore's built-in auto-tiering feature to dynamically move your data to Amazon S3 if you choose to do so. You don't have to take my word for it. Cloudian will provide you with a 45-day free trial. Try it out today.

Reinventing Storage Administration at Scale with Cloudian HyperStore 6.0


At Cloudian, we are continuously enhancing and evolving our flagship product HyperStore, a fully S3-compatible object storage technology. But even with our rich history of innovative releases, I'm most proud of the upcoming Cloudian HyperStore 6.0: we have risen to a new challenge, @scale operations. How do we help IT administrators manage petabytes of storage and still have time for a cup of tea?

As data volumes start to outpace operating budgets and the headcount needed to manage them, today’s enterprises are turning to full-featured, low-cost software defined storage technologies with almost limitless scalability and accessibility. While the software can be deployed on low-cost commodity hardware, thereby reducing the total cost of ownership (TCO), the cost of managing storage infrastructures – managing operations, performance, protecting data and tuning workflows and processes – has remained fixed.

A 2016 Gartner report found that the TCO for a terabyte of traditional on-premises storage was $2009 per terabyte, with 62 percent of the costs resulting from hardware and software acquisition ($1245). The operational costs associated with this terabyte of storage were 26 percent of the total, or $511, with “other” costs adding up to 12 percent ($253).

With software-defined storage like Cloudian HyperStore, the TCO for a terabyte of storage now drops to $866 per year, and with costs as low as one cent per gigabyte per month, hardware and software acquisition are reduced to just 14 percent of the total costs, or $122.

Unfortunately, as reported by Gartner, the cost of operations for this model remains fixed at $511, which now represents 58 percent of total yearly storage costs, significantly outweighing hardware and software expense. It’s clear that to further drive costs from cloud storage, organizations will need to find ways to reduce management burden and operational costs.

Cloudian HyperStore 6.0 is designed to help operate and manage at scale. The new release simplifies and automates the operational management of cloud storage by creating new system management features, which scale to meet the requirements of multi-petabyte, multi-region storage deployments. It lowers the management costs and reduces administrative burden. HyperStore 6.0 advances the vision Cloudian has shared for Smart Data Storage by delivering fully integrated smart data operations within the platform for seamless and extremely cost-efficient management.

The basis for many of 6.0’s key features came from our customers. They shared their “wish list” of management tools and functionality that could streamline or eliminate the manual tasks that normally weigh down the storage admin in 100+ node deployments.

Introducing Storage Operations and Management @Scale

 

Cloudian HyperStore 6.0 provides new tools, user interfaces and automated features to scale management and operational efficiency across complex cloud storage infrastructures, lowering storage management costs for enterprises and streamlining operational tasks for storage administrators. The new 6.0 release can reduce operations management as nodes are added, helping lower the overall operational costs to businesses as data volumes continue to grow. HyperStore 6.0 also provides greater data durability with more continuous automated failure resolution, and delivers robust system tuning tools for proactive and low-cost management of system health and operational efficiency for data storage at petabyte scale.

Operations @scale. In Cloudian HyperStore 6.0, an entirely new operations console provides an instant 360-degree view into storage system performance, with an enhanced GUI that greatly increases the visibility of key data. The new operations console enables storage admins to view and manage hundreds of nodes across multiple data centers and cloud environments in a single screen, while automating such operations as adding and removing nodes with the simple configuration of an IP address. And the user interface isn't just to look at: users can click on the data representations to drill down, access detailed information, and take action directly from the interface screen. 6.0 also automates non-disruptive, node-by-node rolling technical patches and upgrades that are delivered via touchless distribution and management systems, saving time and resources for storage admins.

HyperStore's newly renovated operations console provides a 360-degree, drillable view of the health of your system.

Durability @scale. HyperStore 6.0 also extends data durability with automated features that constantly evaluate data integrity – even data at rest – to ensure that storage within the system is always repaired and data is always verified. When the system identifies data failures, it proactively repairs and rebuilds them. The Cloudian HyperStore 6.0 system proactively monitors and scans workflows for I/O or disk failure, which Smart Redirect features can track and map to help trigger automatic repairs. The new release enables both local cache replication for data protection and Amazon S3 cross-region replication – one of Cloudian’s many rich S3 compatible features – for regulatory compliance and disaster recovery.

The storage admin can view cluster rebalance and data rebuild information from HyperStore's robust analytics capabilities.

Tuning @scale. Cloudian HyperStore 6.0 simplifies system tuning to optimize the infrastructure for the highest level of system health and data protection. Visual Storage Analytics automatically identifies "hot spots" and rebalances workflows for storage I/O protection. It can also locate object parts using new "Object GPS" functionality that tracks data distribution with high granularity across nodes, racks, and data centers around the globe.

The HyperStore 6.0 operations console allows the storage admin to view real-time capacity consumption and system performance.

As before, Cloudian continues to keep pace with incorporating changes to the AWS S3 API into HyperStore. Cloudian is the leading object storage vendor with 100 percent S3 compatibility, and the only vendor to offer an S3 guarantee.

To learn more about how Cloudian HyperStore 6.0 can help your company simplify and automate the operational management of your hybrid storage infrastructures, contact Cloudian at www.cloudian.com.

Also watch this blog over the next few weeks as we add more detailed posts about the features of 6.0.

Stay tuned,

Paul

S3 API & Extensions for Enterprise Object Storage

Amazon’s S3 API is the de facto standard for object storage APIs. Having multiple service providers, software providers, and applications standardize on S3 has made it easier to interchange between them and rapidly stand up new uses for object storage. But there are different grades of S3 compatibility. Some software and solutions provide only the basic CRUD (create, read, update, delete) functions. At the other end is Cloudian’s HyperStore, committed to providing the highest-fidelity S3 compatibility backed by a guarantee.

The S3 API is an HTTP/S REST API where all operations are via HTTP PUT, POST, GET, DELETE, and HEAD requests. Each object is stored in a bucket. Beyond the basic object CRUD operations provided by S3, there are many advanced APIs like versioning, multi-part upload, access control list, and location constraint. There are multiple options for encryption, including (1) server-side encryption where the server manages encryption keys, (2) server-side encryption with customer keys, and (3) client-side encryption where the data is encrypted/decrypted at the client side. Though no single S3 user is likely to use all of the advanced APIs, the union of APIs used by different users quickly covers them all. The table below highlights some advanced object storage APIs supported by S3:
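As an illustration, the first two encryption options are selected purely through request headers on the PUT; the sketch below builds the standard S3 header sets for SSE-S3 and SSE-C (header names per the AWS S3 REST API, which S3-compatible stores mirror). Client-side encryption needs no special headers, since the client encrypts the data before uploading it:

```python
import base64
import hashlib

def sse_s3_headers():
    """Option 1: server-side encryption where the server manages the keys."""
    return {"x-amz-server-side-encryption": "AES256"}

def sse_c_headers(key: bytes):
    """Option 2: server-side encryption with a customer-provided key (SSE-C).

    The 256-bit key travels base64-encoded with each request; the MD5
    header lets the server detect corruption of the key in transit.
    """
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }
```

These dictionaries would be merged into the headers of an ordinary PUT Object request; any client library that exposes raw headers can use them.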

S3 Feature                                   Azure   Google Cloud   OpenStack Swift
Object versioning                            No      Yes            Yes
Object ACL                                   No      Yes            No
Bucket Lifecycle Expiry                      No      Yes            Yes
Multi-object delete                          No      Yes            Yes
Server-side encryption                       No      Yes            Yes
Server-side encryption with customer keys    No      No             No
Cross-region replication                     Yes     No             Yes
Website                                      No      No             No
Bucket logging                               No      No             No
POST object                                  No      No             No

Table 1 – Comparison of some S3 advanced object storage APIs[1]

S3 API compatibility is a prerequisite, but not sufficient to provide object storage for enterprises. There are four additional areas that Cloudian has addressed to make S3 object storage enterprise-ready.

 

  1. Software or Appliance, not a service. The software-only package includes a Puppet-based installer with a wizard-style interface. It runs on commodity hardware with a standard Linux distribution (CentOS/Red Hat). The appliances come in a few fixed models ranging from 1U (24TB) to the FL3000 series of PB-scale systems in an 8U form factor.
  2. APIs for all functions
    • Configuration
    • Multi-Tenancy: User/Tenant provisioning
    • Quality of Service (QoS)
    • Reporting
    • S3 Extensions: Compression, Metadata APIs, Per-bucket Protection Policies.

    Highlighting the per-bucket protection policies feature: each bucket can have its own protection policy. For example, a “UK3US2” policy can be defined as a UK DC with 3 replicas and a US DC with 2 replicas. Another example is an “ECk6m2” policy, defined as DC1 with erasure coding using 6 data and 2 coding fragments. As buckets are created, they can be assigned a policy.

Bucket
Figure 1 – Per-bucket protection policies example

  3. O&M tools to install, monitor, and manage. In addition to the installer, a single-pane, web-based Cloudian Management Console (CMC) handles system administration from the perspective of the system operator, a tenant/group administrator, and a regular user. It’s used to provision groups and users, view reports, manage the cluster, and monitor the cluster.

Cloudian Management Console

Figure 2 – CMC dashboard

  4. Integration with Other Products
    • NFS/CIFS file interface
    • OpenStack, CloudPlatform
    • Tiering to any S3 system (public or private).
    • Active Directory, LDAP

The opportunity and use case for enterprises and object storage has never been more compelling. Amazon S3 API compatibility ensures full portability of already working applications. Using Cloudian’s HyperStore platform instead of AWS, enterprise data can be brought on-premise for better data security and manageability at lower cost. For STaaS providers, S3 API compatibility, backed by a full guarantee, provides the same benefits of a fully controlled storage platform, and opens up a large range of compatible applications. Beyond the S3 API, Cloudian is committed to providing all operations by API and has added APIs to make the platform enterprise-ready, including multi-tenancy.

If you would like a technical overview, you can check out this webinar I recently presented, “S3 Technical Deep Dive” and make sure to check out more information on our S3 Guarantee…we’ll run all your S3 Apps anytime and anywhere – Guaranteed!

– Gary


[1] References:
http://docs.openstack.org/developer/swift/#object-storage-v1-rest-api-documentation
https://cloud.google.com/storage/docs/xml-api-overview
https://msdn.microsoft.com/en-us/library/azure/dd135733.aspx

Cloudian HyperStore Integration with Symantec NetBackup

Starting with Symantec NetBackup 7.7, administrators will find an exciting new feature for cloud storage backup: Cloudian HyperStore®. The NetBackup Cloud Storage Connector enables the NetBackup software to back up data to and from Cloudian HyperStore straight out of the box without additional software installations or plugins. HyperStore is an option in the “Cloud Storage Server Configuration Wizard”. Users can simply add their S3 account information such as endpoint, access key, and secret key to begin the process of backing up their data to Cloudian HyperStore storage.


Cloudian HyperStore and Symantec NetBackup together deliver the following benefits:

  • Enterprise-level backup
  • Complete integrated data center solution: computing, networking, and storage
  • Reduced total cost of ownership (TCO) that continues to improve as the solution scales out
  • Operational efficiency
  • Agility and scalability with the scale-out architectures of Cloudian HyperStore
  • Complete Amazon Simple Storage Service (S3) API–compatible geographically federated object storage platform
  • Enterprise-class features: multi-tenancy, quality of service (QoS), and dynamic data placement in a completely software-defined package
  • Policy-based tiering between on-premises hybrid cloud storage platform and any S3 API–compliant private or public cloud
  • Investment protection: mix and match different generations and densities of computing platforms to build your storage environment; more than 400 application vendors support S3

The seamless integration allows IT Departments to manage cloud storage for backup and recovery as easily as on-premise storage, but with lower costs. Finally, this integrated solution helps deliver an automated and policy-based backup and recovery solution. Organizations can also leverage the cloud as a new storage tier or as a secondary off-site location for disaster recovery.

For more information, please see the Symantec NetBackup and Cloudian HyperStore Solution Brief.

 

Next Generation Storage: integration, scale & performance

Guest Blog Post by Colm Keegan from Storage Switzerland

Various industry sources estimate that data is doubling approximately every two years and the largest subset of that growth is coming from unstructured data. User files, images, rich multimedia, machine sensor data and anything that lives outside of a database application can be referred to collectively as unstructured data.

Storage Scaling Dilemma

The challenge is that traditional storage systems, which rely on “scale-up” architectures (populating disk drives behind a dual-controller system) to increase storage capacity, typically don’t scale well to the multi-PB data growth now occurring within most enterprise data centers. On the other hand, while some “scale-out” NAS systems can support multiple PBs of storage within a single filesystem, they are often not a viable option, since adding storage capacity to these systems often requires adding CPU and memory resources at the same time, resulting in a high total cost of ownership.

Commoditized Storage Scaling

Businesses need a way to cost effectively store and protect their unstructured data repositories utilizing commodity, off the shelf storage resources and/or low cost cloud storage capacity. In addition, these repositories need to be capable of scaling massively to support multiple PB’s of data and enable businesses to seamlessly share this information across wide geographical locations. But in addition to storage scale and economy, these resources should also be easy to integrate with existing business applications. And ideally, they should be performance optimized for unstructured data files.

Software Driven Capacity

Software defined storage (SDS) technologies are storage hardware agnostic solutions which allow businesses to use any form of storage to build-out a low cost storage infrastructure. Internal server disk, conventional hard disk drives inside a commodity disk array or even a generic disk enclosure populated with high density disk can be used. Likewise, with some SDS offerings, disk resources in the data center can be pooled with storage in secondary data center facilities located anywhere in the world and be combined with cloud storage to give businesses a virtually unlimited pool of low-cost storage capacity.

Plug-and-Play Integration

From an integration perspective, some of these solutions provide seamless integration between existing business applications and cloud storage through native support for the NFS and CIFS protocols. So instead of going through the inconvenience and expense of re-coding applications with cloud-storage APIs like REST, Swift, or Amazon’s S3 protocol, these technologies essentially make a private or hybrid cloud object storage deployment a plug-and-play implementation, while still providing the option to go “native” in the future.

Tapping Into Cloud Apps

But storage integration isn’t just limited to on premise applications, it also applies to cloud based applications as well. Today there is a large ecosystem of Amazon S3 compatible applications that businesses may want to leverage. Examples include backup and recovery, archiving, file sync and share, etc. Gaining access to these software offerings by utilizing an S3 compatible object storage framework, gives businesses even more use cases and value for leveraging low-cost hybrid cloud storage.

Data Anywhere Access

Now businesses can provision object storage resources on-premises and/or out across public cloud infrastructure to give their end-users ubiquitous access to data regardless of their physical location. This enables greater data mobility and can serve to enhance collaborative activities amongst end-users working across all corners of the globe. Furthermore, by replicating data across geographically dispersed object storage systems, businesses can automatically backup data residing in remote offices/branch offices to further enhance data resiliency.

With data intensive applications like big data analytic systems and data mining applications clamoring for high speed access to information, object storage repositories need to be capable of providing good performance as well. Ideally, the storage solution should be tuned to read, write and store large objects very efficiently while still providing high performance.

 Stem The Data Tide

Businesses today need a seamless way to grow low-cost, abundant hybrid cloud storage resources across the enterprise to meet the unstructured data tsunami that is flooding their data center environments. In addition to providing virtually unlimited scaling from a storage capacity perspective, these resources need to integrate easily into existing application environments and provide optimal performance when accessing large unstructured data objects. Cloudian’s HyperStore solution provides all of these capabilities through a software-defined storage approach, which gives businesses the flexibility to choose among existing commodity disk assets in the data center and/or low-cost object storage in the cloud, to help stem the unstructured data tide.

 

About Author

Colm Keegan is a 23-year IT veteran whose focus is enterprise storage, backup, and disaster recovery solutions at Storage Switzerland.