from Johannes Peter

Kerberize Apache ZooKeeper & Apache Solr:
Issues, solutions and recommendations

Apache ZooKeeper and Apache Solr kerberization

Web services such as Apache Solr are frequently secured by two-way SSL encryption. Although this ensures safe communication and controlled access to Solr, in many cases a more fine-grained way to secure Solr is needed, not only to secure the entire application but also to manage access to individual collections. For this purpose, Solr can additionally be secured with the network authentication protocol Kerberos. In contrast to other articles describing how to kerberize Apache Solr, this blog post does not aim to provide a step-by-step guide. Instead, it describes issues I faced when trying to follow such guides, along with probable solutions and recommendations for resolving them.

Firstly, it is important to emphasize that a kerberized Solr requires a kerberized ZooKeeper to be secure. Solr manages access to its collections via a file called security.json, which is distributed to the Solr nodes via ZooKeeper. Therefore, a kerberized Solr is only secure as long as write access to the security.json file in ZooKeeper is also secured. This means we not only have to consider how to kerberize Solr and how to communicate with a kerberized Solr, but also how to kerberize ZooKeeper, how to communicate with a kerberized ZooKeeper, and how to allow kerberized communication between Solr and ZooKeeper.

Communicating with a kerberized ZooKeeper, as well as communication between Solr and ZooKeeper, requires a Kerberos principal for a user and a keytab for that user. For creating a keytab file, the vast majority of guides suggest using the kadmin or kadmin.local command-line interface. However, it is not very likely that you will have admin access to the Kerberos system in a production environment. Therefore, the following example focuses on creating a keytab file for a user called solr-ext using ktutil.

Executing the command ktutil opens an interactive interface where keytabs can be created or changed. The command add_entry adds a key to the keytab, and the command write_kt finally writes the keytab file to the local filesystem.

ktutil: add_entry -password -p solr-ext@EXAMPLE.COM -k 1 -e aes256-cts-hmac-sha1-96
Password for solr-ext@EXAMPLE.COM: 
ktutil: write_kt solr-ext.keytab
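Before the keytab is used, its entries can be inspected with klist (assuming the MIT Kerberos client tools are installed):

```shell
# Show the entries in the keytab, including key version numbers (KVNO),
# timestamps and encryption types
klist -k -t -e solr-ext.keytab
```

This makes it easy to spot a missing encryption type or a wrong principal name before any application is involved.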

In the previous example, only one encryption type is added to the keytab file. In many cases, it might be reasonable to add more encryption types, depending on which types are supported by the Kerberos system and the application. All available encryption types are listed at https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/kdc_conf.html#encryption-types. Before the keytab file is used within an application, I strongly recommend testing it directly, for instance to check whether the password was entered correctly. This can be done easily by destroying the current Kerberos ticket (if there is one) and initializing a new one using the keytab instead of providing a password.

kinit solr-ext@EXAMPLE.COM -k -t solr-ext.keytab

However, it should be noted that a kerberized Solr additionally needs a keytab file for the principal HTTP/host for every host of the cluster, which can probably only be created or provided by a Kerberos admin.

In some environments (e. g. HDP or CDH), a kerberized ZooKeeper ensemble might already be running. Nevertheless, it is worth considering installing a separate ensemble for the Solr cluster: installing your own ZooKeeper ensemble is not too complicated, allows more experimentation during the installation, and improves the coordination of the Solr cluster (e. g. due to lower latencies). A guide to installing a kerberized ZooKeeper instance can be found at https://lucene.apache.org/solr/guide/6_6/kerberos-authentication-plugin.html.

Irrespective of whether a new or an already running ZooKeeper installation is used for the new kerberized Solr, a ZooKeeper client should be configured, as its command-line interface is the most direct way to check, create, or change directories and files as well as their corresponding access control lists. The client can be installed on any server as long as its IP and port are not blocked for TCP on the host servers of the ZooKeeper ensemble (e. g. by a firewall). Simply download the ZooKeeper binaries, untar them, and configure the client; you don't have to run a ZooKeeper instance on the client's server. A guide on configuring the ZooKeeper client for Kerberos authentication can be found at https://docs.cloudera.com/documentation/cdh/5-1-x/CDH5-Security-Guide/cdh5sg_zookeeper_security.html.

For proper authentication, I personally recommend creating a keytab for a principal as described above. Using the ticket cache did not work for me, even after explicitly specifying the ticket cache file; I had to disable the usage of the ticket cache. Additionally, I strongly recommend enabling debugging, as this can provide very valuable information when the Kerberos authentication fails. Finally, a jaas configuration file considering these recommendations could look like this:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/solr-ext.keytab"
  storeKey=true
  useTicketCache=false
  debug=true
  principal="solr-ext@EXAMPLE.COM";
};

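The ZooKeeper CLI picks up additional JVM options from the CLIENT_JVMFLAGS environment variable (evaluated by zkEnv.sh), so the jaas file and Kerberos debug output can be activated before starting the client. The paths below are placeholders:

```shell
# Point the ZooKeeper CLI at the JAAS file and enable Kerberos debug output.
# The JAAS file path is a placeholder; adjust it to your installation.
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/path/to/zk_client_jaas.conf -Dsun.security.krb5.debug=true"
```
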
The ZooKeeper client can be started by executing the zkCli.sh (or zkCli.bat) script in the bin folder, passing the ZooKeeper connect string via the parameter -server:

./zkCli.sh -server host:port,host:port,host:port

In case the ZooKeeper ensemble is also used by other applications, I recommend initially creating a subfolder within the ZooKeeper ensemble (e. g. /solr-ext, see below) and, from then on, connecting directly to that subfolder when starting the client:

./zkCli.sh -server host:port,host:port,host:port/solr-ext

This ensures that the subdirectories of other applications cannot be harmed in any way. Whether the Kerberos authentication succeeded during the login of the client can be checked by evaluating the logs printed to the shell. In case of a successful authentication, no WARN messages about a failed SASL authentication or configuration should appear. In case of an unsuccessful authentication, the debug messages above the warnings provide more detailed information about the login process.

The ZooKeeper command-line interface provides several commands, whose usage can be looked up by executing help. To create a subdirectory as mentioned above, the create command has to be used, followed by parameters naming the directory, associating some data with it, and defining the access control list (ACL). At this point, it should be noted that the ZooKeeper file system doesn't distinguish between files and directories; each directory entry can carry data (see https://zookeeper.apache.org/doc/current/zookeeperOver.html#sc_dataModelNameSpace). The create command therefore contains not only the directory to create, but also the data linked with the directory. Finally, the access control list is appended to the command. Its rules consist of the scheme, the user (or group), and the kind of access, and are separated by commas.

create /solr-ext some_data sasl:solr-ext:cdrwa,world:anyone:r

ZooKeeper supports several authentication schemes (see https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_BuiltinACLSchemes). For authentication with Kerberos, the Simple Authentication and Security Layer (sasl) scheme has to be used. Nevertheless, in some cases the digest scheme might also be useful even within a Kerberos environment, as there is a way to configure a super user for a ZooKeeper ensemble, which requires the digest scheme; see https://community.cloudera.com/t5/Community-Articles/Zookeeper-Super-User-Authentication-and-Authorization/ta-p/246020. The access control list in the above example includes a rule using the sasl scheme (see https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+and+SASL) granting the user solr-ext create, delete, read, write, and admin access to the /solr-ext directory (as well as to its linked data). It also includes a rule using the world scheme, which grants read access to anyone. As the ZooKeeper file system doesn't distinguish between files and directories, the create command could also be used to create files such as security.json. However, managing the ZooKeeper files is much easier using the Solr Control Script (see below). Nonetheless, the ZooKeeper command-line interface is the most direct way to check and set access controls for the ZooKeeper data (commands getAcl and setAcl) to ensure that everything is secure.
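For example, after the create command, the ACL can be double-checked with getAcl from within the same client session, and changed later with setAcl if required (these commands are typed at the zkCli prompt):

```
getAcl /solr-ext
setAcl /solr-ext sasl:solr-ext:cdrwa,world:anyone:r
```

getAcl prints the current rules for the node, which should match exactly what was passed to the create command.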

For kerberizing a Solr instance, a keytab file for the principal HTTP/host is required in order to enable kerberized communication via Solr's REST API. Furthermore, a jaas file is required, which is used when Solr authenticates to other Solr nodes or to the ZooKeeper ensemble. This could be the same file that is used to authenticate to ZooKeeper manually (see above):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/solr-ext.keytab"
  storeKey=true
  useTicketCache=false
  debug=true
  principal="solr-ext@EXAMPLE.COM";
};

Subsequently, some properties have to be set in solr.in.sh to tell Solr the node-specific HTTP principal, the location of the SPNEGO keytab, and the location of the jaas file:

SOLR_AUTHENTICATION_OPTS="-Djava.security.auth.login.config=/path/to/solr_jaas.conf -Dsolr.kerberos.cookie.domain=host -Dsolr.kerberos.cookie.portaware=true -Dsolr.kerberos.principal=HTTP/host@EXAMPLE.COM -Dsolr.kerberos.keytab=/path/to/spnego.service.keytab"

Additionally, the way to authenticate to the ZooKeeper ensemble has to be specified:

SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.SaslZkACLProvider"
SOLR_OPTS="$SOLR_OPTS $SOLR_ZK_CREDS_AND_ACLS"

As already mentioned, access to the Solr collections is managed via the file security.json. In order to manage files via ZooKeeper, many guides recommend using the zkcli.sh script, which can be found at $SOLR_HOME/server/scripts/cloud-scripts/zkcli.sh. However, this script apparently does not authenticate to a kerberized ZooKeeper properly (at least it did not for me). Therefore, I recommend using the Solr Control Script (see https://lucene.apache.org/solr/guide/6_6/solr-control-script-reference.html#SolrControlScriptReference-ZooKeeperOperations), which is capable of doing so. Using this script does not require any additional configuration, as it is the same script that is already used for starting and stopping Solr ($SOLR_HOME/bin/solr). Initially, the simplest kind of security.json could be uploaded (more information about configuring security.json can be found at https://lucidworks.com/post/securing-solr-tips-tricks-and-other-things-you-really-need-to-know/):

'{"authentication":{"class": "org.apache.solr.security.KerberosPlugin"}}'

This can be done using the copy command of the Solr Control Script (after creating it locally):

bin/solr zk cp file:/path/to/security.json zk:/security.json -z host:port,host:port,host:port/solr-ext

Configurations can be uploaded similarly:

bin/solr zk upconfig -d /path/to/config -n config_name -z host:port,host:port,host:port/solr-ext
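The same script can also be used to verify what was actually written to ZooKeeper, for instance by recursively listing the chrooted tree:

```shell
# Recursively list everything below the chroot to confirm security.json
# and the configs arrived (connect string is a placeholder)
bin/solr zk ls -r / -z host:port,host:port,host:port/solr-ext
```

Both security.json and the uploaded config directory should show up in the listing.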

Now, Solr can be started. Unfortunately, you have to configure your browser properly before you can access the Solr UI; a good description of this can be found here. For creating collections or ingesting data, you can use cURL:

curl -i --negotiate -u : 'http://host:port/solr/admin/collections?action=CREATE&name=col_name&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=config_name'
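Ingesting a document works in the same way; the collection name and fields below are purely illustrative:

```shell
# Send one JSON document to the collection created above and commit it
curl -i --negotiate -u : -H 'Content-Type: application/json' \
  'http://host:port/solr/col_name/update?commit=true' \
  -d '[{"id":"1","title":"hello"}]'
```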

Please note that you will have to adjust the file security.json and restart Solr before you can access the collection or ingest data into it.
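As a hedged sketch of such an adjustment (the user and role names are placeholders, not part of the original setup), a security.json that additionally enables Solr's rule-based authorization could look like this:

```json
{
  "authentication": {"class": "org.apache.solr.security.KerberosPlugin"},
  "authorization": {
    "class": "org.apache.solr.security.RuleBasedAuthorizationPlugin",
    "user-role": {"solr-ext@EXAMPLE.COM": "admin"},
    "permissions": [{"name": "all", "role": "admin"}]
  }
}
```

After uploading the changed file with the Solr Control Script as shown above, the referenced principal is mapped to the admin role, which is granted all permissions.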

Apache, Apache ZooKeeper and Apache Solr are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.



About the author

Until the end of 2017, Johannes was a consultant and architect for search and big data at Woodmark. His specialty was the processing of unstructured data, including search, log analysis, and natural language processing.
