Cassandra Log4j appender solutions

DataStax Enterprise allows you to stream your web and application log information into a database cluster via Apache log4j.

About the log4j Utility 

Apache log4j is a Java-based logging framework that provides runtime application feedback. It provides the ability to control the granularity of log statements using an external configuration file (log4j.properties).

With the Cassandra Appender, you can store the log4j messages in a table where they're available for in-depth analysis using the Hadoop and Solr capabilities provided by DataStax Enterprise. For information about Cassandra logging, see Logging Configuration. Additionally, DataStax provides a Log4j Search Demo.

The log4j utility has three main components: loggers, appenders, and layouts. Loggers are logical log file names; they are the names known to the Java application. Each logger is independently configurable for the level of logging. Outputs are controlled by appenders. Numerous appenders are available, and multiple appenders can be attached to any logger, which makes it possible to log the same information to multiple outputs. Appenders use layouts to format log entries. In the sample Cassandra startup messages shown under Log messages, each entry shows the level, the thread name, the message timestamp, the source code file, the line number, and the log message.
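As a rough illustration of how the three components fit together, the following log4j.properties sketch attaches two appenders to one logger and gives each appender its own layout. The appender aliases (stdout, R) and the file path are placeholders chosen for this example, not values taken from a DataStax installation.

```properties
# Logger: the root logger, set to INFO and attached to two appenders.
log4j.rootLogger=INFO, stdout, R

# Appender 1: console output, formatted by its own layout.
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n

# Appender 2: a plain file (placeholder path), with a different layout.
log4j.appender.R=org.apache.log4j.FileAppender
log4j.appender.R.File=/tmp/example-app.log
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %m%n
```

With a configuration along these lines, every message at INFO or higher would be written to both outputs, each in its own format.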

Log levels

The available levels are:
  • ALL - turn on all logging
  • OFF - no logging
  • FATAL - severe errors causing premature termination
  • ERROR - other runtime errors or unexpected conditions
  • WARN - use of deprecated APIs, poor use of API, near errors, and other undesirable or unexpected runtime situations
  • INFO - highlight the progress of the application at a coarse-grained level
  • DEBUG - detailed information on the flow through the system
  • TRACE - more detailed than DEBUG

DataStax does not recommend using TRACE or DEBUG in production because of their verbosity and performance impact.

Log messages

As mentioned above, the messages that appear in the log are controlled via the conf/log4j.properties file. Using this properties file, you can control the granularity down to the Java package and class levels. For example, DEBUG messages from a particular class can be included in the log while messages from other classes remain at a higher level. This helps reduce clutter and makes relevant messages easier to identify. The log is most commonly written to a file, to stdout, or to both. The format, the behavior (such as file rolling), and other settings are configured in the same properties file.
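The per-package and per-class control described above might look like the following sketch. The package and class names (com.example.app, OrderService) and the file path are hypothetical, and the rolling limits are arbitrary example values.

```properties
# Default level for every logger that is not configured more specifically.
log4j.rootLogger=INFO, R

# Hypothetical package: only WARN and higher.
log4j.logger.com.example.app=WARN
# Hypothetical class inside that package: DEBUG and higher.
log4j.logger.com.example.app.OrderService=DEBUG

# File rolling: start a new file at 20 MB and keep up to 10 old files.
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=/tmp/example-app.log
log4j.appender.R.MaxFileSize=20MB
log4j.appender.R.MaxBackupIndex=10
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %m%n
```

The more specific logger name takes precedence, so in this sketch OrderService would log at DEBUG while the rest of com.example.app stays at WARN.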

Below are sample log messages from a Cassandra node startup:

INFO [main] 2012-02-10 09:15:33,112 DatabaseDescriptor.java (line 495)
     Found table data in data directories. Consider using the CLI to define your schema.
INFO [main] 2012-02-10 09:15:33,135 CommitLog.java (line 166)
     No commitlog files found; skipping replay
INFO [main] 2012-02-10 09:15:33,150 StorageService.java (line 400)
     Cassandra version: 1.0.7
INFO [main] 2012-02-10 09:15:33,150 StorageService.java (line 401)
     Thrift API version: 19.20.0
INFO [main] 2012-02-10 09:15:33,150 StorageService.java (line 414)
     Loading persisted ring state
...
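
A layout along the following lines would produce the fields seen in the messages above (level, thread, timestamp, source file, line number, message). This is a sketch using the standard log4j PatternLayout conversion characters; the appender alias stdout is a placeholder, and the exact pattern shipped with a given Cassandra release may differ.

```properties
# %5p          level, right-justified to five characters
# [%t]         name of the thread that logged the message
# %d{ISO8601}  timestamp, for example 2012-02-10 09:15:33,112
# %F and %L    source file name and line number (relatively costly to compute)
# %m%n         the message followed by a line separator
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
```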

Storing log4j messages in a table 

The Cassandra Appender provides the capability to store log4j messages in a Cassandra table.

Procedure

  1. Add resources/log4j-appender/lib/ to your application classpath.
  2. Modify the conf/log4j.properties file, as shown in the example below:
    # Cassandra Appender
    log4j.appender.CASS = com.datastax.logging.appender.CassandraAppender
    log4j.appender.CASS.hosts = 127.0.0.1
    log4j.appender.CASS.port = 9160
    #log4j.appender.CASS.appName = "myApp" 
    #log4j.appender.CASS.keyspaceName = "Logging" 
    #log4j.appender.CASS.columnFamily = "log_entries" 
    #log4j.appender.CASS.placementStrategy = "org.apache.cassandra.locator.NetworkTopologyStrategy" 
    #log4j.appender.CASS.strategyOptions = {"DC1" : "1", "DC2" : "3" } 
    #log4j.appender.CASS.replicationFactor = 1 
    #log4j.appender.CASS.consistencyLevelWrite = ONE 
    #log4j.appender.CASS.maxBufferedRows = 256
    
    log4j.logger.com.foo.bar = INFO, CASS

    Commented lines are included for reference and to show the default values.

    • log4j.appender.CASS = com.datastax.logging.appender.CassandraAppender specifies the CassandraAppender class and assigns it the CASS alias. This alias is referenced in the last line.
    • log4j.appender.CASS.hosts = 127.0.0.1 accepts a comma-delimited list of Cassandra nodes, which provides alternatives in case a node goes down.
    • Specify replication options in:
      log4j.appender.CASS.placementStrategy = "org.apache.cassandra.locator.NetworkTopologyStrategy"
      log4j.appender.CASS.strategyOptions = {"DC1" : "1", "DC2" : "3" }
    • log4j.logger.com.foo.bar = INFO, CASS specifies that all log messages of level INFO and higher that are generated from the classes and sub-packages within the com.foo.bar package are sent to the Cassandra server by the Appender.
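
    Putting these options together, a multi-node configuration might look like the sketch below. It uses only the property names from the example above; the node addresses are hypothetical, the separator style in the hosts list (commas without spaces) is an assumption, and the quoting of values simply follows the commented reference lines.

    ```properties
    log4j.appender.CASS = com.datastax.logging.appender.CassandraAppender
    # Hypothetical node addresses; listing several gives the appender alternatives.
    log4j.appender.CASS.hosts = 10.0.0.1,10.0.0.2,10.0.0.3
    log4j.appender.CASS.port = 9160
    log4j.appender.CASS.appName = "myApp"
    # Replication options, as in the reference lines above.
    log4j.appender.CASS.placementStrategy = "org.apache.cassandra.locator.NetworkTopologyStrategy"
    log4j.appender.CASS.strategyOptions = {"DC1" : "1", "DC2" : "3" }

    # Send INFO and higher from com.foo.bar to the Cassandra appender.
    log4j.logger.com.foo.bar = INFO, CASS
    ```

    Comments are kept on their own lines because a properties file treats a trailing # as part of the value.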

    By default, the CassandraAppender records log messages in the table log_entries in the Logging keyspace. The definition of this table is as follows:

    cqlsh:Logging> DESCRIBE TABLE log_entries;
    
    CREATE TABLE log_entries (
      KEY uuid PRIMARY KEY,
      app_start_time bigint,
      app_name text,
      class_name text,
      file_name text,
      level text,
      line_number text,
      log_timestamp bigint,
      logger_class_name text,
      host_ip text,
      host_name text,
      message text,
      method_name text,
      ndc text,
      thread_name text,
      throwable_str_rep text
    ) WITH
      comment = '' AND
      comparator = text AND
      row_cache_provider = 'ConcurrentLinkedHashCacheProvider' AND
      key_cache_size = 200000.000000 AND
      row_cache_size = 0.000000 AND
      read_repair_chance = 1.000000 AND
      gc_grace_seconds = 864000 AND
      default_validation = text AND
      min_compaction_threshold = 4 AND
      max_compaction_threshold = 32 AND
      row_cache_save_period_in_seconds = 0 AND
      key_cache_save_period_in_seconds = 14400 AND
      replication_on_write = True;

    Example

    Consider the following log snippet:

    09:20:55,470  WARN SchemaTest:68 - This is warn message #163
    09:20:55,470  INFO SchemaTest:71 - This is info message  #489
    09:20:55,471 ERROR SchemaTest:59 - Test exception.
    java.io.IOException: Danger Will Robinson, Danger!
        at com.datastax.logging.SchemaTest.testSavedEntries(SchemaTest.java:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ...
    

    Note that the ERROR entry above includes the stack trace associated with an Exception. The associated rows in the log_entries table appear as follows (queried using cqlsh):

    KEY,eea1256e-db24-4cef-800b-843b3b2fb72c | app_start_time,1328894454774 | level,WARN |
     log_timestamp,1328894455391 | logger_class_name,org.apache.log4j.Category | message,
     This is warn message #163 | thread_name,main |
    
    KEY,f7283a71-32a2-43cf-888a-0c1d3328548d | app_start_time,1328894454774 | level,INFO |
    log_timestamp,1328894455064 | logger_class_name,org.apache.log4j.Category | message,
    This is info message  #489 | thread_name,main |
    
    KEY,37ba6b9c-9fd5-4dba-8fbc-51c1696bd235 | app_start_time,1328894454774 | level,ERROR |
     log_timestamp,1328894455392 | logger_class_name,org.apache.log4j.Category | message,
     Test exception. | thread_name,main | throwable_str_rep,java.io.IOException: Danger
     Will Robinson, Danger!
        at com.datastax.logging.SchemaTest.testSavedEntries(SchemaTest.java:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    Not all columns have values, because the set of values in a logging event depends on how the event was generated (that is, which logging method was used in the code) and on the configuration of the table.

    Storing logging information in Cassandra provides the capability to do in-depth analysis via the DataStax Enterprise platform using Hadoop and Solr.