Ticket #600 (new enhancement)

Opened 4 months ago

Last modified 6 weeks ago

Summarization Framework

Reported by: admin Owned by: padams
Priority: critical Milestone: 1.7.0
Component: base module Version:
Keywords: Cc:

Description (last modified by admin) (diff)

Add a pluggable summarization framework to OWA so that reports can be driven from summary tables where possible.

In addition to a standard SQL-based summary facility, we should make it easy for developers to write external programs that perform the summarization tasks, so that integration with Hadoop/MapReduce-style platforms is possible.

Existing Table Modifications


  • add insertion_timestamp column to all fact and dimension tables.
  • add modified_timestamp column to all fact and dimension tables.
  • modify handlers to populate both new columns with the current Unix time on insert.
  • modify handlers to refresh modified_timestamp with the current Unix time on updates.
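
The column and handler changes listed above could look roughly like the following. This is a minimal sketch using Python's built-in sqlite3 module rather than OWA's own handlers; the helper names, the INTEGER column type, and the assumption that rows are keyed by an id column are all illustrative and not taken from the ticket.

```python
import sqlite3
import time


def add_timestamp_columns(conn, table):
    """Add the two proposed columns to an existing table (INTEGER type is an assumption)."""
    for column in ("insertion_timestamp", "modified_timestamp"):
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} INTEGER")


def insert_row(conn, table, values, now=None):
    """Insert a row, stamping both new columns with the current Unix time."""
    now = int(time.time()) if now is None else now
    row = dict(values, insertion_timestamp=now, modified_timestamp=now)
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        tuple(row.values()),
    )


def update_row(conn, table, row_id, values, now=None):
    """Update a row by its (assumed) id column, refreshing only modified_timestamp."""
    now = int(time.time()) if now is None else now
    row = dict(values, modified_timestamp=now)
    assignments = ", ".join(f"{column} = ?" for column in row)
    conn.execute(
        f"UPDATE {table} SET {assignments} WHERE id = ?",
        (*row.values(), row_id),
    )
```

Table and column names are interpolated directly into the SQL here, so they would need to come from trusted code rather than user input. The optional now argument exists only to make the behavior easy to check.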


Summary Meta Table (new)


  • create a new owa_summary_meta table that will be used to control the summary processes.
    • table_name
    • frequency
    • last_updated_timestamp
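
One possible shape for the control table is sketched below with sqlite3. The column types, the use of table_name as the primary key, the spelling last_updated_timestamp for the third column, the text value "daily" for frequency, and the helper function names are assumptions made for illustration; the ticket only names the columns.

```python
import sqlite3

# Column types and the primary key are assumptions, not part of the ticket.
META_DDL = """
CREATE TABLE IF NOT EXISTS owa_summary_meta (
    table_name             TEXT PRIMARY KEY,
    frequency              TEXT NOT NULL,
    last_updated_timestamp INTEGER NOT NULL DEFAULT 0
)
"""


def create_summary_meta(conn):
    """Create the control table if it does not exist yet."""
    conn.execute(META_DDL)


def register_summary(conn, table_name, frequency):
    """Add a control row for a summary table; an existing row is left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO owa_summary_meta (table_name, frequency) VALUES (?, ?)",
        (table_name, frequency),
    )


def mark_summarized(conn, table_name, timestamp):
    """Record the Unix time at which the summary table was last brought up to date."""
    conn.execute(
        "UPDATE owa_summary_meta SET last_updated_timestamp = ? WHERE table_name = ?",
        (timestamp, table_name),
    )
```

A row whose last_updated_timestamp is still 0 has never been summarized, so every fact row counts as new for it.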

Summary Tables


  • create new summary_table abstract entity class
  • dynamically create concrete entity classes for each new summary table through a registerSummaryTable method in owa_module. The method should accept a list of the metrics that the summary table supports, and have those metric classes dynamically created and automatically registered.
  • Summary tables to register should include:
    • owa_sites_sum
    • owa_document_sum
    • owa_ua_sum
    • owa_referer_sum
    • owa_os_sum
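
The dynamic registration idea can be illustrated independently of OWA's class hierarchy. The sketch below uses Python's built-in type() to create classes at runtime; SummaryTable, Metric, and Module are stand-ins for the proposed summary_table entity, a metric base class, and owa_module, and the metric names in the usage example are made up.

```python
class SummaryTable:
    """Stand-in for the proposed abstract summary_table entity."""
    table_name = None
    metrics = ()


class Metric:
    """Stand-in base class for dynamically created metrics."""
    name = None


class Module:
    """Stand-in for owa_module; only the registration logic is sketched."""

    def __init__(self):
        self.entities = {}
        self.metrics = {}

    def register_summary_table(self, table_name, metric_names):
        # Create and register a metric class for each name not seen before.
        for name in metric_names:
            if name not in self.metrics:
                self.metrics[name] = type(name, (Metric,), {"name": name})
        # Create the concrete entity class for this summary table.
        entity = type(
            table_name,
            (SummaryTable,),
            {"table_name": table_name, "metrics": tuple(metric_names)},
        )
        self.entities[table_name] = entity
        return entity
```

Under these assumptions, registering one of the tables listed above would be a single call:

```python
module = Module()
sites = module.register_summary_table("owa_sites_sum", ["visits", "page_views"])
```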


SQL-Based Summarizations


  • Use metric and entity classes to dynamically build the SQL to perform the summarization for each summary table.
  • create a 'summarize' CLI command that will:
    • query the owa_summary_meta table for the list of summary tables and the last time each was updated
    • kick off the summarization job for a summary table if rows were inserted into the fact tables after that table was last summarized.
      • write the summarized data out to load files (one file per table, one line per day) that can then be bulk loaded into the DB by another CLI command.
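
The core of the 'summarize' command could be sketched as follows, again with sqlite3. The function names, the idea of passing metrics as a mapping from output column to SQL aggregate expression, and the choice to recompute in full every day that received new rows are assumptions of this sketch, not decisions recorded in the ticket.

```python
import sqlite3


def build_summary_sql(fact_table, day_column, metrics):
    """Build a per-day aggregate query.

    metrics maps an output column name to a SQL aggregate expression.
    Identifiers are interpolated, so they must come from trusted code.
    """
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in metrics.items())
    return (
        f"SELECT {day_column} AS day, {select_list} "
        f"FROM {fact_table} "
        f"WHERE {day_column} IN ("
        f"SELECT DISTINCT {day_column} FROM {fact_table} "
        f"WHERE insertion_timestamp > ?) "
        f"GROUP BY {day_column} ORDER BY {day_column}"
    )


def summarize(conn, fact_table, day_column, metrics, last_updated):
    """Return one summary row per day that received rows after last_updated."""
    sql = build_summary_sql(fact_table, day_column, metrics)
    return conn.execute(sql, (last_updated,)).fetchall()


def to_load_lines(rows):
    """Format summary rows as comma-separated lines, one line per day."""
    return [",".join(str(value) for value in row) for row in rows]


def write_load_file(path, rows):
    """Write the load file for one summary table."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in to_load_lines(rows):
            handle.write(line + "\n")
```

If no fact rows are newer than last_updated, summarize returns an empty list and no load file needs to be written. Recomputing affected days in full keeps the output idempotent, which fits the requirement that summaries can be re-run.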

Summary Loader


  • create a 'load_summary' CLI command that takes a summary table name and a comma-separated list of values (or a file path) and inserts/updates/loads them into the summary table.
  • Loads should update existing rows, as summaries can be re-run.
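
The update-on-reload behavior amounts to an upsert. The sketch below uses SQLite's INSERT OR REPLACE and assumes the summary table has a unique constraint on its day column; the function names and the explicit columns argument are hypothetical. On MySQL, INSERT ... ON DUPLICATE KEY UPDATE would be the closer equivalent.

```python
import sqlite3


def parse_load_line(line):
    """Split one comma-separated load line into a list of values."""
    return [field.strip() for field in line.strip().split(",")]


def load_summary(conn, table, columns, lines):
    """Insert each line into the summary table, replacing rows that already exist.

    Relies on a unique constraint (assumed to be on the day column) so that
    re-running a summary overwrites the earlier row instead of duplicating it.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = (
        f"INSERT OR REPLACE INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders})"
    )
    loaded = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in a load file
        values = parse_load_line(line)
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}: {line!r}")
        conn.execute(sql, values)
        loaded += 1
    conn.commit()
    return loaded
```

Because lines can be any iterable of strings, the same function covers both inputs mentioned above: a list built from a command-line argument, or an open load file.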

Change History

Changed 4 months ago by admin

  • description modified (diff)

Changed 4 months ago by admin

  • description modified (diff)

Changed 4 months ago by admin

  • description modified (diff)

Changed 4 months ago by admin

  • description modified (diff)

Changed 4 months ago by admin

  • description modified (diff)

Changed 4 months ago by admin

  • description modified (diff)

Changed 6 weeks ago by admin

  • milestone changed from 1.6.0 to 1.7.0