...

  • A minified JavaScript library and tracking snippet that can be added to any web page to collect data and send it to a predefined Caskalytics endpoint.
  • A service that handles requests from the tracking code, stores the required data, and returns a 1x1 GIF.
  • A job that runs periodically to process new data written to the raw table and split the raw information into secondary metrics. This job should also attempt to identify bot traffic.
  • A job that runs periodically to calculate additional metrics such as pages per session and bounce rate.
  • A service that exposes this data via a RESTful interface.
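The collection service answers every beacon with a tiny image so the tracking snippet can fire the request as an ordinary image load. The sketch below shows one way to build that response: a 43-byte, 1x1 transparent GIF assembled block by block, plus headers that discourage caching. The helper name `beacon_response` and the exact header set are assumptions for illustration, not part of the Caskalytics design.

```python
# Hypothetical helper for the beacon service's reply.
# The body is a 43-byte, 1x1 transparent GIF assembled block by block.
TRANSPARENT_GIF = (
    b"GIF89a"                                   # header
    b"\x01\x00\x01\x00"                         # logical screen: width 1, height 1 (little-endian)
    b"\x80\x00\x00"                             # global color table present (2 colors), bg index 0
    b"\x00\x00\x00\xff\xff\xff"                 # color table: black, white
    b"\x21\xf9\x04\x01\x00\x00\x00\x00"         # graphic control ext: color index 0 is transparent
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00" # image descriptor: 1x1 image at (0, 0)
    b"\x02\x02\x44\x01\x00"                     # LZW min code size 2, one 2-byte sub-block, terminator
    b"\x3b"                                     # trailer
)


def beacon_response():
    """Return (status, headers, body) for a single beacon hit."""
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(TRANSPARENT_GIF)),
        # Discourage caching so every page view issues a fresh request.
        "Cache-Control": "no-cache, no-store, must-revalidate",
    }
    return 200, headers, TRANSPARENT_GIF
```

Serving a constant byte string keeps the hot path cheap: the service only has to persist the request data and write these 43 bytes back.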

...

  • Responsible for collecting information from each beacon request and storing it in a data store
  • Additional data captured by this service
    • Requester IP
    • User-Agent string
    • Time of request in UTC
  • Data is written to the raw data table using the key <full page url>-<user-id>-<timestampInMilliseconds>
  • Each piece of data is stored in its own column
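A minimal sketch of how a raw-table row could be assembled from one beacon hit, using the documented key format and one column per field. The function names and column names (`ip`, `userAgent`, `requestTimeUtc`) are hypothetical; the actual table schema is not specified here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def make_row_key(page_url: str, user_id: str, ts_ms: int) -> str:
    """Row key in the form <full page url>-<user-id>-<timestampInMilliseconds>."""
    return f"{page_url}-{user_id}-{ts_ms}"


def make_raw_row(page_url, user_id, ip, user_agent, beacon_params, now=None):
    """Return (row_key, columns) for one beacon hit; column names are hypothetical."""
    now = now or datetime.now(timezone.utc)
    ts_ms = (now - EPOCH) // timedelta(milliseconds=1)  # exact integer milliseconds
    columns = dict(beacon_params)  # one column per field sent by the tracking snippet
    # Values observed by the service are written last so the client cannot override them.
    columns.update({
        "ip": ip,
        "userAgent": user_agent,
        "requestTimeUtc": now.isoformat(),
    })
    return make_row_key(page_url, user_id, ts_ms), columns
```

Because page URLs, and user IDs such as UUIDs, can contain hyphens, the key cannot be reliably split back apart on `-`; downstream jobs would read the individual values from the columns instead.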

Dimension Splitter / Bot Filter Job

  • Responsible for splitting data into smaller dimensions, performing any external lookups on IDs, and flagging bot traffic
  • Data Splitter Job
    • Splits the full page URL into hostname, path, and query string
    • Pulls customizable URL parameters, such as campaign and source, from the query string
    • Splits the referrer into referrer hostname and path
    • Runs a GeoIP lookup on the IP address to find geographic information
    • Runs a reverse DNS lookup on the IP address to find ISP information
    • Parses the User-Agent string into OS, OS version, browser, and browser version
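The URL and referrer splitting steps above need nothing beyond the standard library, so they can be sketched directly. This example assumes the GA-style `utm_campaign` and `utm_source` names as the default tracked parameters, and flags bots with a crude User-Agent substring check; both are placeholders, since the bot criteria are still an open question. GeoIP, reverse DNS, and full User-Agent parsing are left out because they depend on external databases or libraries.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical defaults: query-string parameters to pull out as dimensions.
DEFAULT_TRACKED_PARAMS = ("utm_campaign", "utm_source")
# Placeholder bot heuristic: substrings common in crawler User-Agent strings.
BOT_UA_MARKERS = ("bot", "crawler", "spider")


def split_dimensions(page_url, referrer, user_agent, tracked_params=DEFAULT_TRACKED_PARAMS):
    """Split one raw hit into smaller dimensions plus a naive bot flag."""
    page = urlsplit(page_url)
    params = parse_qs(page.query)
    dims = {
        "hostname": page.hostname,  # urlsplit lowercases the hostname
        "path": page.path or "/",
        "queryString": page.query,
    }
    # Keep the first value of each tracked parameter, or None if absent.
    for name in tracked_params:
        values = params.get(name)
        dims[name] = values[0] if values else None
    if referrer:
        ref = urlsplit(referrer)
        dims["referrerHostname"] = ref.hostname
        dims["referrerPath"] = ref.path or "/"
    else:
        dims["referrerHostname"] = dims["referrerPath"] = None
    ua = (user_agent or "").lower()
    dims["isBot"] = any(marker in ua for marker in BOT_UA_MARKERS)
    return dims
```

A substring check like this will miss bots that spoof browser User-Agents and can misfire on legitimate strings that happen to contain "bot", so it is only a starting point until real criteria are agreed.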

 

Metrics Calculator Job
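How sessions are defined is still an open question, so the following is only one possible approach: sessions are computed after the fact by grouping each user's hits and starting a new session after 30 minutes of inactivity (the Google Analytics default timeout), and a bounce is a session with a single page view. The function names and the `(user_id, ts_ms)` input shape are assumptions.

```python
from collections import defaultdict

SESSION_TIMEOUT_MS = 30 * 60 * 1000  # assumed inactivity timeout (30 minutes)


def sessionize(hits, timeout_ms=SESSION_TIMEOUT_MS):
    """Group (user_id, ts_ms) hits into sessions; each session is a sorted list of timestamps."""
    by_user = defaultdict(list)
    for user_id, ts_ms in hits:
        by_user[user_id].append(ts_ms)
    sessions = []
    for stamps in by_user.values():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            # A gap longer than the timeout since the previous hit starts a new session.
            if ts - current[-1] > timeout_ms:
                sessions.append(current)
                current = [ts]
            else:
                current.append(ts)
        sessions.append(current)
    return sessions


def session_metrics(hits, timeout_ms=SESSION_TIMEOUT_MS):
    """Compute session count, pages per session, and bounce rate."""
    sessions = sessionize(hits, timeout_ms)
    if not sessions:
        return {"sessions": 0, "pagesPerSession": 0.0, "bounceRate": 0.0}
    pageviews = sum(len(s) for s in sessions)
    bounces = sum(1 for s in sessions if len(s) == 1)
    return {
        "sessions": len(sessions),
        "pagesPerSession": pageviews / len(sessions),
        "bounceRate": bounces / len(sessions),
    }
```

Computing sessions in a batch like this is simpler than tracking them in real time, at the cost of metrics lagging by one job interval; sessions that straddle a batch boundary would need extra handling.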

RESTful Service

Unanswered Questions

  • How do we handle sessions? Are they calculated in real time or after the fact?
  • What criteria do we use to identify bots?
  • Is there other information we can gather that Google Analytics currently does not?
  • Will the tracking script include an async command queue similar to Google Analytics'?
  • How is the service endpoint exposed to the web? Are there any security concerns?