I can tell you what I implemented, but it may be too long to really post here. First off, I only do clickstream for user sessions after they have logged into the site. I have built and tested it but not launched yet; the site is in beta and should be live soon, so you will be able to get a better feel for what I am describing. I've implemented two separate forms of clickstream: lite and heavy.
The lite version is only concerned with user page paths and is as follows:
- When the server starts up, I create a server session id and a directory that holds all sorts of session forensic data, including this clickstream data. At startup I create a user_session_path file in this directory. As a side note, I also put the access logs, periodic telemetry dumps, url profiling data, and nsv variable dumps into this directory.
- When the user logs in I add an nsv variable (user_session_id) to my nsv array user_session_paths.
- At each pageload (I run a preauth filter on all pages) I determine which page the user is on and map it to a page id using a page/id hash that is also loaded at server startup. I then lappend (page_id, ns_time) to this user_session_path nsv variable.
- When the user logs out or is automatically logged out, I append their individual nsv session path array to the file that I created in the server session dir. I also unset the corresponding nsv var.
- At server shutdown, I dump any remaining vars to the same file using an ns_shutdown proc.
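To make the steps above concrete, here is a rough, language-neutral sketch in Python rather than the Tcl/nsv code I actually run. The class name, the method names, and the one-line-per-session file format (`session_id page_id:timestamp ...`) are made up for illustration; a plain dict stands in for the nsv array.

```python
import os
import tempfile
import time


class LitePathTracker:
    """In-memory stand-in for the nsv array of per-session page paths."""

    def __init__(self, path_file, page_ids):
        self.path_file = path_file   # the user_session_path file
        self.page_ids = page_ids     # url -> page id, loaded at startup
        self.paths = {}              # user_session_id -> [(page_id, ts)]
        open(path_file, "a").close() # create the file at "server startup"

    def login(self, user_session_id):
        # Start an empty path for the newly logged-in session.
        self.paths[user_session_id] = []

    def pageload(self, user_session_id, url, now=None):
        # Called from the per-request filter; ignores anonymous sessions
        # and urls that are not in the page/id map.
        if user_session_id not in self.paths:
            return
        page_id = self.page_ids.get(url)
        if page_id is None:
            return
        ts = int(time.time()) if now is None else now
        self.paths[user_session_id].append((page_id, ts))

    def logout(self, user_session_id):
        # Append the finished path to the file and drop it from memory.
        hits = self.paths.pop(user_session_id, None)
        if hits is None:
            return
        fields = [f"{page_id}:{ts}" for page_id, ts in hits]
        with open(self.path_file, "a") as f:
            f.write(" ".join([str(user_session_id)] + fields) + "\n")

    def shutdown(self):
        # Flush every session that is still open.
        for user_session_id in list(self.paths):
            self.logout(user_session_id)


if __name__ == "__main__":
    session_dir = tempfile.mkdtemp()
    tracker = LitePathTracker(
        os.path.join(session_dir, "user_session_path"),
        {"/home": 1, "/inbox": 2},
    )
    tracker.login("u1")
    tracker.pageload("u1", "/home")
    tracker.pageload("u1", "/inbox")
    tracker.shutdown()
    print(open(tracker.path_file).read())
```

Keeping the path in memory until logout means only one file append per session, at the cost of losing open sessions if the server dies without running the shutdown step.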
You can then parse the session path file into a graph that represents user paths. I keep a database copy of the individual graphs and a cumulative graph that holds prev/next data. From that you can see all the pages visited before and after a given page, and you can dig into the time between pages. This is not necessarily a links-to/from graph, because a user can have multiple browser windows open and may switch between them.
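As a rough illustration of the parsing step, here is a small Python sketch that builds a cumulative prev/next graph. It assumes a hypothetical file format of one session per line, `session_id page_id:timestamp page_id:timestamp ...`; my real file layout and database schema are not shown here, and the function names are invented for the example.

```python
from collections import defaultdict


def parse_session_line(line):
    """Split one line into (session_id, [(page_id, timestamp), ...])."""
    session_id, *fields = line.split()
    hits = []
    for field in fields:
        page_id, ts = field.split(":")
        hits.append((int(page_id), int(ts)))
    return session_id, hits


def build_graph(lines):
    """Return {(prev_page, next_page): [seconds between the two hits]}."""
    edges = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        _, hits = parse_session_line(line)
        # Pair each hit with the one that follows it in the same session.
        for (prev_page, t1), (next_page, t2) in zip(hits, hits[1:]):
            edges[(prev_page, next_page)].append(t2 - t1)
    return edges


def next_pages(edges, page_id):
    """Pages seen right after page_id, with transition counts."""
    return {b: len(gaps) for (a, b), gaps in edges.items() if a == page_id}


def prev_pages(edges, page_id):
    """Pages seen right before page_id, with transition counts."""
    return {a: len(gaps) for (a, b), gaps in edges.items() if b == page_id}
```

Each edge keeps the raw list of gaps rather than an average, so the time between pages can still be examined per transition later.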
That's the lite version. The heavy version is much more detailed and uses about 40 different db tables. If you want, I can post a short description of it or email it to you.
Best Regards,