Guess activity #4

@meedstrom

For background and all the theory, see the README.

Current questions on the stats theory

Re. the model for realtime guesses:

  • What kind of model can it be? An HMM (hidden Markov model)?
  • When the realtime guess is wrong and the user fixes it, how long should we respect that fix?
    • Can we use some indicators the fix is no longer valid?
    • Simple approach: after some time, ask the user whether their org clock is still pointing at the correct activity, and lengthen the wait between repeats of this question.
      • Most users who correct the clock will probably also use org-clock-out when done (whereupon we take over), so it's not a major problem.
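The "simple approach" above can be sketched as a growing prompt schedule. This is only an illustration: the function name `next_prompt_delay`, the 10-minute base, the doubling factor, and the 4-hour cap are all assumptions, and multiplicative growth is just one way to "increment" the wait.

```python
from datetime import timedelta

def next_prompt_delay(n_confirmations,
                      base=timedelta(minutes=10),   # assumed first wait
                      factor=2.0,                   # assumed growth per confirmation
                      cap=timedelta(hours=4)):      # assumed upper bound
    """Return how long to wait before asking "is your clock still right?" again.

    n_confirmations is how many times in a row the user has already
    confirmed that the corrected clock is still valid.
    """
    delay = base * (factor ** n_confirmations)
    return min(delay, cap)

# Waits after 0, 1, 2, ... consecutive confirmations.
schedule = [next_prompt_delay(n) for n in range(6)]
```

The counter would presumably reset whenever the user clocks out or corrects the clock again, at which point the realtime model takes over.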

Re. the model for classifying the last 24-48 hours all at once:

  • Can the realtime guesses from the other model be used to inform this model?
    • Probably not directly, but if the user corrects its mistakes, this produces extra activity_verified datapoints. Including that data makes the sample non-random with regard to time, but I don't think we'll use it in a way that needs it to be random.
  • Can this model use the activity_verified data at all, since each datapoint is only about a single instant ("I'm doing X right now, this nanosecond")?
    • Probably. Assume the verification is good at least until the next time buffer_kind changes.
    • Should we assume that the verification stays good past that point, with an exponentially decaying effect?
      • Yes, if we use this data at all. See the issue of minimizing Org-clock lines #issuecomment-903853120. I guess the parameter of this exponential function would be determined by how large we want the chunks in our agenda log to be.
  • Let's say it's Day 2 and we run a model overnight that classifies all of Day 1 and 2. The user wakes up on Day 3 and sees the results in their agenda log (or whatever visualization they prefer). At this point, Day 1 and 2 are "locked" from the VA's perspective: it'll never attempt to re-classify them, but they're still free for the user to modify.
    • Should we consider the locked days as all "verified", like observed data, or as still unverified (invisible to future runs of the model) and let the user verify blocks if they feel like it (signing off on them as "yes, this is definitely what happened during that time")?
  • When it comes to verified time chunks from the past, how do we use them exactly? (We can insert the information as an activity_verified value attached to every buffer focus during these chunks.)
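One way to read the decay idea above is as a weight attached to each activity_verified datapoint: full weight until the next buffer_kind change, then exponential decay. The sketch below assumes a half-life parameterization; the function name, argument names, and the 30-minute default are hypothetical and not taken from the project.

```python
def verification_weight(seconds_since_verified,
                        seconds_until_bufkind_change,
                        half_life=1800.0):  # assumed: 30 minutes
    """Weight in (0, 1] given to a verification at a later instant.

    The verification counts fully until buffer_kind next changes;
    after that its weight halves every `half_life` seconds.
    """
    if seconds_since_verified < 0:
        raise ValueError("the instant must not precede the verification")
    overshoot = seconds_since_verified - seconds_until_bufkind_change
    if overshoot <= 0:
        return 1.0
    return 0.5 ** (overshoot / half_life)
```

A longer half-life would tend to produce larger contiguous chunks in the agenda log, and a shorter one smaller chunks, which is the trade-off the parameter is meant to control.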

General questions:

  • Clarify the causal relation between activity and time.since.bufkind.change. (Use an exponential decay.)
  • Is there any causal relation between buffer_kind and time.since.bufkind.change?
    • This is like asking if there's a causal relation between buffer and another constructed variable time.since.buf.change. Tentatively, I think not.
  • How should missingness_verification, the missingness process of activity_verified, be modeled?
    • Perhaps we could just eliminate it together with the idea of asking at random times at all, as I believe we don't need the info to be gathered at random times.
  • See also questions raised in README#DAG
  • See also my confusion under README#Rubin's basic questions
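For the questions about time.since.bufkind.change, it may help to see how such a constructed variable could be derived. The sketch below assumes the raw data is a time-sorted list of (timestamp in seconds, buffer_kind) focus events; that layout and the example kinds "code" and "web" are hypothetical.

```python
def time_since_bufkind_change(events):
    """For each focus event, return seconds since buffer_kind last changed.

    `events` is a list of (timestamp_seconds, buffer_kind) tuples,
    sorted by timestamp. The first event counts as a change.
    """
    result = []
    last_kind = None
    last_change = None
    for timestamp, kind in events:
        if kind != last_kind:
            # buffer_kind changed (or this is the first event): restart the timer
            last_kind = kind
            last_change = timestamp
        result.append(timestamp - last_change)
    return result

events = [(0, "code"), (30, "code"), (45, "web"), (100, "web"), (130, "code")]
elapsed = time_since_bufkind_change(events)  # [0, 30, 0, 55, 0]
```

Written this way, the variable is computed from the buffer_kind history and the timestamps alone, which is worth keeping in mind when drawing its arrows in the DAG.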

Labels: help wanted
