Now that the syntax has been described and the toolbox laid out, how do you practically go about using and configuring arbtt?
After installing arbtt, you need to configure it to run. There are many ways to run the arbtt-capture daemon. One standard way is to include the command arbtt-capture & in your desktop environment's startup script, e.g. ~/.xinitrc or similar.
Another trick is to add it as a cron job. To do so, edit your crontab file (crontab -e) and add lines like these (the DISPLAY setting goes on its own line, before the job):
    DISPLAY=:0
    @reboot arbtt-capture --logfile=/home/username/doc/arbtt/capture.log
At boot, arbtt-capture will be run in the background and will capture a snapshot of the X metadata for open windows every 60 seconds (the default). If you want more fine-grained time data at the expense of doubling storage use, you could increase the sampling rate with an option like --sample-rate=30. To be resilient to any errors or segfaults, you could also wrap it in an infinite loop to restart the daemon should it ever crash, with crontab lines like
    DISPLAY=:0
    @reboot while true; do arbtt-capture --sample-rate=30; sleep 1m; done
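Since a loop embedded in a crontab line is awkward to read and edit, the same restart logic could live in a small script that cron calls instead. The following is a minimal sketch and not part of arbtt: the function name restart_loop and the MAX_RESTARTS variable are hypothetical, and the latter exists only so the loop can be bounded when trying it out.

```shell
# Sketch: rerun a command every time it exits, pausing between attempts.
# restart_loop and MAX_RESTARTS are illustrative names, not arbtt features.
# MAX_RESTARTS unset or 0 means "loop forever", like `while true` in the crontab line.
restart_loop() {
    pause=$1
    shift
    attempts=0
    while :; do
        "$@"                          # run the command; we get here when it exits or crashes
        attempts=$((attempts + 1))
        if [ "${MAX_RESTARTS:-0}" -gt 0 ] && [ "$attempts" -ge "$MAX_RESTARTS" ]; then
            break
        fi
        sleep "$pause"
    done
}

# Intended use (commented out so that sourcing this file starts nothing):
# export DISPLAY=:0
# restart_loop 60 arbtt-capture --sample-rate=30
```

Keeping the pause as a parameter makes it easy to back off longer if the daemon crashes immediately on start.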
arbtt tracks X properties like window title, class, and running program, and you write rules to classify those strings as you wish; but this assumes that the necessary data is present in those properties.
For some programs, this is the case. For example, web browsers like Firefox typically set the X title to the HTML <title> element of the web page in the currently-focused tab, which is enough for classification.
Some programs have title-setting available as plugins. The IRC client irssi in a GNU screen or X terminal usually sets the title to just "irssi", which blocks more accurate time-classification based on IRC channel (one channel may be for recreation, another for programming, and yet another for work), but can be easily configured to set the title using the extension title.pl.
Some programs do not set titles or class, and all arbtt sees is empty strings like ""; or they may set the title/class to a constant like "Liferea", which may be acceptable if that program is used for only one purpose, but if it is used for many purposes, then you cannot write a rule matching it without producing highly-misleading time analyses. (For example, a web browser may be used for countless purposes, ranging from work to research to music to writing to programming; but if the web browser's title/class were always just "Web browser", how would you classify 5 hours spent using the web browser? If the 5 hours are classified as any or all of those purposes, then the results will be misleading garbage: you probably did not spend 5 hours just listening to music, but a mixture of those purposes, which changes from day to day.)
You should check for such problematic programs when you start using arbtt. It would be unfortunate if you were to log for a few months, go back for a detailed report for some reason, and discover that the necessary data was never available for arbtt to log!
Such programs can sometimes be customized internally; failing that, you can file a bug report with the maintainers, or set their titles externally with wmctrl or xprop.
You can check the X properties of a running window by running the command xprop and clicking on the window; xprop will print out all the relevant X information. For example, the output for Emacs might look like this:
    $ xprop | tail -5
    WM_CLASS(STRING) = "emacs", "Emacs"
    WM_ICON_NAME(STRING) = "emacs@elan"
    _NET_WM_ICON_NAME(UTF8_STRING) = "emacs@elan"
    WM_NAME(STRING) = "emacs@elan"
    _NET_WM_NAME(UTF8_STRING) = "emacs@elan"
This is not very helpful: it does not tell us the filename being edited, the mode being used, or anything else. You could classify time spent in Emacs as "programming" or "writing", but this would be imperfect, especially if you do both activities regularly. However, Emacs can be customized by editing ~/.emacs, and after some searching with queries like "setting Emacs window title", the Emacs wiki and manual advise us to put something like this Elisp in our .emacs file:
(setq frame-title-format "%f")
Now the output looks different:
    $ xprop | tail -5
    WM_CLASS(STRING) = "emacs", "Emacs"
    WM_ICON_NAME(STRING) = "/home/gwern/arbtt.page"
    _NET_WM_ICON_NAME(UTF8_STRING) = "/home/gwern/arbtt.page"
    WM_NAME(STRING) = "/home/gwern/arbtt.page"
    _NET_WM_NAME(UTF8_STRING) = "/home/gwern/arbtt.page"
With this, we can usefully classify all such time samples as being “writing”:
current window $title == "/home/gwern/arbtt.page" ==> tag Writing,
Another common gap is terminals/shells: they often do not include information in the title like the current working directory or last shell command. For example, urxvt/Bash:
    WM_COMMAND(STRING) = { "urxvt" }
    _NET_WM_ICON_NAME(UTF8_STRING) = "urxvt"
    WM_ICON_NAME(STRING) = "urxvt"
    _NET_WM_NAME(UTF8_STRING) = "urxvt"
    WM_NAME(STRING) = "urxvt"
Programmers may spend many hours in the shell doing a variety of things (as with Emacs), so this is a problem. Fortunately, this is also solvable by customizing one's .bashrc to emit an escape code interpreted by the terminal before each command (baroque, but it works). The following will include the working directory and the last command; a timestamp also appears if Bash's HISTTIMEFORMAT variable is set, since the text is taken from the history entry:
trap 'echo -ne "\033]2;$(pwd); $(history 1 | sed "s/^[ ]*[0-9]*[ ]*//g")\007"' DEBUG
Now the urxvt samples are useful:
_NET_WM_NAME(UTF8_STRING) = "/home/gwern/wiki; 2014-09-03 13:39:32 arbtt-stats --help"
Some distributions (e.g. Debian) already provide the relevant configuration for this to happen. If it does not work for you, you can try to add
. /etc/profile.d/vte.sh
to your ~/.bashrc.
A rule could classify based on the directory you are working in, the command you ran, or both. Other shells like zsh can be fixed this way too, but the exact command may differ; you will need to research and experiment.
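To make the pieces of the one-line trap easier to adapt, it can be split into two small functions. This is a sketch under assumptions: set_title and strip_history_number are hypothetical helper names, and the escape sequence is the same one used in the trap shown earlier (ESC ] 2 ; text BEL).

```shell
# Emit the "set window title" escape sequence for the given string.
# Using %s keeps any percent signs or backslashes in the title literal.
set_title() {
    printf '\033]2;%s\007' "$1"
}

# Remove the leading entry number from one line of `history` output,
# using the same sed expression as the one-line trap.
strip_history_number() {
    printf '%s\n' "$1" | sed 's/^[ ]*[0-9]*[ ]*//'
}

# In ~/.bashrc (Bash only, since `history` and the DEBUG trap are Bash features):
# trap 'set_title "$(pwd); $(strip_history_number "$(history 1)")"' DEBUG
```

Splitting it this way means only the last line needs to change if you want a different title layout, for example the command first and the directory second.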
Some programs can be tricky to set. The X image viewer feh has a --title option but it cannot be set in the configuration file, .config/feh/themes, because it needs to be specified dynamically; so you need to set up a shell alias or script to wrap the command like feh --title "$(pwd) / %f / %n".
xprop can be tedious to use on every running window and you may forget to check seldom-used programs. A better approach is to use arbtt-stats's --dump-samples option: this option will print out the collected data for specified time periods, allowing you to examine the X properties en masse. It can be combined with the --exclude= option to print only the samples not matched by existing rules, which is indispensable for improving coverage and suggesting ideas for new rules. A good way to figure out what customizations to make is to run arbtt as a daemon for a day or so, and then begin examining the raw samples for problems.
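One quick way to examine the raw samples for problems is to look for windows whose title is empty or merely repeats the program name. The filter below is a rough sketch: flag_uninformative is a hypothetical name, and it assumes each sample line has the form "(*) program: title" or "( ) program: title", as in the dump output shown in this section.

```shell
# Read --dump-samples style lines on stdin and count, per program, the samples
# whose title is empty or equal to the program name (ignoring case).
flag_uninformative() {
    awk '
    {
        line = $0
        sub(/^ *\([ *]\) /, "", line)      # drop the "(*) " or "( ) " marker
        sep = index(line, ": ")
        if (sep == 0) next                 # not a "program: title" line
        program = substr(line, 1, sep - 1)
        title = substr(line, sep + 2)
        if (title == "" || tolower(title) == tolower(program)) print program
    }' | sort | uniq -c | sort -rn
}

# Example (needs real data, so it is left commented out):
# arbtt-stats --dump-samples | flag_uninformative
```

Programs near the top of the resulting list are the first candidates for title customization.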
Example 2. An initial configuration session
An example: suppose I create a simple category file named foo with just the line
$idle > 30 ==> tag inactive
I can then dump all my arbtt samples for the past day with a command like this:
arbtt-stats --categorizefile=foo --m=0 --filter='$sampleage <24:00' --dump-samples
Because there are so many open windows, this produces a large amount (26586 lines) of hard-to-read output:
    ...
    ( ) Navigator: /r/Touhou's Favorite Arranges! Part 71: Retribution for the Eternal Night ~ Imperishable Night : touhou - Iceweasel
    ( ) Navigator: Configuring the arbtt categorizer (arbtt-stats) - Iceweasel
    ( ) evince: ATTACHMENT02
    ( ) evince: 2009-geisler.pdf — Heart rate variability predicts self-control in goal pursuit
    ( ) urxvt: /home/gwern; arbtt-stats --categorizefile=foo --m=0 --filter='$sampleage <24:00' --dump-samples
    ( ) mnemosyne: Mnemosyne
    ( ) urxvt: /home/gwern; 2014-09-03 13:11:45 xprop
    ( ) urxvt: /home/gwern; 2014-09-03 13:42:17 history 1 | cut --delimiter=' ' --fields=5-
    ( ) urxvt: /home/gwern; 2014-09-03 13:12:21 git log -p .emacs
    (*) emacs: emacs@elan
    ( ) urxvt: /home/gwern/blackmarket-mirrors/silkroad2-forums; 2014-08-31 23:20:10 mv /home/gwern/cookies.txt ./; http_proxy="localhost:8118" wget...
    ( ) urxvt: /home/gwern/blackmarket-mirrors/agora; 2014-08-31 23:15:50 mv /home/gwern/cookies.txt ./; http_proxy="localhost:8118" wget --mirror ...
    ( ) urxvt: /home/gwern/blackmarket-mirrors/evolution-forums; 2014-08-31 23:04:10 mv ~/cookies.txt ./; http_proxy="localhost:8118" wget --mirror ...
    ( ) puddletag: puddletag: /home/gwern/music
Active windows are denoted by an asterisk, so I can focus & simplify by adding a pipe like | fgrep '(*)', producing more manageable output like
    (*) urxvt: irssi
    (*) urxvt: irssi
    (*) urxvt: irssi
    (*) Navigator: Pyramid of Technology - NextNature.net - Iceweasel
    (*) Navigator: Search results - gwern0@gmail.com - Gmail - Iceweasel
    (*) Navigator: [New comment] The Wrong Path - gwern0@gmail.com - Gmail - Iceweasel
    (*) Navigator: Iceweasel
    (*) Navigator: Litecoin Exchange Rate - $4.83 USD - litecoinexchangerate.org - Iceweasel
    (*) Navigator: PredictionBook: LiteCoin will trade at >=10 USD per ltc in 2 years, - Iceweasel
    (*) urxvt: irssi
    (*) Navigator: Bug#691547 closed by Mikhail Gusarov <dottedmag@dottedmag.net> (Re: s3cmd: Man page: --default-mime-type documentation incomplete...)
    (*) Navigator: Bug#691547 closed by Mikhail Gusarov <dottedmag@dottedmag.net> (Re: s3cmd: Man page: --default-mime-type documentation incomplete...)
    (*) Navigator: Bug#691547 closed by Mikhail Gusarov <dottedmag@dottedmag.net> (Re: s3cmd: Man page: --default-mime-type documentation incomplete...)
    (*) urxvt: /home/gwern; 2014-09-02 14:25:17 man s3cmd
    (*) evince: bayesiancausality.pdf
    (*) evince: bayesiancausality.pdf
    (*) puddletag: puddletag: /home/gwern/music
    (*) puddletag: puddletag: /home/gwern/music
    (*) evince: bayesiancausality.pdf
    (*) Navigator: ▶ Umineko no Naku Koro ni Music Box 4 - オルガン小曲 第2億番 ハ短調 - YouTube - Iceweasel
    ...
This is better. We can see a few things: the windows all now produce enough information to be usefully classified (Gmail can be classified under email, irssi can be classified as IRC, the urxvt usage can clearly be classified as programming, the PDF being read is statistics, etc.) in part because of customizations to bash/urxvt. The duplication still impedes focus, and we don't know what's most common. We can use another pipeline to sort, count duplicates, and sort by number of duplicates (| sort | uniq --count | sort --general-numeric-sort), yielding:
    ...
    14 (*) Navigator: A Bluer Shade of White Chapter 4, a frozen fanfic | FanFiction - Iceweasel
    14 (*) Navigator: Iceweasel
    15 (*) evince: 2009-geisler.pdf — Heart rate variability predicts self-control in goal pursuit
    15 (*) Navigator: Tool use by animals - Wikipedia, the free encyclopedia - Iceweasel
    16 (*) Navigator: Hacker News | Add Comment - Iceweasel
    17 (*) evince: bayesiancausality.pdf
    17 (*) Navigator: Comments - Less Wrong Discussion - Iceweasel
    17 (*) Navigator: Keith Gessen · Why not kill them all?: In Donetsk · LRB 11 September 2014 - Iceweasel
    17 (*) Navigator: Notes on the Celebrity Data Theft | Hacker News - Iceweasel
    18 (*) Navigator: A Bluer Shade of White Chapter 1, a frozen fanfic | FanFiction - Iceweasel
    19 (*) gl: mplayer2
    19 (*) Navigator: Neural networks and deep learning - Iceweasel
    20 (*) Navigator: Harry Potter and the Philosopher's Zombie, a harry potter fanfic | FanFiction - Iceweasel
    20 (*) Navigator: [OBNYC] Time tracking app - gwern0@gmail.com - Gmail - Iceweasel
    25 (*) evince: ps2007.pdf — untitled
    35 (*) emacs: /home/gwern/arbtt.page
    43 (*) Navigator: CCC comments on The Octopus, the Dolphin and Us: a Great Filter tale - Less Wrong - Iceweasel
    62 (*) evince: The physics of information processing superobjects - Anders Sandberg - 1999.pdf — Brains2
    69 (*) liferea: Liferea
    82 (*) evince: BMS_raftery.pdf — untitled
    84 (*) emacs: emacs@elan
    87 (*) Navigator: overview for gwern - Iceweasel
    109 (*) puddletag: puddletag: /home/gwern/music
    150 (*) urxvt: irssi
Put this way, we can see what rules we should write to categorize: we could categorize the activities here into a few categories of "recreational", "statistics", "music", "email", "IRC", "research", and "writing"; and add to the categorize.cfg some rules like these:
    $idle > 30 ==> tag inactive,
    current window $title =~ [/.*Hacker News.*/, /.*Less Wrong.*/, /.*overview for gwern.*/, /.*[fF]an[fF]ic.*/, /.* LRB .*/]
      || current window $program == "liferea" ==> tag Recreation,
    current window $title =~ [/.*puddletag.*/, /.*mplayer2.*/] ==> tag Music,
    current window $title =~ [/.*[bB]ayesian.*/, /.*[nN]eural [nN]etworks.*/, /.*ps2007.pdf.*/, /.*[Rr]aftery.*/] ==> tag Statistics,
    current window $title =~ [/.*Wikipedia.*/, /.*Heart rate variability.*/, /.*Anders Sandberg.*/] ==> tag Research,
    current window $title =~ [/.*Gmail.*/] ==> tag Email,
    current window $title =~ [/.*arbtt.*/] ==> tag Writing,
    current window $title == "irssi" ==> tag IRC,
If we reran the command, we'd see the same output, so we need to leverage our new rules and exclude any samples matching our current tags, so now we run a command like:
arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples --exclude=Recreation --exclude=Music --exclude=Statistics --exclude=Research --exclude=Email --exclude=Writing --exclude=IRC | fgrep '(*)' | sort | uniq --count | sort --general-numeric-sort
Now the previous samples disappear, leaving us with a fresh batch of unclassified samples to work with:
    9 (*) Navigator: New Web Order > Nik Cubrilovic - - Notes on the Celebrity Data Theft - Iceweasel
    9 ( ) urxvt: /home/gwern; arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples | fgrep '(*)' | less
    10 (*) evince: ATTACHMENT02
    10 (*) Navigator: These Giant Copper Orbs Show Just How Much Metal Comes From a Mine | Design | WIRED - Iceweasel
    12 (*) evince: [Jon_Elster]_Alchemies_of_the_Mind_Rationality_an(BookFi.org).pdf — Alchemies of the mind
    12 (*) Navigator: Morality Quiz/Test your Morals, Values & Ethics - YourMorals.Org - Iceweasel
    33 ( ) urxvt: /home/gwern; arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples | fgrep '(*)'...
We can add rules categorizing these as 'Recreation', 'Writing', 'Research', 'Recreation', 'Research', 'Writing', and 'Writing' respectively; and we might decide at this point that 'Writing' is starting to become overloaded, so we'll split it into two tags, 'Writing' and 'Programming'. And then after tossing another --exclude=Programming onto our command line, we can repeat the process.
As we refine our rules, we will quickly spot instances where the title/class/program are insufficient to allow accurate classification, and we will figure out the best collection of tags for our particular purposes. A few iterations are enough for most purposes.
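The repeated dump, filter, count, and exclude cycle in this example can be wrapped in two small helpers. This is only a sketch: exclude_args and summarize_active are hypothetical names, and the short options -F, -c, and -n are used in place of fgrep and the long GNU options.

```shell
# Turn a list of tags into "--exclude=Tag" options for arbtt-stats.
exclude_args() {
    args=""
    for tag in "$@"; do
        args="$args${args:+ }--exclude=$tag"
    done
    printf '%s\n' "$args"
}

# Keep only active-window samples from stdin, then count duplicates and
# list them from least to most frequent.
summarize_active() {
    grep -F '(*)' | sort | uniq -c | sort -n
}

# Example (commented out because it needs a real capture log):
# arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples \
#     $(exclude_args Recreation Music Statistics Research Email Writing IRC) \
#     | summarize_active
```

Each time a new tag is introduced, it only needs to be appended to the exclude_args list before rerunning the command.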
When building up rules, a few rules of thumb should be kept in mind:
Categorizing by program rather than by activity leads to misleading time reports. Avoid, for example, lumping all web browser time into a single category named 'Internet'; this is more misleading than helpful. Good categories describe an activity or goal, such as 'Work' or 'Recreation', not a tool, like 'Emacs' or 'Vim'.
Regexps are tricky and it can be easy to write rules far broader than one intended. Because the --exclude filters hide every sample that already carries a tag, one will never see samples which are matched accidentally. If one is in doubt, it can be helpful to take a specific sample one wants to match and several similar strings and look at how well one's regexp rule works in Emacs's regexp-builder or online regexp-testers like regexpal.
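For a quick offline check, candidate patterns can also be tried against a handful of saved titles from the shell. The sketch below uses grep -E as a stand-in, which is an assumption: arbtt's own regular-expression engine may differ from POSIX extended regexps in details, so treat a match here as a hint rather than proof. The name check_regex is hypothetical.

```shell
# For each title on stdin, report whether it matches the given pattern.
# grep -E is only an approximation of how arbtt itself evaluates regexps.
check_regex() {
    pattern=$1
    while IFS= read -r title; do
        if printf '%s\n' "$title" | grep -Eq -- "$pattern"; then
            printf 'MATCH     %s\n' "$title"
        else
            printf 'no match  %s\n' "$title"
        fi
    done
}

# Example: see how broadly a fan-fiction pattern matches.
# printf '%s\n' \
#     'A Bluer Shade of White Chapter 4, a frozen fanfic | FanFiction - Iceweasel' \
#     'Tool use by animals - Wikipedia, the free encyclopedia - Iceweasel' \
#     | check_regex '.*[fF]an[fF]ic.*'
```

Including a few titles that should not match is the useful part, since those are exactly the samples that --exclude would otherwise hide.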
You will never classify 100% of samples because sometimes programs do not include useful X properties and cannot be fixed, you have samples from before you fixed them, or they are too transient (like popups and dialogues) to be worth fixing. It is not necessary to classify 100% of your time, since as long as the most common programs and, say, 80% of your time is classified, then you have most of the value. It is easy to waste more time tweaking arbtt than one gains from increased accuracy or more finely-grained tags.
If a tag takes up more than a third or so of your time, it is probably too large, masks variation, and can be broken down into more meaningful tags. Conversely, a tag too narrow to show up regularly in reports (because it is below the default 1% filter) may not be helpful because it is usually tiny, and can be combined with the most similar tag to yield more compact and easily interpreted reports.
Each halving of the sampling interval (the value given to --sample-rate) doubles the number of samples taken and hence the storage requirement; intervals below 20s are probably wasteful. But even the default 60s can accumulate into a nontrivial amount of data over a year. A constantly-changing binary file can interact poorly with backup systems and may make arbtt analyses slower; and if one's system occasionally crashes or experiences other problems, the log can be corrupted, leaving one with the nuisance of having to run arbtt-recover.
Thus it may be a good idea to archive one's capture.log on an annual basis. If one needs to query the historical data, the particular log file can be specified as an option like
--logfile=/home/gwern/doc/arbtt/2013-2014.log
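To get a feel for the numbers, and to make the yearly archive a one-step operation, something like the following could be used. It is a sketch with its assumptions stated up front: samples_per_year and archive_log are hypothetical helpers; the estimate counts samples only (the bytes per sample depend on how many windows are open); and it is assumed that arbtt-capture has been stopped before the log is moved and that it starts a fresh log on its next run.

```shell
# Estimate samples per year for a sampling interval given in seconds.
# Optional second argument: hours per day the machine is in use (default 24).
samples_per_year() {
    interval=$1
    hours=${2:-24}
    echo $(( 365 * hours * 3600 / interval ))
}

# Move a capture log aside under a new name in the same directory,
# refusing to overwrite an existing archive.
archive_log() {
    log=$1
    label=$2
    dest="$(dirname "$log")/$label.log"
    if [ -e "$dest" ]; then
        echo "archive_log: $dest already exists" >&2
        return 1
    fi
    mv -- "$log" "$dest"
}

# samples_per_year 60      # default interval, machine always on
# samples_per_year 30 8    # 30s interval, 8 hours of use per day
# archive_log /home/gwern/doc/arbtt/capture.log 2013-2014
```

The archived file can then be queried with the --logfile option as described above.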
arbtt supports CSV export of time by category in various levels of granularity in a 'long' format (multiple rows for each day, with each of the n rows specifying one category's value for that day). These CSV exports can be imported into statistical programs like R or Excel and manipulated as desired.
R users may prefer to have their time data in a 'wide' format (each row is 1 day, with n columns for each possible category); this can be done with the reshape function from R's default library. After reading in the CSV, the time-intervals can be converted to counts and the data to a wide data-frame with R code like the following:
    arbtt <- read.csv("arbtt.csv")
    # Convert a time string to seconds: either "N s" or "H:M:S".
    interval <- function(x) {
        if (!is.na(x)) {
            if (grepl(" s", x)) as.integer(sub(" s", "", x)) else {
                y <- unlist(strsplit(x, ":"))
                as.integer(y[[1]])*3600 + as.integer(y[[2]])*60 + as.integer(y[[3]])
            }
        } else NA
    }
    arbtt$Time <- sapply(as.character(arbtt$Time), interval)
    # reshape() here is the base R function (stats package); no extra library is needed.
    arbtt <- reshape(arbtt, v.names="Time", timevar="Tag", idvar="Day", direction="wide")