DrWatson. The scientific project assistant

George Datseris, May 29, 2019


Hello everyone,

I’d like to present a new software that was recently released in open beta, called DrWatson, the perfect sidekick to your scientific inquiries.

DrWatson is what I’ll call a “scientific project assistant”. It is software package created to help people “deal” with their simulations, simulation parameters, where are files saved, experimental data, scripts, existing simulations, project source code and in general their scientific projects. Although it is written in Julia, I believe its principles, practices and how it can assist you in managing a project can be applied to any other language.

You can find the full documentation here:

https://juliadynamics.github.io/DrWatson.jl/dev/

In this blogpost I will summarize what this software can do for you and how can it help you by reducing the stress of managing a scientific project. Please be aware that DrWatson is not a database management system.

Functionality

DrWarson is a scientific project assistant software package. Its core functionality is separated into 4, almost completely independent parts:

What makes DrWatson different

are the principles that we based our design decisions.

  1. Non-Invasive. DrWatson does not require you to follow strict rules or change the way you work and do science in order to use it. In addition DrWatson is function-based: you only have to call a function and everything else just works; you do not have to create additional special struct or other data types.
  2. Simple. The functionality offered is a baseline from where you handle your project as you wish. This makes it more likely to be of general use but also means that you don’t have to “study” to learn DrWatson: all concepts are simple, everything is easy to understand.
  3. Consistent. The functionality is identical across all projects and DrWatson offers a universal base project structure.
  4. Allows increments. You didn’t plan your project well enough? Want to add more folders, more files, more variables to your simulations? It’s fine.
  5. Reproducibility. DrWatson aims to make your projects fully reproducible using Git, Julia’s package manager and consistent naming schemes.
  6. Modular. DrWatson has a flexible modular design which means you only have to use what fits your project.

Examples

Let me now show two real life examples of using DrWatson.

1. Getting a name out of a container

In simulations it is often the case that we used a named container (like a dictionary) as something to hold our parameters in. For example

c = (T = 100, dt = 0.1, N = 25, mode = “bi”)

then calling the function savename that DrWatson offers results in

savename(c)
"N=25_T=100_dt=0.1_mode=bi"

while you can also add prefix/suffix

savename(datadir(), c, “dat”)
"path/to/data/N=25_T=100_dt=0.1_mode=bi.dat"

What is important is that the function savename works for any named container, which includes dictionaries, named tuples or any Julia composite structure (existing or to-be-created in the future).

In addition, it sensibly excludes non-fitting fields (e.g. vector-valued fields) but can be customized to full extent for your custom type.

2. Producing a dataframe from saved files

Let’s say that you have run some simulations with parameters a = 1 or 2, b = “x” or “y” and c = 0.3. You save these simulations in the directory “data”. for the purpose of this example you only save the parameter values but no other further data (like actual results)

You can then call the function

df1 = collect_results("data/")
4×3 DataFrame
│ Row │ a      │ b       │ c        │
│     │ Int64⍰ │ String⍰ │ Float64⍰ │
├─────┼────────┼─────────┼──────────┤
│ 1   │ 1      │ x       │ 0.3      │
│ 2   │ 1      │ y       │ 0.3      │
│ 3   │ 2      │ x       │ 0.3      │
│ 4   │ 2      │ y       │ 0.3      │

This is already quite convenient. But where collect_results really shine is the following. You now run a new simulation with a=4, b=“z”, a new parameter d=0.5 or 0.6 and the parameter c does not exist anymore! You again save these simulations in “data”. Then you call collect_results again:

df2 = collect_results("data/")
6×4 DataFrame
│ Row │ a      │ b       │ c        │ d        │
│     │ Int64⍰ │ String⍰ │ Float64⍰ │ Float64⍰ │
├─────┼────────┼─────────┼──────────┼──────────┤
│ 1   │ 1      │ x       │ 0.3      │ missing  │
│ 2   │ 1      │ y       │ 0.3      │ missing  │
│ 3   │ 2      │ x       │ 0.3      │ missing  │
│ 4   │ 2      │ y       │ 0.3      │ missing  │
│ 5   │ 4      │ z       │ missing  │ 0.5      │
│ 6   │ 4      │ z       │ missing  │ 0.6      │

As you can see the collection scheme has no problem with combining simulations with different parameters. This is especially useful for a scientific project, since it is almost impossible to know in advance all simulations you would have to run by the end of the project!

Conclusion

I hope that DrWatson can help you and I hope you will be interested to discuss during the deRSE2019 meeting! See you all there! If you are eager to contact us about DrWatson before deRSE2019, please use the GitHub repository:

https://github.com/JuliaDynamics/DrWatson.jl

best regards, George Datseris