ParamTools 0.10.0


Highlights:

Better data access and CSV/spreadsheet compatibility

Many projects and their users are more familiar with spreadsheets than JSON files. Although ParamTools does not have built-in support for spreadsheets, it tries to make it easy for projects to write custom adjust methods for this type of use case. The example in the custom adjustment docs shows how to utilize this flexibility to read in CSV files and convert them into ParamTools adjustments. Tax-Cruncher, an open-source tax-policy model, uses this approach so that its users can adjust the default parameters with CSV files. This real-world example showed that ParamTools needed to make it easier for projects to loop over their parameters and their values. This release adds a few methods like .items() and .keys() that make Parameters objects feel more like Python dictionaries, yielding a familiar, intuitive interface:

for param in params:
    print(param)

# one_param
# two_param

for param, value in params.items():
    print(param, value)

# one_param [{'value': 'hello'}]
# two_param [{'value': 'world'}]

params.keys()

# odict_keys(['one_param', 'two_param'])

params.to_dict()

# {
#     "one_param": [{"value": "hello"}],
#     "two_param": [{"value": "world"}]
# }
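
One standard way to give a class this dictionary feel in Python is to subclass collections.abc.Mapping. Once __getitem__, __iter__ and __len__ are defined, Mapping supplies keys(), items(), values(), get() and the in operator. The sketch below assumes a hypothetical ParamsLike class that keeps its data in an OrderedDict. It illustrates the idea and is not ParamTools' actual implementation.

```python
from collections import OrderedDict
from collections.abc import Mapping


class ParamsLike(Mapping):
    """Hypothetical stand-in for a Parameters object (not ParamTools code)."""

    def __init__(self, data):
        # parameter name -> list of value objects
        self._data = OrderedDict(data)

    def __getitem__(self, name):
        return self._data[name]

    def __iter__(self):
        # iterating yields parameter names, just like iterating a dict
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def to_dict(self):
        # shallow copy of the parameter data as a plain dict
        return dict(self._data)


params = ParamsLike([
    ("one_param", [{"value": "hello"}]),
    ("two_param", [{"value": "world"}]),
])

# items() comes for free from Mapping
for param, value in params.items():
    print(param, value)
# one_param [{'value': 'hello'}]
# two_param [{'value': 'world'}]
```

Unlike the ParamTools output above, keys() here returns a KeysView rather than odict_keys, but it iterates over the same names in the same order.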

An unexpected advantage of adding these methods is that a pandas DataFrame can be created from a Parameters instance with a single call to its to_dict() method:

import pandas as pd
import paramtools

class Params(paramtools.Parameters):
    defaults = {
        "a": {
            "title": "A",
            "description": "",
            "type": "int",
            "number_dims": 1,
            "value": [0]
        },
        "b": {
            "title": "B",
            "description": "",
            "type": "int",
            "number_dims": 1,
            "value": [0]
        }
    }
    array_first = True  # parameter values are accessed as NumPy arrays


params = Params()

params.adjust({
    "a": [1, 2, 3, 4, 5],
    "b": [6, 7, 8, 9, 10]
})


params_df = pd.DataFrame.from_dict(
    params.to_dict()
)

print(params_df)

#    a   b
# 0  1   6
# 1  2   7
# 2  3   8
# 3  4   9
# 4  5  10
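
Going the other direction, from a spreadsheet to an adjustment, is what CSV support like Tax-Cruncher's needs. A minimal sketch using only the standard library, assuming a hypothetical CSV layout with param, year and value columns and a hypothetical csv_to_adjustment helper. It groups rows into lists of value objects keyed by parameter name, with the year as a label. It is not the example from the custom adjustment docs and not Tax-Cruncher's code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: one row per parameter value.
CSV_TEXT = """param,year,value
standard_deduction,2020,10000
standard_deduction,2021,10500
"""


def csv_to_adjustment(csv_file):
    """Group CSV rows into {param: [value objects]} (hypothetical helper)."""
    adjustment = defaultdict(list)
    for row in csv.DictReader(csv_file):
        adjustment[row["param"]].append(
            # csv yields strings, so convert to the types the parameters expect
            {"year": int(row["year"]), "value": float(row["value"])}
        )
    return dict(adjustment)


adj = csv_to_adjustment(io.StringIO(CSV_TEXT))
print(adj)
# {'standard_deduction': [{'year': 2020, 'value': 10000.0}, {'year': 2021, 'value': 10500.0}]}
```

A result like this could then be passed to params.adjust(...) for a Parameters class that defines a year label; a real implementation would also have to handle parameters with other labels or none at all.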


Performance improvements

ParamTools 0.10.0 brings significant performance improvements for parameter lookups and updates. Prior to this release, ParamTools used a simple but naive approach for searching over lists of value objects. For updates, the logic looked like this:

# loop over all new values
for i in range(len(new_values)):
    curr_vals = self._data[param]["value"]
    matched_at_least_once = False
    labels_to_check = tuple(k for k in new_values[i] if k != "value")
    # for each new value, loop over all of the existing values
    # to look for matches.
    for j in range(len(curr_vals)):
        # in our THIRD nested for loop, loop over all of the labels
        # in the new value. An existing value matches only if every
        # label agrees.
        match = True
        for label in labels_to_check:
            if curr_vals[j][label] != new_values[i][label]:
                match = False
                break
        if match:
            matched_at_least_once = True
            curr_vals[j]["value"] = new_values[i]["value"]
    # no existing value has these labels, so add the new value object
    if not matched_at_least_once:
        curr_vals.append(new_values[i])

The search function used a similar algorithm. The new approach sorts the incoming value objects and the existing value objects by their labels and their labels' values. It then does a single loop over the labels and keeps track of the matches between the incoming value objects and the existing value objects. Not only is the algorithmic complexity lower, but much of the computational work also relies on Python's set data structure and its fast union and intersection operations. The result is a speedup of 5-14x for updates and 2-4x for searches, depending on how many value objects need to be searched or updated. I detailed the profiling approach and results in PR #74, and I plan to do a more in-depth write-up in a follow-up post. Until then, the tree.py module (and its extensive docstrings and comments) is the best source of documentation for the new implementation.
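
To illustrate why sets help, here is a simplified sketch that is not the tree.py implementation: it swaps the sorting step for a plain dictionary index. The hypothetical build_index helper maps each label and label value to the set of positions of the value objects carrying it, and the hypothetical find_matches helper finds the value objects matching a query by intersecting those sets instead of comparing every value object label by label. The marital_status label is also hypothetical.

```python
from collections import defaultdict


def build_index(value_objects):
    """Map label -> label value -> set of positions in value_objects."""
    index = defaultdict(lambda: defaultdict(set))
    for pos, vo in enumerate(value_objects):
        for label, label_value in vo.items():
            if label != "value":
                index[label][label_value].add(pos)
    return index


def find_matches(index, n_values, query):
    """Return positions of value objects whose labels all match `query`."""
    matches = set(range(n_values))  # a query with no labels matches everything
    for label, label_value in query.items():
        if label == "value":
            continue
        # intersection keeps only positions that also agree on this label;
        # an unknown label or label value yields an empty set
        matches &= index.get(label, {}).get(label_value, set())
    return matches


values = [
    {"year": 2020, "marital_status": "single", "value": 10000},
    {"year": 2020, "marital_status": "joint", "value": 20000},
    {"year": 2021, "marital_status": "single", "value": 10500},
]
index = build_index(values)
print(sorted(find_matches(index, len(values), {"year": 2020})))  # [0, 1]
print(sorted(find_matches(index, len(values), {"year": 2020, "marital_status": "joint"})))  # [1]
```

An update built on this would overwrite the "value" of each matched position, or append the new value object when the match set is empty, just as the old loop above does, but without the inner loops over every existing value object.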

Specify ParamTools operators like label_to_extend in the defaults object

The "schema" object in the defaults.json file now includes an "operators" object for setting attributes such as label_to_extend and array_first. The new dump method helps convert a Parameters instance's data to a JSON string. This makes it easier for a web app like Compute Studio to consume a project's ParamTools configuration, even if the project itself is not installed on the web server.

{
    "schema": {
        "operators": {
            "array_first": true,
            "label_to_extend": "year"
        }
    },
    "standard_deduction": {
        "title": "Standard deduction amount",
        "description": "Amount filing unit can use as a standard deduction.",
        "type": "float",
        "value": 10000
    }
}
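
Because the operators now live in the defaults JSON itself, a consumer can read them with nothing more than a JSON parser. A minimal sketch, assuming the defaults layout shown above and two hypothetical helpers, read_operators and parameter_names; it neither imports nor requires ParamTools.

```python
import json

DEFAULTS_JSON = """
{
    "schema": {
        "operators": {"array_first": true, "label_to_extend": "year"}
    },
    "standard_deduction": {
        "title": "Standard deduction amount",
        "description": "Amount filing unit can use as a standard deduction.",
        "type": "float",
        "value": 10000
    }
}
"""


def read_operators(defaults_text):
    """Return the schema's operators, or {} if none are given (hypothetical)."""
    defaults = json.loads(defaults_text)
    return defaults.get("schema", {}).get("operators", {})


def parameter_names(defaults_text):
    """Every top-level key except "schema" is a parameter definition."""
    return [name for name in json.loads(defaults_text) if name != "schema"]


print(read_operators(DEFAULTS_JSON))
# {'array_first': True, 'label_to_extend': 'year'}
print(parameter_names(DEFAULTS_JSON))
# ['standard_deduction']
```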

Bug fixes