-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
default dtypes & core.construction #188
Comments
@nvictus @golobor any suggestions? _rc = {"colnames": {"chrom": "chrom", "start": "start", "end": "end"},
"col_dtpyes": {"chrom": str, "start": pd.Int64Dtype(), "end": pd.Int64Dtype()}
} and something like: def from_any(regions, fill_null=False, name_col="name", cols=None, col_dtypes=None):
ck1_dtype, sk1_dtype, ek1_dtype = _get_default_col_dtypes() if cols is None else cols |
Why |
I guess we could say the same thing about |
rc is borrowed from matplotlib which borrowed it from Linux: https://stackoverflow.com/questions/11030552/what-does-rc-mean-in-dot-files . It doesn't have to be named like that, can be, for example, _conf. pd.Int64Dtype() is probably better than int, right? I think we use it quite extensively throughout the library, so we might as well make it the default - this way we'll have more consistent NaNs. names/dtypes is fine with me! We can also use a nested dict or a SimpleNamespace/NamedTuple (not bad, since we only need to read/modify variables with existing keys): _conf = dict(
names = dict(chrom='chrom', start='start', end='end'),
dtypes = dict(...)
) or: import types
_conf = types.SimpleNamespace()
_conf.col = types.SimpleNamespace()
_conf.col.names = types.SimpleNamespace()
_conf.col.names.chrom=chrom
... |
looking at tests, there is a lot of boilerplate that could be reduced, and tests could be made more readable, if we could specify dtypes for functions in core.construction (including
from_any
,from_list
, andfrom_series
).for example:
would become
We provide a dictionary for default columns names in
core.specs
, however there does not seem to be a dictionary (or other specification) for default dtypes.One option would be to add them right after the default column names in
core.specs
:https://github.com/open2c/bioframe/blob/main/bioframe/core/specs.py#L11C1-L12C1
If added, should they be
int
,pd.Int64Dtype()
, or something else forstart
andend
?The text was updated successfully, but these errors were encountered: