treedata.TreeData


class treedata.TreeData(X=None, obs=None, var=None, uns=None, *, obsm=None, obst=None, varm=None, vart=None, layers=None, raw=None, dtype=None, shape=None, filename=None, filemode=None, asview=False, label='tree', alignment='leaves', allow_overlap=True, obsp=None, varp=None, oidx=None, vidx=None)#

AnnData with trees.

TreeData is a lightweight wrapper around AnnData that adds two attributes, obst and vart, to store trees for observations and variables. A TreeData object can be used just like an AnnData object: it stores a data matrix X together with annotations of observations obs (obsm, obsp, obst), variables var (varm, varp, vart), and unstructured annotations uns.

Parameters:
  • X (ndarray | spmatrix | sparray | DataFrame | None (default: None)) – A #observations × #variables data matrix. A view of the data is used if the data type matches; otherwise, a copy is made.

  • obs (DataFrame | Mapping[str, Iterable[Any]] | None (default: None)) – Key-indexed one-dimensional observations annotation of length #observations.

  • var (DataFrame | Mapping[str, Iterable[Any]] | None (default: None)) – Key-indexed one-dimensional variables annotation of length #variables.

  • uns (Mapping[str, Any] | None (default: None)) – Key-indexed unstructured annotation.

  • obsm (ndarray | Mapping[str, Sequence[Any]] | None (default: None)) – Key-indexed multi-dimensional observations annotation of length #observations. If passing an ndarray, it needs to have a structured datatype.

  • obst (Mapping[str, DiGraph] | None (default: None)) – Key-indexed DiGraph trees with leaf nodes in the observations axis.

  • varm (ndarray | Mapping[str, Sequence[Any]] | None (default: None)) – Key-indexed multi-dimensional variables annotation of length #variables. If passing an ndarray, it needs to have a structured datatype.

  • vart (Mapping[str, DiGraph] | None (default: None)) – Key-indexed DiGraph trees with leaf nodes in the variables axis.

  • layers (Mapping[str, ndarray | spmatrix | sparray] | None (default: None)) – Key-indexed multi-dimensional arrays aligned to dimensions of X.

  • dtype (dtype | type | str | None (default: None)) –

    Deprecated: The dtype argument is deprecated and will be removed in a future version.

  • shape (tuple[int, int] | None (default: None)) – Shape tuple (#observations, #variables). Can only be provided if X is None.

  • filename (PathLike | None (default: None)) – Name of backing file. See h5py.File.

  • filemode (Optional[Literal['r', 'r+']] (default: None)) – Open mode of backing file. See h5py.File.

  • asview (bool (default: False)) – Initialize as view. X has to be a TreeData object.

  • label (str | None (default: 'tree')) – Column in .obs and .var in which to place the tree key. If None, no column is added.

  • alignment (Literal['leaves', 'nodes', 'subset'] (default: 'leaves')) –

    Alignment between trees and observations/variables. One of the following:

    • leaves: All leaf names are present in the observation/variable names.

    • nodes: All leaf and internal node names are present in the observation/variable names.

    • subset: A subset of leaf and internal node names are present in the observation/variable names.

  • allow_overlap (bool (default: True)) – Whether trees containing overlapping sets of leaves or nodes are allowed. Default is True.
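
The three alignment modes can be read as set relations between a tree's node names and the axis names. The following is a minimal sketch of that reading, not treedata's implementation: trees are plain {parent: [children]} dictionaries, the helpers tree_nodes, tree_leaves and satisfies_alignment are hypothetical names introduced for illustration, and 'subset' is assumed to impose no containment requirement.

```python
def tree_nodes(children):
    """All node names of a tree given as {parent: [child, ...]}."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    return nodes


def tree_leaves(children):
    """Nodes without children."""
    return {node for node in tree_nodes(children) if not children.get(node)}


def satisfies_alignment(children, names, alignment="leaves"):
    """Check one tree against observation or variable names."""
    names = set(names)
    if alignment == "leaves":
        return tree_leaves(children) <= names
    if alignment == "nodes":
        return tree_nodes(children) <= names
    if alignment == "subset":
        # Assumption: any overlap (or none) is acceptable in this mode.
        return True
    raise ValueError(f"unknown alignment: {alignment!r}")


tree = {"root": ["A", "B"], "B": ["C", "D"]}
obs_names = ["A", "C", "D"]

print(satisfies_alignment(tree, obs_names, "leaves"))  # True
print(satisfies_alignment(tree, obs_names, "nodes"))   # False: "root" and "B" are missing
```

In this example the tree fits 'leaves' alignment but not 'nodes', because the internal nodes have no matching observation.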

chunk_X(select=1000, *, replace=True)#

Return a chunk of the data matrix X with random or specified indices.

Parameters:
  • select (int | Sequence[int] | ndarray (default: 1000)) –

    Depending on the type:

    int

    A random chunk with select rows will be returned.

    sequence (e.g. a list, tuple or numpy array) of int

    A chunk with these indices will be returned.

  • replace (bool (default: True)) – If select is an integer then True means random sampling of indices with replacement, False without replacement.
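
How select and replace interact can be sketched with the standard library alone. This illustrates the documented behaviour under stated assumptions and is not the library's code; choose_rows and its seed argument are hypothetical names.

```python
import random


def choose_rows(n_rows, select=1000, replace=True, seed=None):
    """Return the row indices that a chunk would contain."""
    rng = random.Random(seed)
    if isinstance(select, int):
        if replace:
            # Sampling with replacement: indices may repeat.
            return rng.choices(range(n_rows), k=select)
        # Sampling without replacement: requires select <= n_rows.
        return rng.sample(range(n_rows), k=select)
    # A sequence of indices is used as given.
    return list(select)


X = [[i, i * 10] for i in range(5)]

idx = choose_rows(len(X), select=3, replace=False, seed=0)
chunk = [X[i] for i in idx]          # three distinct rows
fixed = [X[i] for i in choose_rows(len(X), select=[4, 0])]
print(fixed)                         # [[4, 40], [0, 0]]
```

With replace=True the requested number of rows may exceed the number of rows in X; without replacement it cannot.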

chunked_X(chunk_size=None)#

Return an iterator over the rows of the data matrix X.

Parameters:

chunk_size (int | None (default: None)) – Row size of a single chunk.
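
Iterating over X in row chunks follows the usual slicing pattern. The generator below is a sketch on a plain list of rows; the name iter_row_chunks, the choice to yield the start and end offsets along with each chunk, and the single-chunk fallback for chunk_size=None are assumptions of this example rather than documented behaviour.

```python
def iter_row_chunks(rows, chunk_size=None):
    """Yield (chunk, start, end) for consecutive blocks of rows."""
    n = len(rows)
    if chunk_size is None:
        chunk_size = max(n, 1)  # assumption: one chunk covering everything
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        yield rows[start:end], start, end


rows = list(range(5))
for chunk, start, end in iter_row_chunks(rows, chunk_size=2):
    print(start, end, chunk)
# 0 2 [0, 1]
# 2 4 [2, 3]
# 4 5 [4]
```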

concatenate()#

Deprecated; use treedata.concat instead.

Return type:

None

copy(filename=None)#

Full copy, optionally on disk.

Return type:

TreeData

obs_keys()#

List keys of observation annotation obs.

Deprecated since version 0.12.3: Use obs instead of obs_keys. (e.g. k in adata.obs or str(adata.obs.columns.tolist()))

Return type:

list[str]

obs_names_make_unique(join='-')#

Makes the index unique by appending a number string to each duplicate index element: ‘1’, ‘2’, etc.

If a tentative name created by the algorithm already exists in the index, it tries the next integer in the sequence.

The first occurrence of a non-unique value is ignored.

Parameters:

join (str (default: '-')) – The connecting string between name and integer.

Return type:

None
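
The renaming rule described above (keep the first occurrence, append the next free integer to later duplicates) can be sketched in plain Python. This is an illustration of the described behaviour, not the library's implementation; make_names_unique is a hypothetical helper that returns a new list instead of modifying an index in place.

```python
from collections import Counter


def make_names_unique(names, join="-"):
    """Return names with duplicates renamed to '<name><join><n>'."""
    taken = set(names)   # every name already in use, including new ones
    seen = set()         # names whose first occurrence has been passed
    counter = Counter()  # last integer tried for each duplicated name
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)  # first occurrence is left untouched
            continue
        while True:
            counter[name] += 1
            candidate = f"{name}{join}{counter[name]}"
            if candidate not in taken:  # skip names that already exist
                taken.add(candidate)
                result.append(candidate)
                break
    return result


print(make_names_unique(["a", "a", "b"]))    # ['a', 'a-1', 'b']
print(make_names_unique(["a", "a", "a-1"]))  # ['a', 'a-2', 'a-1']
```

The second call shows the collision case: 'a-1' is already present, so the duplicate receives the next integer.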

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from anndata import AnnData
>>> adata = AnnData(np.ones((3, 2)), obs=pd.DataFrame(index=["a", "a", "b"]))
>>> adata.obs_names.astype("string")
Index(['a', 'a', 'b'], dtype='string')
>>> adata.obs_names_make_unique()
>>> adata.obs_names.astype("string")
Index(['a', 'a-1', 'b'], dtype='string')

obs_vector(k, *, layer=None)#

Convenience function for returning a one-dimensional ndarray of values from X, layers[k], or obs.

Made for convenience, not performance. Intentionally permissive about arguments, for easy iterative use.

Parameters:
  • k (str) – Key to use. Should be in var_names or obs.columns.

  • layer (str | None (default: None)) – What layer values should be returned from. If None, X is used.

Return type:

ndarray

Returns:

A one-dimensional ndarray, with values for each obs in the same order as obs_names.
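
The key lookup can be pictured as follows. This sketch uses plain lists and dictionaries in place of the real containers; obs_vector_sketch is a hypothetical name, and checking obs columns before variable names is a choice of this example, not documented behaviour.

```python
def obs_vector_sketch(k, X, obs, var_names, layers=None, layer=None):
    """Return one value per observation for key k.

    X and each layer are lists of rows; obs maps column names to lists.
    """
    if k in obs:
        # k names an annotation column: return it as is.
        return list(obs[k])
    if k in var_names:
        # k names a variable: take its column from X or the chosen layer.
        matrix = X if layer is None else layers[layer]
        j = var_names.index(k)
        return [row[j] for row in matrix]
    raise KeyError(f"{k!r} is neither an obs column nor a variable name")


X = [[1, 2], [3, 4], [5, 6]]
obs = {"batch": ["a", "b", "a"]}
var_names = ["gene1", "gene2"]
layers = {"scaled": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}

print(obs_vector_sketch("batch", X, obs, var_names))   # ['a', 'b', 'a']
print(obs_vector_sketch("gene2", X, obs, var_names))   # [2, 4, 6]
```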

obsm_keys()#

List keys of observation annotation obsm.

Deprecated since version 0.12.3: Use obsm instead of obsm_keys. (e.g. k in adata.obsm or adata.obsm.keys() | {'u'})

Return type:

list[str]

obst_keys()#

List keys of observation annotation obst.

Return type:

list[str]

rename_categories(key, categories)#

Rename categories of annotation key in obs, var, and uns.

Only supports passing a list/array-like categories argument.

Besides calling self.obs[key].cat.categories = categories – similar for var - this also renames categories in unstructured annotation that uses the categorical annotation key.

Parameters:
  • key (str) – Key for observations or variables annotation.

  • categories (Sequence[Any]) – New categories, the same number as the old categories.
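
For a single categorical column, the renaming step corresponds to pandas' Series.cat.rename_categories. A minimal sketch on a stand-alone DataFrame follows; the column name cluster and the labels are made up for illustration, and the accompanying update of uns is not shown.

```python
import pandas as pd

obs = pd.DataFrame({"cluster": pd.Categorical(["0", "1", "0"])})

# The new list must have as many entries as there are old categories.
obs["cluster"] = obs["cluster"].cat.rename_categories(["T cell", "B cell"])

print(list(obs["cluster"]))                 # ['T cell', 'B cell', 'T cell']
print(list(obs["cluster"].cat.categories))  # ['T cell', 'B cell']
```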

strings_to_categoricals(df=None)#

Transform string annotations to categoricals.

Only affects string annotations that lead to fewer categories than the total number of observations.

Parameters:

df (DataFrame | None (default: None)) – If df is None, modifies both obs and var, otherwise modifies df inplace.

Notes

Turns the view of an AnnData into an actual AnnData.
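
The conversion rule (string columns only, and only when there are fewer categories than rows) can be sketched with pandas on a plain DataFrame. This illustrates the rule as described and is not the library's code; strings_to_categoricals_sketch and the column names are hypothetical.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def strings_to_categoricals_sketch(df):
    """Convert repetitive string columns of df to categoricals, in place."""
    for key in df.columns:
        if infer_dtype(df[key]) != "string":
            continue  # leave numeric and other columns alone
        cat = pd.Categorical(df[key])
        if len(cat.categories) < len(cat):
            df[key] = cat  # fewer categories than rows: worth converting
    return df


obs = pd.DataFrame(
    {
        "cell_type": ["T", "B", "T"],        # 2 categories for 3 rows
        "barcode": ["AAC", "GGT", "TTA"],    # all values distinct
        "n_counts": [10, 12, 9],             # not a string column
    }
)
strings_to_categoricals_sketch(obs)
print(obs.dtypes)
```

Only cell_type becomes categorical here; barcode stays a string column because every value is unique.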

to_adata()#

Convert this TreeData object to an AnnData object.

Return type:

AnnData

to_df(layer=None)#

Generate shallow DataFrame.

The data matrix X is returned as DataFrame, where obs_names initializes the index, and var_names the columns.

  • No annotations are maintained in the returned object.

  • The data matrix is densified in case it is sparse.

Parameters:

layer (str | None (default: None)) – Key for .layers.

Return type:

DataFrame

Returns:

Pandas DataFrame of specified data matrix.

to_memory(copy=False)#

Return a new TreeData object with all backed arrays loaded into memory.

Parameters:

copy (bool (default: False)) – Whether arrays that are already in memory should be copied.

Return type:

TreeData

transpose()#

Transpose whole object

Data matrix is transposed, observations and variables are interchanged. Ignores .raw.

Return type:

TreeData

uns_keys()#

List keys of unstructured annotation.

Deprecated since version 0.13: Use uns instead of uns_keys. (e.g. k in adata.uns or sorted(adata.uns))

Return type:

list[str]

var_keys()#

List keys of variable annotation var.

Deprecated since version 0.12.3: Use var instead of var_keys. (e.g. k in adata.var or str(adata.var.columns.tolist()))

Return type:

list[str]

var_names_make_unique(join='-')#

Makes the index unique by appending a number string to each duplicate index element: ‘1’, ‘2’, etc.

If a tentative name created by the algorithm already exists in the index, it tries the next integer in the sequence.

The first occurrence of a non-unique value is ignored.

Parameters:

join (str (default: '-')) – The connecting string between name and integer.

Return type:

None

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from anndata import AnnData
>>> adata = AnnData(np.ones((2, 3)), var=pd.DataFrame(index=["a", "a", "b"]))
>>> adata.var_names.astype("string")
Index(['a', 'a', 'b'], dtype='string')
>>> adata.var_names_make_unique()
>>> adata.var_names.astype("string")
Index(['a', 'a-1', 'b'], dtype='string')

var_vector(k, *, layer=None)#

Convenience function for returning a one-dimensional ndarray of values from X, layers[k], or var.

Made for convenience, not performance. Intentionally permissive about arguments, for easy iterative use.

Parameters:
  • k (str) – Key to use. Should be in obs_names or var.columns.

  • layer (str | None (default: None)) – What layer values should be returned from. If None, X is used.

Return type:

ndarray

Returns:

A one-dimensional ndarray, with values for each var in the same order as var_names.

varm_keys()#

List keys of variable annotation varm.

Deprecated since version 0.12.3: Use varm instead of varm_keys. (e.g. k in adata.varm or adata.varm.keys() | {'u'})

Return type:

list[str]

vart_keys()#

List keys of variable annotation vart.

Return type:

list[str]

write(filename=None, compression=None, compression_opts=None, **kwargs)#

Write .h5td-formatted hdf5 file.

write_csvs(dirname, *, skip_data=True, sep=',')#

Write annotation to .csv files.

It is not possible to recover the full AnnData from these files. Use write() for this.

Parameters:
  • dirname (PathLike[str] | str) – Name of directory to which to export.

  • skip_data (bool (default: True)) – Skip the data matrix X.

  • sep (str (default: ',')) – Separator for the data.

write_h5ad(filename=None, *, convert_strings_to_categoricals=True, compression=None, compression_opts=None, as_dense=())#

Write .h5ad-formatted hdf5 file.

Note

Setting compression to 'gzip' can save disk space but will slow down writing and subsequent reading. Prior to v0.6.16, this was the default for parameter compression.

Generally, if you have sparse data that are stored as a dense matrix, you can dramatically improve performance and reduce disk space by converting to a csr_matrix:

from scipy.sparse import csr_matrix
adata.X = csr_matrix(adata.X)

Parameters:
  • filename (PathLike[str] | str | None (default: None)) – Filename of data file. Defaults to backing file.

  • convert_strings_to_categoricals (bool (default: True)) – Convert string columns to categorical.

  • compression (Optional[Literal['gzip', 'lzf']] (default: None)) –

    For [lzf, gzip], see the h5py Filter pipeline.

    Alternative compression filters such as zstd can be passed from the hdf5plugin library. Experimental.

    Usage example:

    import hdf5plugin
    adata.write_h5ad(
        filename,
        compression=hdf5plugin.FILTERS["zstd"]
    )
    

    Note

    Datasets written with hdf5plugin-provided compressors cannot be opened without first loading the hdf5plugin library using import hdf5plugin. When using alternative compression filters such as zstd, consider writing to zarr format instead of h5ad, as the zarr library provides a more transparent compression pipeline.

  • compression_opts (int | Any (default: None)) –

    For [lzf, gzip], see the h5py Filter pipeline.

    Alternative compression filters such as zstd can be configured using helpers from the hdf5plugin library. Experimental.

    Usage example (setting zstd compression level to 5):

    import hdf5plugin
    adata.write_h5ad(
        filename,
        compression=hdf5plugin.FILTERS["zstd"],
        compression_opts=hdf5plugin.Zstd(clevel=5).filter_options
    )
    

  • as_dense (Sequence[str] (default: ())) – Sparse arrays in AnnData object to write as dense. Currently only supports X and raw/X.

write_h5td(filename=None, compression=None, compression_opts=None, **kwargs)#

Write .h5td-formatted hdf5 file.

write_loom(filename, *, write_obsm_varm=False)#

Write .loom-formatted hdf5 file.

Parameters:

filename (PathLike[str] | str) – The filename.

write_zarr(store, chunks=None, **kwargs)#

Write a hierarchical Zarr array store.

property T: TreeData#

Transpose whole object

Data matrix is transposed, observations and variables are interchanged. Ignores .raw.

property X: XDataType | None#

Data matrix of shape n_obs × n_vars.

property alignment: Literal['leaves', 'nodes', 'subset']#

Mapping between trees and observations/variables.

property allow_overlap: bool#

Whether overlapping trees are allowed.

property filename: Path | None#

Change to backing mode by setting the filename of a .h5ad file.

  • Setting the filename writes the stored data to disk.

  • Setting the filename when the filename was previously another name moves the backing file from the previous file to the new file. If you want to copy the previous file, use copy(filename='new_filename').

property has_overlap: bool#

Flag indicating whether stored trees contain overlapping nodes.

Returns:

bool - True when any stored trees share nodes, False otherwise.
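
Read literally, the flag is true when at least two stored trees have a node name in common. A minimal sketch of that check, with each tree reduced to its collection of node names (trees_overlap is a hypothetical helper, not part of treedata):

```python
from itertools import combinations


def trees_overlap(trees):
    """True if any two trees in {key: node names} share a node."""
    node_sets = [set(nodes) for nodes in trees.values()]
    return any(a & b for a, b in combinations(node_sets, 2))


disjoint = {"clone1": ["r1", "A", "B"], "clone2": ["r2", "C"]}
shared = {"clone1": ["r1", "A", "B"], "alt": ["r3", "A"]}

print(trees_overlap(disjoint))  # False
print(trees_overlap(shared))    # True: both trees contain "A"
```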

property is_view: bool#

True if object is view of another TreeData object, False otherwise.

property isbacked: bool#

True if object is backed on disk, False otherwise.

property isview: bool#

Whether or not this object is a view.

Deprecated since version 0.7.2: Use is_view instead of isview.

property label: str | None#

Column in .obs and .var with tree keys.

property layers: Layers | LayersView#

Dictionary-like object with values of the same dimensions as X.

property n_obs: int#

Number of observations.

property n_vars: int#

Number of variables/features.

property obs: DataFrame | Dataset2D#

One-dimensional annotation of observations (pd.DataFrame).

property obs_names: Index#

Names of observations (alias for .obs.index).

property obsm: AxisArrays | AxisArraysView#

Multi-dimensional annotation of observations (mutable structured ndarray).

property obsp: PairwiseArrays | PairwiseArraysView#

Pairwise annotation of observations, a mutable mapping with array-like values.

property obst: AxisTrees | AxisTreesView#

Tree annotation of observations

Stores for each key a DiGraph with leaf nodes in obs_names. Is subset and pruned with the data but otherwise behaves like a mapping.
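
One way to picture "subset and pruned with the data" is that, after selecting some observations, each tree keeps only the selected leaves and the ancestors that still lead to them. The sketch below shows that idea on a {parent: [children]} dictionary. It is an assumption about the general technique rather than a description of treedata's code; prune_tree is a hypothetical helper, and it assumes 'leaves' alignment and a small tree, since it recurses.

```python
def prune_tree(children, keep_leaves):
    """Restrict a tree to the kept leaves and their ancestors."""
    keep_leaves = set(keep_leaves)
    pruned = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            return node in keep_leaves  # a leaf survives only if selected
        kept = [kid for kid in kids if visit(kid)]
        if kept:
            pruned[node] = kept         # internal node with surviving children
        return bool(kept)

    all_children = {kid for kids in children.values() for kid in kids}
    for root in set(children) - all_children:
        visit(root)
    return pruned


tree = {"root": ["A", "B"], "B": ["C", "D"]}
print(prune_tree(tree, ["A", "C"]) == {"root": ["A", "B"], "B": ["C"]})  # True
print(prune_tree(tree, ["A"]))                                          # {'root': ['A']}
```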

property raw: Raw#

Store raw version of X and var as .raw.X and .raw.var.

The raw attribute is initialized with the current content of an object by setting:

adata.raw = adata.copy()

Its content can be deleted:

adata.raw = None
# or
del adata.raw

Upon slicing an AnnData object along the obs (row) axis, raw is also sliced. Slicing an AnnData object along the vars (columns) axis leaves raw unaffected. Note that you can call:

adata.raw[:, 'orig_variable_name'].X

to retrieve the data associated with a variable that might have been filtered out or “compressed away” in X.

property shape: tuple[int, int]#

Shape of data matrix (n_obs, n_vars).

property uns: MutableMapping#

Unstructured annotation (ordered dictionary).

property var: DataFrame | Dataset2D#

One-dimensional annotation of variables/features (pd.DataFrame).

property var_names: Index#

Names of variables (alias for .var.index).

property varm: AxisArrays | AxisArraysView#

Multi-dimensional annotation of variables/features (mutable structured ndarray).

property varp: PairwiseArrays | PairwiseArraysView#

Pairwise annotation of variables/features, a mutable mapping with array-like values.

property vart: AxisTrees | AxisTreesView#

Tree annotation of variables

Stores for each key a DiGraph with leaf nodes in var_names. Is subset and pruned with the data but otherwise behaves like a mapping.

Attributes table#

  • T – Transpose whole object.
  • X – Data matrix of shape n_obs × n_vars.
  • alignment – Mapping between trees and observations/variables.
  • allow_overlap – Whether overlapping trees are allowed.
  • filename – Change to backing mode by setting the filename of a .h5ad file.
  • has_overlap – Flag indicating whether stored trees contain overlapping nodes.
  • is_view – True if object is view of another TreeData object, False otherwise.
  • isbacked – True if object is backed on disk, False otherwise.
  • isview – Whether or not this object is a view (deprecated; use is_view).
  • label – Column in .obs and .var with tree keys.
  • layers – Dictionary-like object with values of the same dimensions as X.
  • n_obs – Number of observations.
  • n_vars – Number of variables/features.
  • obs – One-dimensional annotation of observations (pd.DataFrame).
  • obs_names – Names of observations (alias for .obs.index).
  • obsm – Multi-dimensional annotation of observations (mutable structured ndarray).
  • obsp – Pairwise annotation of observations, a mutable mapping with array-like values.
  • obst – Tree annotation of observations.
  • raw – Store raw version of X and var as .raw.X and .raw.var.
  • shape – Shape of data matrix (n_obs, n_vars).
  • uns – Unstructured annotation (ordered dictionary).
  • var – One-dimensional annotation of variables/features (pd.DataFrame).
  • var_names – Names of variables (alias for .var.index).
  • varm – Multi-dimensional annotation of variables/features (mutable structured ndarray).
  • varp – Pairwise annotation of variables/features, a mutable mapping with array-like values.
  • vart – Tree annotation of variables.

Methods table#

  • chunk_X([select, replace]) – Return a chunk of the data matrix X with random or specified indices.
  • chunked_X([chunk_size]) – Return an iterator over the rows of the data matrix X.
  • concatenate() – Deprecated; use treedata.concat instead.
  • copy([filename]) – Full copy, optionally on disk.
  • obs_keys() – List keys of observation annotation obs.
  • obs_names_make_unique([join]) – Makes the index unique by appending a number string to each duplicate index element: '1', '2', etc.
  • obs_vector(k, *[, layer]) – Convenience function for returning a one-dimensional ndarray of values from X, layers[k], or obs.
  • obsm_keys() – List keys of observation annotation obsm.
  • obst_keys() – List keys of observation annotation obst.
  • rename_categories(key, categories) – Rename categories of annotation key in obs, var, and uns.
  • strings_to_categoricals([df]) – Transform string annotations to categoricals.
  • to_adata() – Convert this TreeData object to an AnnData object.
  • to_df([layer]) – Generate shallow DataFrame.
  • to_memory([copy]) – Return a new TreeData object with all backed arrays loaded into memory.
  • transpose() – Transpose whole object.
  • uns_keys() – List keys of unstructured annotation.
  • var_keys() – List keys of variable annotation var.
  • var_names_make_unique([join]) – Makes the index unique by appending a number string to each duplicate index element: '1', '2', etc.
  • var_vector(k, *[, layer]) – Convenience function for returning a one-dimensional ndarray of values from X, layers[k], or var.
  • varm_keys() – List keys of variable annotation varm.
  • vart_keys() – List keys of variable annotation vart.
  • write([filename, compression, compression_opts]) – Write .h5td-formatted hdf5 file.
  • write_csvs(dirname, *[, skip_data, sep]) – Write annotation to .csv files.
  • write_h5ad([filename, ...]) – Write .h5ad-formatted hdf5 file.
  • write_h5td([filename, compression, ...]) – Write .h5td-formatted hdf5 file.
  • write_loom(filename, *[, write_obsm_varm]) – Write .loom-formatted hdf5 file.
  • write_zarr(store[, chunks]) – Write a hierarchical Zarr array store.

Attributes#

TreeData.T#

Transpose whole object

Data matrix is transposed, observations and variables are interchanged. Ignores .raw.

TreeData.X#

Data matrix of shape n_obs × n_vars.

TreeData.alignment#

Mapping between trees and observations/variables.

TreeData.allow_overlap#

Whether overlapping trees are allowed.

TreeData.filename#

Change to backing mode by setting the filename of a .h5ad file.

  • Setting the filename writes the stored data to disk.

  • Setting the filename when the filename was previously another name moves the backing file from the previous file to the new file. If you want to copy the previous file, use copy(filename='new_filename').

TreeData.has_overlap#

Flag indicating whether stored trees contain overlapping nodes.

Returns:

bool - True when any stored trees share nodes, False otherwise.

TreeData.is_view#

True if object is view of another TreeData object, False otherwise.

TreeData.isbacked#

True if object is backed on disk, False otherwise.

TreeData.isview#

Whether or not this object is a view.

Deprecated since version 0.7.2: Use is_view instead of isview.

TreeData.label#

Column in .obs and .var with tree keys.

TreeData.layers: AlignedMappingProperty[Layers | LayersView]#

Dictionary-like object with values of the same dimensions as X.

Layers in AnnData are inspired by loompy’s loomlayers.

Return the layer named "unspliced":

adata.layers["unspliced"]

Create or replace the "spliced" layer:

adata.layers["spliced"] = ...

Assign the 10th column of layer "spliced" to the variable a:

a = adata.layers["spliced"][:, 10]

Delete the "spliced" layer:

del adata.layers["spliced"]

Return layers’ names:

adata.layers.keys()

TreeData.n_obs#

Number of observations.

TreeData.n_vars#

Number of variables/features.

TreeData.obs#

One-dimensional annotation of observations (pd.DataFrame).

TreeData.obs_names#

Names of observations (alias for .obs.index).

TreeData.obsm: AlignedMappingProperty[AxisArrays | AxisArraysView]#

Multi-dimensional annotation of observations (mutable structured ndarray).

Stores for each key a two or higher-dimensional ndarray of length n_obs. Is sliced with data and obs but behaves otherwise like a mapping.

TreeData.obsp: AlignedMappingProperty[PairwiseArrays | PairwiseArraysView]#

Pairwise annotation of observations, a mutable mapping with array-like values.

Stores for each key a two or higher-dimensional ndarray whose first two dimensions are of length n_obs. Is sliced with data and obs but behaves otherwise like a mapping.

TreeData.obst#

Tree annotation of observations

Stores for each key a DiGraph with leaf nodes in obs_names. Is subset and pruned with the data but otherwise behaves like a mapping.

TreeData.raw#

Store raw version of X and var as .raw.X and .raw.var.

The raw attribute is initialized with the current content of an object by setting:

adata.raw = adata.copy()

Its content can be deleted:

adata.raw = None
# or
del adata.raw

Upon slicing an AnnData object along the obs (row) axis, raw is also sliced. Slicing an AnnData object along the vars (columns) axis leaves raw unaffected. Note that you can call:

adata.raw[:, 'orig_variable_name'].X

to retrieve the data associated with a variable that might have been filtered out or “compressed away” in X.

TreeData.shape#

Shape of data matrix (n_obs, n_vars).

TreeData.uns#

Unstructured annotation (ordered dictionary).

TreeData.var#

One-dimensional annotation of variables/features (pd.DataFrame).

TreeData.var_names#

Names of variables (alias for .var.index).

TreeData.varm: AlignedMappingProperty[AxisArrays | AxisArraysView]#

Multi-dimensional annotation of variables/features (mutable structured ndarray).

Stores for each key a two or higher-dimensional ndarray of length n_vars. Is sliced with data and var but behaves otherwise like a mapping.

TreeData.varp: AlignedMappingProperty[PairwiseArrays | PairwiseArraysView]#

Pairwise annotation of variables/features, a mutable mapping with array-like values.

Stores for each key a two or higher-dimensional ndarray whose first two dimensions are of length n_vars. Is sliced with data and var but behaves otherwise like a mapping.

TreeData.vart#

Tree annotation of variables

Stores for each key a DiGraph with leaf nodes in var_names. Is subset and pruned with the data but otherwise behaves like a mapping.
