API Reference¶
Reference documentation is generated from Python docstrings in the grumpy package. For narrative tutorials, start at Home and follow the section links at the bottom of each page.
Top-level API¶
Grumpy: high-performance numerical computing on ragged and nested data.
Grumpy provides Awkward-like layout semantics with strong typing, explicit nullability, mutable arrays, Zarr-backed I/O, and optional compilation of streaming transforms.
Layouts
Arrays use either list-chains (`ListOffset` -> … -> `Leaf`) or `UnionScalarList`
(mixed scalar/list rows at one axis). Both are constructed with :func:`array`, persisted to
Zarr, streamed, and used in dataframes.
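To make the list-chain idea concrete, the sketch below encodes one ragged level as an offsets list plus a flat leaf buffer, the general scheme used by offset-based ragged formats. It is plain Python and is not Grumpy's internal representation; `to_list_offset` and `from_list_offset` are hypothetical helper names introduced only for illustration.

```python
def to_list_offset(rows):
    """Encode one level of ragged rows as (offsets, leaves)."""
    offsets = [0]
    leaves = []
    for row in rows:
        leaves.extend(row)
        # Each offset marks where the next row starts in the flat buffer.
        offsets.append(len(leaves))
    return offsets, leaves


def from_list_offset(offsets, leaves):
    """Rebuild the ragged rows by slicing between consecutive offsets."""
    return [leaves[start:stop] for start, stop in zip(offsets, offsets[1:])]


rows = [[1, 2, 3], [], [4, 5]]
offsets, leaves = to_list_offset(rows)
print(offsets)  # [0, 3, 3, 5]
print(leaves)   # [1, 2, 3, 4, 5]
```

An empty row shows up as two equal consecutive offsets, so the structure round-trips without a sentinel value.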
Notes
- Streaming supports axis-0 and `batch_on` batching, shuffle, DDP, and I/O prefetch on both layout paths.
- `gr.compile` accepts a restricted subset of Python (see :func:`compile`); scalar elementwise opcodes fuse on union batches as well as list-chains.
add ¶
Elementwise add with optional pre-allocated out.
array ¶
Construct a GrumpyArray from Python scalars / nested lists or tuples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj | | Python scalar or nested Python sequences (lists/tuples) of arbitrary depth. | required |
| dtype | DType \| None | Optional explicit dtype. If omitted, dtype is inferred from non-null leaves. | None |
bincount ¶
can_cast ¶
Return whether from_dtype can be cast to to_dtype under casting.
cat ¶
Concatenate arrays along a ragged dimension.
full_like ¶
Create an array with the same ragged structure as x, filled with fill_value.
gpu_available ¶
Return True when a GPU backend (Metal or CUDA) is available.
gpu_backend ¶
Return 'metal', 'cuda', or None if no GPU backend is active.
grid_pool ¶
grid_pool(x: GrumpyArray, grid_size: tuple[int, int, int], *, origin: tuple[float, float, float] | None = None, voxel_size: tuple[float, float, float] | None = None, dim: int = 1) -> GrumpyArray
Voxelize point clouds by counting points per grid cell (occupancy pooling).
Returns (n_groups, nx*ny*nz) occupancy grids per group.
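The following is a minimal pure-Python sketch of occupancy pooling, written to illustrate the idea rather than to mirror `grid_pool` itself. It makes several assumptions that the reference does not confirm: cells are flattened in row-major order, points outside the grid are dropped, and `origin` and `voxel_size` are always given. `occupancy_grid` is a hypothetical name.

```python
import math


def occupancy_grid(points, grid_size, origin, voxel_size):
    """Count points per voxel for one point cloud (assumed row-major flattening)."""
    nx, ny, nz = grid_size
    counts = [0] * (nx * ny * nz)
    for point in points:
        # Integer cell index along each axis.
        idx = [math.floor((c - o) / v) for c, o, v in zip(point, origin, voxel_size)]
        # Assumption: points that fall outside the grid are ignored.
        if all(0 <= i < n for i, n in zip(idx, grid_size)):
            ix, iy, iz = idx
            counts[(ix * ny + iy) * nz + iz] += 1
    return counts


clouds = [
    [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.5, 0.5)],
    [(1.9, 1.9, 1.9)],
]
grids = [
    occupancy_grid(cloud, (2, 2, 2), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    for cloud in clouds
]
print(grids[0])  # [2, 0, 0, 0, 1, 0, 0, 0]
print(grids[1])  # [0, 0, 0, 0, 0, 0, 0, 1]
```

Each group yields one flat list of length `nx*ny*nz`, matching the `(n_groups, nx*ny*nz)` shape described above.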
histogram ¶
histogram(x: GrumpyArray, bins: int = 10, range: tuple[float, float] | None = None, density: bool = False, weights: GrumpyArray | None = None) -> tuple[GrumpyArray, GrumpyArray]
multiply ¶
Elementwise multiply with optional pre-allocated out (NumPy out= style).
neighbors ¶
neighbors(query: GrumpyArray, data: GrumpyArray, k: int | None = None, radius: float | None = None, dim: int = 0, loop: bool = True, return_distances: bool = False, gpu: bool | str = False)
Compute neighbors and return an edge index (and optionally distances).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | GrumpyArray | Coordinate arrays. For … | required |
| data | GrumpyArray | Coordinate arrays. For … | required |
| k | int \| None | Number of nearest neighbors (mutually exclusive with …) | None |
| radius | float \| None | Include all neighbors within this distance (mutually exclusive with …) | None |
| dim | int | Axis along which groups of points live (…) | 0 |
| loop | bool | If … | True |
| return_distances | bool | If … | False |
| gpu | bool \| str | | False |
Returns:
| Name | Type | Description |
|---|---|---|
| edge_index | | Ragged edge index with last axis length 2: … |
| distances (optional) | | Returned when … |
ones_like ¶
Create an array with the same ragged structure as x, filled with ones.
pairwise_distances ¶
All-pairs Euclidean distances within each point cloud (group).
For dim=1, input shape is (n_groups, n_points, d); output is
(n_groups, n_points, n_points) distance matrices.
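As an illustration of the per-group output shape, here is a small pure-Python sketch that computes all-pairs Euclidean distances for nested lists of points. It does not call Grumpy; `pairwise_distances_sketch` is a hypothetical name, and the input is assumed to be a list of groups, each a list of coordinate tuples.

```python
import math


def pairwise_distances_sketch(groups):
    """Return one n_points x n_points distance matrix per group."""
    return [
        [[math.dist(p, q) for q in points] for p in points]
        for points in groups
    ]


groups = [
    [(0.0, 0.0), (3.0, 4.0)],
    [(1.0, 1.0), (1.0, 2.0), (4.0, 5.0)],
]
dists = pairwise_distances_sketch(groups)
print(dists[0])  # [[0.0, 5.0], [5.0, 0.0]]
print(len(dists[1]), len(dists[1][0]))  # 3 3
```

Because groups may hold different numbers of points, each matrix has its own size; the matrices are symmetric with zeros on the diagonal.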
promote_types ¶
NumPy-style binary result dtype for two dtypes.
save ¶
Save a GrumpyArray/DataFrame, or incrementally write batches from a generator.
subtract ¶
Elementwise subtract with optional pre-allocated out.
zeros_like ¶
Create an array with the same ragged structure as x, filled with zeros.
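The `full_like`, `ones_like`, and `zeros_like` functions all keep the ragged structure and replace only the leaves. A minimal sketch of that behavior on plain nested Python lists follows. It is not Grumpy code; `full_like_nested` is a hypothetical helper, and it treats every non-sequence value as a leaf.

```python
def full_like_nested(x, fill_value):
    """Replace every leaf of a nested list/tuple with fill_value, keeping the shape."""
    if isinstance(x, (list, tuple)):
        return [full_like_nested(item, fill_value) for item in x]
    return fill_value


x = [[1, 2, 3], [], [4, 5]]
print(full_like_nested(x, 0))    # [[0, 0, 0], [], [0, 0]]
print(full_like_nested(x, 1.5))  # [[1.5, 1.5, 1.5], [], [1.5, 1.5]]
```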
Core types¶
Streaming¶
Streaming iterators and parallel batch transforms for saved Grumpy datasets.
This module provides :class:`Stream` and :class:`StreamApply` for batching
over Zarr-backed stores written by :func:`grumpy.save`.
Features
- Axis-0 batching with optional `batch_on` schema-level packing (list-chain and union layouts)
- Reproducible batch-order shuffle and within-batch shuffle on a schema level
- DDP sharding via `world_size` / `rank`
- I/O prefetch via `workers` (distinct from `StreamApply` transform parallelism)
- Partial batch reads (leaf ranges only) via the Rust `StreamBatchesIter`
- Compact union partial I/O: slice tags/index and referenced scalar/list pools only
- Subset iteration via `st[index]` (int, slice, or sequence of batch indices)
Notes
- `Indexed` layouts are not yet supported for streaming slice loads.
- Compiled Rust scheduling supports a restricted opcode set (see `compiler.py`); scalar elementwise opcodes work on `UnionScalarList` batches.
Stream (dataclass) ¶
Stream(path: str, batch_size: int, drop_last: bool = False, batch_on: Optional[str] = None, shuffle: Optional[Union[str, bool]] = None, seed: Optional[int] = None, workers: int = 0, in_memory: bool = False, gpu: Union[bool, str] = False, world_size: int = 1, rank: int = 0, batch_indices: Optional[tuple[int, ...]] = None)
Iterator over batches of a saved :class:`~grumpy.GrumpyArray` or dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path passed to :func: … | required |
| batch_size | int | Maximum number of axis-0 elements (or … | required |
| drop_last | bool | If … | False |
| batch_on | Optional[str] | Optional schema level name (e.g. … | None |
| shuffle | Optional[Union[str, bool]] | If set (e.g. … | None |
| seed | Optional[int] | Random seed for … | None |
| workers | int | Number of background I/O prefetch slots and parallel loader threads (… | 0 |
| in_memory | bool | If … | False |
| world_size | int | DDP world size; batches are partitioned as … | 1 |
| rank | int | DDP rank in … | 0 |
Examples:
>>> import grumpy as gr
>>> gr.save(gr.array(list(range(100))), 'data.gr')
>>> st = gr.stream('data.gr', batch_size=32)
>>> len(st)
4
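The batch count in the example above follows from simple arithmetic: 100 axis-0 elements at `batch_size=32` give three full batches plus one partial batch. The sketch below reproduces that count in plain Python. It assumes plain axis-0 batching (no `batch_on`, `world_size=1`) and the conventional meaning of `drop_last`, namely discarding a trailing partial batch; `num_batches` is a hypothetical helper, not part of Grumpy.

```python
import math


def num_batches(n_items, batch_size, drop_last=False):
    """Number of axis-0 batches, assuming drop_last discards a trailing partial batch."""
    if drop_last:
        return n_items // batch_size
    return math.ceil(n_items / batch_size)


print(num_batches(100, 32))                  # 4
print(num_batches(100, 32, drop_last=True))  # 3
```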
__getitem__ ¶
Return a stream over a subset of batches (after DDP sharding).
apply ¶
apply(fns: Union[Callable[[T], T], Sequence[Callable[[T], T]]], cpu: int = 1, prefetch: Optional[int] = None, compile: Union[bool, str] = 'auto', scheduler: str = 'auto') -> 'StreamApply[T]'
Apply one or more batch transforms, optionally compiled and parallelized.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fns | Union[Callable[[T], T], Sequence[Callable[[T], T]]] | Callable or sequence of callables … | required |
| cpu | int | Worker count for parallel apply (… | 1 |
| prefetch | Optional[int] | Max in-flight batches for threaded scheduling (default … | None |
| compile | Union[bool, str] | | 'auto' |
| scheduler | str | | 'auto' |
Returns:
| Type | Description |
|---|---|
| StreamApply | Lazy iterable of transformed batches. |
StreamApply (dataclass) ¶
StreamApply(base: Stream, fns: list[Callable[[T], T]], cpu: int = 1, prefetch: Optional[int] = None, compile: Union[bool, str] = 'auto', scheduler: str = 'auto', gpu: Union[bool, str] = False)
Bases: Iterable[T]
Lazy iterable of transformed batches produced from a :class:`Stream`.
Compilation¶
Compile restricted batch transforms into fused Rust execution plans.
The compiler analyzes straight-line Python functions (typically `def f(batch): ...`)
and builds :class:`~grumpy._core.CompiledPlan` opcode lists for use with
:meth:`~grumpy.stream.Stream.apply` or the :func:`compile` decorator.
Supported inputs
- List-chain and `UnionScalarList` batches for scalar elementwise opcodes (`batch * 2`, `batch + 1`, …).
- Reduction and neighbor opcodes when the underlying Rust kernel supports the layout.
Known limitations
- No control flow (`if` / `for` / `try`), no imports, single `batch` parameter.
- Rust scheduling supports only a fixed opcode set (see `stream.py`).
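The limitations listed above can be checked mechanically. The sketch below uses the standard-library `ast` module to flag control flow, imports, and a wrong parameter count in a function's source. It is an independent illustration, not Grumpy's compiler; `check_restricted` and `FORBIDDEN` are hypothetical names, and the exact set of rejected constructs in Grumpy may differ.

```python
import ast

# Node types treated as disallowed in this sketch (an assumption, not Grumpy's list).
FORBIDDEN = (ast.If, ast.For, ast.While, ast.Try, ast.Import, ast.ImportFrom)


def check_restricted(source):
    """Return the reasons a function source would be rejected (empty list if none)."""
    tree = ast.parse(source)
    fn = tree.body[0]
    if not isinstance(fn, ast.FunctionDef):
        return ["expected a single function definition"]
    problems = []
    args = fn.args
    n_params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
    if n_params != 1 or args.vararg or args.kwarg:
        problems.append("function must take exactly one parameter")
    # Walk every node in the function body and report forbidden constructs.
    for node in ast.walk(fn):
        if isinstance(node, FORBIDDEN):
            problems.append(f"{type(node).__name__} is not allowed")
    return problems


ok_src = "def scale(batch):\n    return batch * 2\n"
bad_src = "def f(batch):\n    if batch:\n        return batch\n    return batch + 1\n"
print(check_restricted(ok_src))   # []
print(check_restricted(bad_src))  # ['If is not allowed']
```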
CompiledTransform ¶
Callable wrapper that runs a compiled Rust plan when possible.
Instances are returned by :func:`compile` and used internally by
:meth:`~grumpy.stream.Stream.apply`.
Attributes:
| Name | Type | Description |
|---|---|---|
| is_compiled | bool | Whether a Rust :class: … |
| compile_error | str or None | Compilation failure message when … |
Examples:
>>> import grumpy as gr
>>> @gr.compile
... def scale(batch):
... return batch * 2
...
>>> scale.is_compiled
True
>>> scale(gr.array([1, 2])).to_list()
[2, 4]
compile_error (property) ¶
is_compiled (property) ¶
__call__ ¶
Run the compiled plan or fall back to the original Python function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch | GrumpyArray or GrumpyDataFrame | Input batch. | required |
Returns:
| Type | Description |
|---|---|
| GrumpyArray or GrumpyDataFrame | Transformed batch. |
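The "run the compiled plan or fall back" behavior is a common wrapper pattern. The sketch below shows one way such a wrapper could be structured in plain Python. It is not the `CompiledTransform` implementation: `FallbackTransform` and `rejecting_compiler` are hypothetical, and the `compiler` argument stands in for whatever produces a plan.

```python
class FallbackTransform:
    """Hypothetical wrapper: use a compiled plan if available, else the original function."""

    def __init__(self, fn, compiler):
        self._fn = fn
        self._plan = None
        self._compile_error = None
        try:
            self._plan = compiler(fn)
        except Exception as exc:
            # Record why compilation failed instead of raising.
            self._compile_error = str(exc)

    @property
    def is_compiled(self):
        return self._plan is not None

    @property
    def compile_error(self):
        return self._compile_error

    def __call__(self, batch):
        if self._plan is not None:
            return self._plan(batch)
        return self._fn(batch)


def rejecting_compiler(fn):
    """Stand-in compiler that always fails, to exercise the fallback path."""
    raise ValueError("unsupported opcode")


double = FallbackTransform(lambda batch: [x * 2 for x in batch], rejecting_compiler)
print(double.is_compiled)    # False
print(double.compile_error)  # unsupported opcode
print(double([1, 2]))        # [2, 4]
```

Keeping the failure message on the wrapper lets callers inspect why a transform ran in Python without interrupting the pipeline.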
compile ¶
Compile a restricted batch transform into a Rust execution plan.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fn | callable | Function … | required |
Returns:
| Type | Description |
|---|---|
| CompiledTransform | Callable wrapper that executes the plan when compilation succeeds. |
Next: Developer — repository layout, implementation notes, and error handling.