FeatureSketch API
FeatureSketch is a sparse streaming anomaly detector for events whose feature
names can change over time. It is useful for logs, counters, request metadata,
or one-hot encoded categories where new keys may appear and old keys may stop
appearing.
This detector is an independent rcf3 implementation. It is inspired by sparse
projection and sketch-based stream detectors, but it is not a paper-faithful
implementation of xStream or any other specific algorithm. The design notes and
literature background live in research.md.
Event Format
Each event is a sparse collection of (feature_name, value) pairs:
- feature names are strings
- values must be finite numbers
- duplicate feature names are summed before scoring or update
- an empty event is valid
- missing and zero-valued features are different
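The rules above can be read as a small pre-processing step. The following is a minimal sketch for illustration, not the rcf3 internals; the helper name normalize_event is hypothetical.

```python
import math


def normalize_event(pairs):
    """Sum duplicate feature names and reject non-finite values.

    Hypothetical helper for illustration; not part of the rcf3 API.
    """
    merged = {}
    for name, value in pairs:
        if not isinstance(name, str):
            raise TypeError(f"feature name must be a string, got {type(name).__name__}")
        value = float(value)
        if not math.isfinite(value):
            raise ValueError(f"value for {name!r} must be finite, got {value}")
        # Duplicate names are summed; a 0.0 value still records the name.
        merged[name] = merged.get(name, 0.0) + value
    return merged


event = normalize_event([("bytes", 500.0), ("bytes", 312.0), ("retries", 0.0)])
print(event)  # {'bytes': 812.0, 'retries': 0.0}
print(normalize_event([]))  # {} (an empty event is valid)
```

Note that "retries" stays in the result with value 0.0, which is different from leaving it out of the event.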
A feature with value 0.0 is still present: it contributes to the feature-set
signal even though its numeric magnitude is zero. For categorical values, encode
only the active categories. Do not emit inactive categories as 0.0 unless you
intentionally want them to count as observed features.
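As a sketch of active-only encoding, the helper below turns a flat record into sparse pairs. The function name encode_record and the "key:value" naming scheme are assumptions for illustration, chosen to match feature names such as "status:200" used elsewhere on this page.

```python
def encode_record(record, categorical_keys):
    """Turn a flat record into sparse (feature_name, value) pairs.

    Hypothetical helper: each categorical field becomes one active
    "key:value" feature with value 1.0; numeric fields pass through.
    Inactive categories are never emitted.
    """
    pairs = []
    for key, value in record.items():
        if key in categorical_keys:
            pairs.append((f"{key}:{value}", 1.0))
        else:
            pairs.append((key, float(value)))
    return pairs


pairs = encode_record(
    {"status": 200, "endpoint": "/login", "bytes": 812},
    categorical_keys={"status", "endpoint"},
)
print(pairs)
# [('status:200', 1.0), ('endpoint:/login', 1.0), ('bytes', 812.0)]
```

Only "status:200" appears in the output; other status codes are simply absent rather than present with value 0.0.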
Configuration
| Parameter | Rust builder method | Python parameter | Default | Description | Constraints |
|---|---|---|---|---|---|
| Value projection dims | value_projection_dims | value_projection_dims | 32 | Random projection dimensions for values | Must be > 0 |
| Presence projection dims | presence_projection_dims | presence_projection_dims | 32 | Random projection dimensions for feature keys | Must be > 0 |
| Chains per ensemble | chains_per_ensemble | chains_per_ensemble | 16 | Number of sketch chains per signal | Must be > 0 |
| Chain depth | chain_depth | chain_depth | 8 | Multi-scale binning depth | Must be > 0 |
| Sketch rows | sketch_rows | sketch_rows | 2 | Count-min rows per chain level | Must be > 0 |
| Sketch buckets | sketch_buckets | sketch_buckets | 2048 | Buckets per count-min row | Must be >= 2 |
| Decay half-life | decay_half_life | decay_half_life | 2048 | Event-count half-life for adaptation | Must be > 0 |
| Seed | seed | seed | random | Deterministic projection and sketch layout | Optional |
Larger projection dimensions, more chains, deeper chains, more rows, and more buckets can improve stability or reduce sketch collisions, but they increase CPU and memory cost. A shorter half-life adapts faster to changing streams; a longer half-life keeps older behavior influential for longer.
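To make these trade-offs concrete, the sketch below computes two rough quantities. Both rest on assumptions not confirmed by this page: that decay is exponential in the event count, and that the counter layout is signals x chains x depth x rows x buckets. The function names are hypothetical.

```python
def decay_factor(half_life):
    """Per-event multiplier such that a weight halves after `half_life` events.

    Assumes plain exponential decay; the detector's exact schedule may differ.
    """
    if half_life <= 0:
        raise ValueError("half_life must be > 0")
    return 0.5 ** (1.0 / half_life)


def remaining_weight(age_in_events, half_life):
    """Relative influence of an event seen `age_in_events` events ago."""
    return decay_factor(half_life) ** age_in_events


def counter_count(chains, depth, rows, buckets, signals=2):
    """Rough number of sketch counters under the assumed layout."""
    return signals * chains * depth * rows * buckets


print(remaining_weight(2048, 2048))   # approximately 0.5
print(remaining_weight(2048, 512))    # approximately 0.0625 (four half-lives)
print(counter_count(16, 8, 2, 2048))  # 1048576
```

Under these assumptions, halving sketch_buckets halves the counter count, while shortening the half-life changes only how quickly old events lose influence.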
Implementation Choices
FeatureSketch keeps no feature-name registry. Feature names are hashed into a
fixed projection and sketch layout, so long-running memory does not grow with
the number of names seen historically.
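The registry-free idea can be illustrated with a toy count-min sketch keyed by hashed names. This is a generic sketch of the technique, not the rcf3 layout; the class name TinyCountMin and the use of BLAKE2b as the hash are assumptions.

```python
import hashlib


class TinyCountMin:
    """Toy count-min sketch keyed by hashed strings (illustrative only)."""

    def __init__(self, rows=2, buckets=2048, seed=42):
        self.rows = rows
        self.buckets = buckets
        self.seed = seed
        self.table = [[0.0] * buckets for _ in range(rows)]

    def _bucket(self, row, name):
        # Derive a stable bucket index from (seed, row, name); the name itself
        # is never stored.
        key = f"{self.seed}:{row}:{name}".encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.buckets

    def add(self, name, amount=1.0):
        for row in range(self.rows):
            self.table[row][self._bucket(row, name)] += amount

    def estimate(self, name):
        # With non-negative amounts, collisions can only inflate a cell,
        # so the minimum across rows is the tightest estimate.
        return min(self.table[row][self._bucket(row, name)] for row in range(self.rows))


sketch = TinyCountMin()
for i in range(10_000):
    sketch.add(f"feature_{i}")
sketch.add("status:200", 5.0)

cells = sum(len(row) for row in sketch.table)
print(cells)  # 4096, regardless of how many names were added
print(sketch.estimate("status:200") >= 5.0)  # True
```

The table size is fixed at construction, so adding ten thousand distinct names raises the collision rate but not the memory footprint.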
The detector maintains two internal signals:
- value projections for unusual numeric magnitudes
- presence projections for unusual observed feature sets
Presence tracking is why feature shrink is visible: if a previously common key disappears from an event, the feature set changes even when remaining numeric values look ordinary.
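The difference between the two signals can be sketched with a signed-hash projection. This is an assumption-laden illustration: the function names, the BLAKE2b-derived signs, and the 8-dimension size are all hypothetical and are not taken from the rcf3 implementation.

```python
import hashlib
import math


def _sign(seed, dim, name):
    # Pseudo-random +1/-1 derived from (seed, dim, name).
    digest = hashlib.blake2b(f"{seed}:{dim}:{name}".encode("utf-8"), digest_size=1).digest()
    return 1.0 if digest[0] & 1 else -1.0


def project(event, dims=8, seed=42):
    """Return (value_projection, presence_projection) for a {name: value} event."""
    values = [0.0] * dims
    presence = [0.0] * dims
    for name, value in event.items():
        for d in range(dims):
            s = _sign(seed, d, name)
            values[d] += s * math.asinh(value)  # magnitude signal
            presence[d] += s                    # key-was-observed signal
    return values, presence


full = {"endpoint:/login": 1.0, "bytes": 750.0, "retries": 0.0}
shrunk = {"endpoint:/login": 1.0, "bytes": 750.0}  # "retries" disappeared

v_full, p_full = project(full)
v_shrunk, p_shrunk = project(shrunk)
print(v_full == v_shrunk)  # True: a 0.0 value adds no magnitude
print(p_full == p_shrunk)  # False: the feature set changed
```

In this toy version the value projection cannot tell the two events apart, while the presence projection shifts in every dimension when the key disappears.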
Values are normalized with asinh(x) before projection, so all finite signed
values are accepted, including large positive and negative values. The anomaly
score is a sketch-density surprise score; higher values mean more anomalous
events. It is useful for ranking or thresholding within one configured detector,
not as a calibrated probability.
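The effect of asinh normalization can be checked with the standard library alone. The helper name normalize is hypothetical; math.asinh is the only call involved.

```python
import math


def normalize(x):
    """Compress a finite signed value with the inverse hyperbolic sine."""
    if not math.isfinite(x):
        raise ValueError("value must be finite")
    return math.asinh(x)


for x in (0.0, 0.001, 1.0, -1.0, 750.0, 12000.0, -12000.0, 1e12):
    print(f"{x:>10g} -> {normalize(x):+.4f}")
```

asinh is odd, so negative values map to negative outputs of the same magnitude. It is close to the identity near zero and grows like ln(2|x|) for large |x|, which keeps a value such as 12000.0 from dominating a projection.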
Rust API
Creating a detector
use rcf3::FeatureSketch;

let mut detector = FeatureSketch::builder()
    .value_projection_dims(32)
    .presence_projection_dims(32)
    .chains_per_ensemble(16)
    .chain_depth(8)
    .sketch_rows(2)
    .sketch_buckets(2048)
    .decay_half_life(2048)
    .seed(42)
    .build()?;
Update and preview scoring
Use update_and_score when you want the pre-ingest anomaly score and want to
commit the same event:
let score = detector.update_and_score([
    ("endpoint:/login", 1.0),
    ("status:200", 1.0),
    ("bytes", 812.0),
])?;
assert!(score >= 0.0);
Use score to preview an event without mutating detector state:
let event = [
    ("endpoint:/admin", 1.0),
    ("status:401", 1.0),
    ("bytes", 12000.0),
];
let preview = detector.score(event)?;
let committed = detector.update_and_score(event)?;
assert_eq!(preview, committed);
Use update when you want to ingest without returning the score.
Status accessors
is_ready() becomes true after the first processed event. Early scores are
still warmup scores; production pipelines commonly ignore an initial warmup
period before thresholding.
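The warmup handling described above is independent of the binding language. A minimal sketch, written in Python and operating on a plain sequence of scores, could look like this. The function name, the warmup length, the threshold, and the score values are all made up for illustration.

```python
def alerts_after_warmup(scores, warmup_events, threshold):
    """Yield (index, score) for scores above `threshold`, skipping the warmup prefix."""
    for index, score in enumerate(scores):
        if index < warmup_events:
            continue  # early scores are warmup scores; do not alert on them
        if score > threshold:
            yield index, score


# Made-up scores: the first two are high only because the detector is new.
stream = [9.0, 7.5, 1.0, 1.2, 0.9, 6.8, 1.1]
print(list(alerts_after_warmup(stream, warmup_events=2, threshold=5.0)))
# [(5, 6.8)]
```

Every event is still ingested during warmup; only the alerting decision is suppressed.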
Serialization
With the serde feature enabled, detector state can be serialized and restored.
With both serde and std enabled, file helpers are also available:
detector.save_json("featuresketch.json")?;
let restored_from_file = FeatureSketch::load_json("featuresketch.json")?;
Practical example
use rcf3::FeatureSketch;

let mut detector = FeatureSketch::builder()
    .seed(2026)
    .sketch_buckets(512)
    .build()?;

// Ingest a steady stream of ordinary login events.
for _ in 0..64 {
    detector.update([
        ("endpoint:/login", 1.0),
        ("status:200", 1.0),
        ("bytes", 750.0),
    ])?;
}

// Preview an event that resembles the ingested traffic.
let normal = detector.score([
    ("endpoint:/login", 1.0),
    ("status:200", 1.0),
    ("bytes", 790.0),
])?;

// Preview an event with unseen feature names and a much larger value.
let suspicious = detector.score([
    ("endpoint:/admin", 1.0),
    ("status:401", 1.0),
    ("bytes", 12000.0),
])?;

println!("normal={normal}, suspicious={suspicious}");
Python API
Creating a detector
from rcf3 import FeatureSketch

detector = FeatureSketch(
    value_projection_dims=32,
    presence_projection_dims=32,
    chains_per_ensemble=16,
    chain_depth=8,
    sketch_rows=2,
    sketch_buckets=2048,
    decay_half_life=2048,
    seed=42,
)
Update and preview scoring
Python accepts either a mapping or a sequence of (name, value) pairs:
score = detector.update_and_score({
    "endpoint:/login": 1.0,
    "status:200": 1.0,
    "bytes": 812.0,
})
assert score >= 0.0

# Preview an event without mutating detector state, then commit the same event.
event = {
    "endpoint:/admin": 1.0,
    "status:401": 1.0,
    "bytes": 12000.0,
}
preview = detector.score(event)
committed = detector.update_and_score(event)
assert preview == committed
Use update(event) when you want to ingest without returning the score.
Status accessors
Serialization
json_str = detector.to_json()
restored = FeatureSketch.from_json(json_str)
detector.save_json("featuresketch.json")
restored_from_file = FeatureSketch.load_json("featuresketch.json")
FeatureSketch also supports normal Python pickle round-trips through its
JSON-backed state hooks.
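The general pattern of JSON-backed pickle hooks can be sketched with a toy class. This shows the technique only; the class JsonBackedCounter is hypothetical and says nothing about how FeatureSketch lays out its state.

```python
import json
import pickle


class JsonBackedCounter:
    """Toy class whose pickle state is a JSON string (illustrative only)."""

    def __init__(self, seed=0):
        self.seed = seed
        self.counts = {}

    def update(self, name):
        self.counts[name] = self.counts.get(name, 0) + 1

    def to_json(self):
        return json.dumps({"seed": self.seed, "counts": self.counts})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        obj = cls(seed=data["seed"])
        obj.counts = data["counts"]
        return obj

    def __getstate__(self):
        # pickle stores this JSON string instead of the raw attribute dict.
        return self.to_json()

    def __setstate__(self, state):
        # pickle hands the stored string back; rebuild through from_json.
        restored = type(self).from_json(state)
        self.__dict__.update(restored.__dict__)


original = JsonBackedCounter(seed=42)
original.update("status:200")
clone = pickle.loads(pickle.dumps(original))
print(clone.to_json() == original.to_json())  # True
```

Routing pickle through the JSON form means there is a single serialized representation to keep compatible across versions.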
Practical example
from rcf3 import FeatureSketch

detector = FeatureSketch(seed=2026, sketch_buckets=512)

# Ingest a steady stream of ordinary login events.
for _ in range(64):
    detector.update({
        "endpoint:/login": 1.0,
        "status:200": 1.0,
        "bytes": 750.0,
    })

# Preview an event that resembles the ingested traffic.
normal = detector.score({
    "endpoint:/login": 1.0,
    "status:200": 1.0,
    "bytes": 790.0,
})

# Preview an event with unseen feature names and a much larger value.
suspicious = detector.score({
    "endpoint:/admin": 1.0,
    "status:401": 1.0,
    "bytes": 12000.0,
})

print(f"normal={normal}, suspicious={suspicious}")