From Machine Learning

Anomaly Detection Belongs in Your Database — built SIMD-accelerated isolation forests into Stratum's SQL engine [P]

Our take

Stratum, a columnar analytics engine for the JVM, now has native anomaly detection: isolation forest models can be trained and scored directly from SQL, with no Python and no export pipeline. Scoring is SIMD-accelerated and runs at a reported 6 microseconds per transaction. The full write-up, covering the implementation and performance benchmarks, is at [datahike.io](https://datahike.io/notes/anomaly-detection-in-your-database/).

We added native anomaly detection to Stratum, our columnar analytics engine for the JVM. You can train and score isolation forest models entirely from SQL, with no Python and no export pipeline:

SELECT * FROM transactions WHERE ANOMALY_SCORE('fraud_model') > 0.7; 

Scoring takes 6 microseconds per transaction, is SIMD-accelerated, and runs inside the query engine. The full write-up covers why we built it, how isolation forests work, and benchmarks against PyOD/scikit-learn:

https://datahike.io/notes/anomaly-detection-in-your-database/
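
For a rough idea of how isolation forests work before reading the write-up, here is a minimal scalar sketch in plain Java. It is not Stratum's implementation: there is no Vector API SIMD and no query engine integration. The class name `IsolationForestSketch`, its constructor parameters, and the with-replacement subsampling are assumptions made for illustration. The technique builds random trees over small samples, measures how quickly each point gets isolated, and normalizes the average path length into a score in (0, 1].

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative isolation forest (scalar Java; hypothetical names, not Stratum's code). */
final class IsolationForestSketch {
    /** Internal nodes split one feature at a threshold; leaves only record their size. */
    private static final class Node {
        int feature;
        double threshold;
        Node left, right;
        int size; // number of sample points that reached this node
        boolean isLeaf() { return left == null; }
    }

    private final Node[] trees;
    private final int sampleSize; // assumed to be at least 2

    IsolationForestSketch(double[][] data, int numTrees, int sampleSize, long seed) {
        Random rng = new Random(seed);
        this.sampleSize = Math.min(sampleSize, data.length);
        int maxDepth = (int) Math.ceil(Math.log(this.sampleSize) / Math.log(2));
        trees = new Node[numTrees];
        for (int t = 0; t < numTrees; t++) {
            // Simplification: subsample with replacement.
            double[][] sample = new double[this.sampleSize][];
            for (int i = 0; i < sample.length; i++) sample[i] = data[rng.nextInt(data.length)];
            trees[t] = build(sample, 0, maxDepth, rng);
        }
    }

    private static Node build(double[][] rows, int depth, int maxDepth, Random rng) {
        Node node = new Node();
        node.size = rows.length;
        if (rows.length <= 1 || depth >= maxDepth) return node;
        final int f = rng.nextInt(rows[0].length); // random feature
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double[] row : rows) { min = Math.min(min, row[f]); max = Math.max(max, row[f]); }
        if (min == max) return node; // cannot split on this feature; stop here
        final double split = min + rng.nextDouble() * (max - min); // random threshold
        node.feature = f;
        node.threshold = split;
        double[][] leftRows = Arrays.stream(rows).filter(row -> row[f] < split).toArray(double[][]::new);
        double[][] rightRows = Arrays.stream(rows).filter(row -> row[f] >= split).toArray(double[][]::new);
        node.left = build(leftRows, depth + 1, maxDepth, rng);
        node.right = build(rightRows, depth + 1, maxDepth, rng);
        return node;
    }

    /** Average path length of an unsuccessful binary search tree lookup over n points. */
    static double c(int n) {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;
        double harmonic = Math.log(n - 1) + 0.5772156649; // approximates H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1) / n;
    }

    private static double pathLength(Node node, double[] x) {
        int depth = 0;
        while (!node.isLeaf()) {
            node = x[node.feature] < node.threshold ? node.left : node.right;
            depth++;
        }
        return depth + c(node.size); // adjust for the subtree that was not expanded
    }

    /** Anomaly score 2^(-E[h(x)] / c(sampleSize)); shorter average paths give higher scores. */
    double score(double[] x) {
        double sum = 0.0;
        for (Node tree : trees) sum += pathLength(tree, x);
        return Math.pow(2.0, -(sum / trees.length) / c(sampleSize));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[][] data = new double[1000][2];
        for (double[] row : data) { row[0] = rng.nextGaussian(); row[1] = rng.nextGaussian(); }
        IsolationForestSketch forest = new IsolationForestSketch(data, 100, 256, 7L);
        System.out.println("typical point score:  " + forest.score(new double[] {0.0, 0.0}));
        System.out.println("far-away point score: " + forest.score(new double[] {8.0, 8.0}));
    }
}
```

Scores close to 1 mean a point was isolated after very few splits, while scores well below 0.5 indicate a typical point. A cutoff such as the 0.7 in the query above is a threshold on this kind of score.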

Stratum is open source (Apache 2.0): https://github.com/replikativ/stratum

Happy to answer questions about the implementation. The isolation forest is pure Java using the Vector API for SIMD, and scoring is fused into the query execution pipeline, so it benefits from zone map pruning and chunked streaming.
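
To illustrate why fusing scoring into the scan can pay off, here is a generic sketch of zone map pruning over chunked column data. It assumes a query shape like `WHERE amount > 100 AND score > 0.7` and is not taken from Stratum's source: `ZoneMapScanSketch`, `Chunk`, `ScanResult`, and the stand-in scoring function are all hypothetical. Each chunk keeps the min and max of its values, so chunks that cannot satisfy the range predicate are skipped before any row is scored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

/** Illustrative chunked scan with zone map pruning (hypothetical names, not Stratum's code). */
final class ZoneMapScanSketch {
    /** One column chunk plus its zone map: the min and max of the values it holds. */
    static final class Chunk {
        final double[] values;
        final double min, max;
        Chunk(double[] values) {
            this.values = values;
            double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
            for (double v : values) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
            this.min = lo;
            this.max = hi;
        }
    }

    static final class ScanResult {
        final List<Double> matches = new ArrayList<>();
        int chunksScanned;
        int chunksSkipped;
    }

    /** Returns values v with v > lowerBound and scorer(v) > minScore, one chunk at a time. */
    static ScanResult scan(List<Chunk> chunks, double lowerBound,
                           DoubleUnaryOperator scorer, double minScore) {
        ScanResult result = new ScanResult();
        for (Chunk chunk : chunks) {
            // Zone map check: if even the largest value fails the predicate, skip the chunk.
            if (chunk.max <= lowerBound) { result.chunksSkipped++; continue; }
            result.chunksScanned++;
            for (double v : chunk.values) {
                // The cheap range filter runs first; the scorer only sees surviving rows.
                if (v > lowerBound && scorer.applyAsDouble(v) > minScore) result.matches.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
            new Chunk(new double[] {10, 20, 30}),
            new Chunk(new double[] {40, 50, 60}),
            new Chunk(new double[] {500, 70, 9000}));
        // Stand-in scorer for illustration only; a real engine would call the trained model.
        DoubleUnaryOperator scorer = v -> v / (v + 1000.0);
        ScanResult result = scan(chunks, 100.0, scorer, 0.7);
        System.out.println("scanned=" + result.chunksScanned
            + " skipped=" + result.chunksSkipped + " matches=" + result.matches);
    }
}
```

In this toy data set two of the three chunks are skipped on metadata alone, so the scorer runs on a single chunk. Processing one chunk at a time also keeps memory bounded, which is the usual motivation for chunked streaming.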

submitted by /u/flyingfruits