A clear, structured overview of core ML topics, explained with lists, examples, and practical intuition.
Classical Machine Learning algorithms are the foundation of data science. They're fast, interpretable, and often the best starting point before trying complex deep learning.
Imagine you have data about house sizes and their prices. Linear Regression draws the best straight line through those points so you can predict the price of a new house just from its size. It's the simplest and most fundamental prediction method in all of ML.
The model finds the line that minimizes prediction errors (MSE = average of squared errors). Variants: Ridge adds a penalty to keep coefficients small (prevents overfitting), Lasso can set some coefficients to zero (automatic feature selection), ElasticNet combines both. Quality measured with R² (1.0 = perfect predictions).
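A minimal scikit-learn sketch of these variants; the house-size data below is synthetic, invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(50, 200, size=(100, 1))           # house size in m^2 (synthetic)
y = 1000 * X[:, 0] + rng.normal(0, 5000, 100)     # price = 1000 EUR/m^2 + noise

# Same API for all three variants: plain OLS, Ridge (L2), Lasso (L1)
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, round(r2_score(y, model.predict(X)), 3))
```

With noise this small relative to the signal, all three fit nearly the same line and report an R² close to 1.0; the penalties only start to matter with many correlated features.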
Sometimes a straight line doesn't fit your data well because the relationship is curved (for example, a plant grows fast at first, then slows down). Polynomial Regression lets you fit curves instead of straight lines by adding squared or cubed terms to the equation.
Despite its name, Logistic Regression is used to classify things into categories (spam or not spam, approved or rejected). Instead of predicting a number, it outputs a probability between 0% and 100% that tells you how confident the model is about each category.
Trained with Cross-Entropy loss (penalizes confident wrong predictions heavily). For multiple categories, uses Softmax to distribute probabilities across classes. Key metrics: Precision (of predicted positives, how many are correct), Recall (of actual positives, how many were found), F1 (balance of both).
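A short scikit-learn sketch on a synthetic dataset, showing both the probability output and the three metrics (dataset and settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

pred = clf.predict(X)
proba = clf.predict_proba(X)[0]    # per-class probabilities for one sample, sums to 1
print(f"precision={precision_score(y, pred):.2f} "
      f"recall={recall_score(y, pred):.2f} f1={f1_score(y, pred):.2f}")
```

In a real project you would compute these metrics on a held-out test split, not the training data.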
Picture two groups of dots on a page (cats vs dogs based on weight and height). SVM finds the line that separates them with the widest possible gap. A clever math trick called a 'kernel' lets it handle cases where the groups aren't separable by a straight line, by bending the boundary into curves.
The simplest ML idea: to classify a new data point, just look at its k closest neighbors in the dataset and go with the majority vote. If 3 out of 5 nearest neighbors are cats, predict cat. There's no actual training phase; it simply memorizes all the data and compares at prediction time.
Imagine each data point as a dot on a 2D map. If you have two features (say height and weight), each person becomes a point on that map. KNN literally measures the distance between your new point and every existing point, then looks at which class dominates among the k nearest ones. When you have more than 2 features (say 10 medical measurements), the concept is the same but in 10-dimensional space. We can't visualize 10D, but the math works the same way. Tools like PCA or t-SNE can project those 10D points back onto a 2D plane so you can visually check if clusters are well-separated.
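A toy scikit-learn example of the idea; the weight/height numbers for cats and dogs are made up:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# toy data: [weight_kg, height_cm] for three cats and three dogs (invented values)
X = np.array([[4.0, 25], [5.0, 23], [3.5, 24],    # cats
              [20.0, 55], [25.0, 60], [18.0, 50]])  # dogs
y = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

# "fit" just stores the data: KNN has no real training phase
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[4.5, 24]])[0])   # -> cat (all 3 nearest neighbors are cats)
```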
Uses probability rules (Bayes' theorem) to classify data, especially text. For example, if an email contains "free" and "winner", what's the probability it's spam? Called 'naive' because it assumes each word contributes independently, which is rarely true in real life but works surprisingly well in practice.
Works exactly like a flowchart of yes/no questions: "Is age > 30? Yes. Is income > 50k? No. Then predict: won't buy." The algorithm figures out which questions to ask and in what order to get the most accurate predictions. Easy to visualize and explain to anyone, even non-technical people.
Instead of relying on one decision tree (which can be unreliable), Random Forest builds hundreds of trees, each trained on a random portion of the data. To make a prediction, all trees vote and the majority wins. This 'wisdom of the crowd' approach is much more accurate and stable than any single tree alone.
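The "wisdom of the crowd" effect can be checked directly by cross-validating a single tree against a forest on a noisy synthetic dataset (sizes and noise level are arbitrary choices):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

tree = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5
).mean()
print(f"single tree: {tree:.3f}, forest of 200 trees: {forest:.3f}")
```

On noisy data like this, the forest's averaged vote typically scores a few points higher and varies less between folds than the single tree.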
Builds trees one after another, where each new tree specifically focuses on correcting the mistakes of the previous ones. Think of it like a team of students where each one studies what the previous student got wrong. This is the go-to method for tabular data (spreadsheets, databases) and dominates competitions and real-world production systems.
XGBoost: exact/histogram split, most battle-tested. LightGBM: leaf-wise growth, faster on large data, GOSS sampling. CatBoost: ordered boosting, best native categorical handling.
Groups similar data points together without any labels. You tell it how many groups (k) you want, and it figures out how to split the data. For example, given customer purchase data, it could automatically discover "budget shoppers", "premium buyers", and "occasional visitors" without you defining those categories.
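A sketch of the customer example with K-Means on invented purchase data (group centers and spreads are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# three synthetic customer groups: [avg_basket_eur, visits_per_month]
budget  = rng.normal([15, 8],  [3, 2],    size=(50, 2))
premium = rng.normal([120, 4], [20, 1],   size=(50, 2))
rare    = rng.normal([40, 1],  [8, 0.5],  size=(50, 2))
X = np.vstack([budget, premium, rare])    # no labels: K-Means never sees the group names

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(1))       # three discovered group centers
```

The recovered centers land near the three generating means, even though the algorithm was only told k=3, never which point belongs where.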
Imagine dropping ink on paper and watching it spread: wherever there's a dense concentration of points packed together, DBSCAN calls that a cluster. Points sitting alone far away are flagged as outliers. Unlike K-Means, you don't need to specify the number of groups, and it can find clusters of any shape (circles, crescents, blobs).
When your data has hundreds of features (columns), it's impossible to visualize. These techniques compress all those dimensions into just 2 or 3, while keeping similar points close together. The result is a 2D scatter plot where you can actually see clusters, patterns, and outliers at a glance. Essential for understanding what's really going on in your data.
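The compression step is one line with scikit-learn's PCA; here the 64-pixel digits dataset drops to 2 dimensions you could scatter-plot:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)       # 1797 images, 64 features (8x8 pixels) each
X2 = PCA(n_components=2).fit_transform(X)  # keep the 2 directions of largest variance
print(X.shape, "->", X2.shape)             # (1797, 64) -> (1797, 2)
```

Plotting `X2` colored by `y` would show digit classes forming visible (if overlapping) clusters; t-SNE or UMAP usually separates them more sharply at the cost of distorting global distances.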
Deep learning uses neural networks with many stacked layers to automatically learn patterns from raw data. Instead of hand-crafting features (like in classical ML), you feed the raw data (pixels, text, audio) and the network discovers the relevant features by itself. Different architectures are designed for different types of data: images (CNNs), sequences (RNNs), and the Transformer, which now dominates nearly everything.
The classic neural network: layers of interconnected neurons that learn to map inputs to outputs. Think of it as a chain of simple math operations: each neuron takes numbers in, multiplies them by weights, adds them up, and passes the result through a function. Stack enough of these, and the network can learn incredibly complex patterns. Every modern deep learning model (CNNs, Transformers, etc.) has MLP components inside.
Training a neural network is a loop of three steps, repeated thousands of times: (1) forward pass: feed the input through the network to get a prediction, (2) loss computation: measure how far the prediction is from the true answer, (3) backward pass: backpropagation computes how much each weight contributed to the error, and the optimizer nudges every weight in the direction that reduces it.
After many passes over the full dataset (each pass is called an epoch), the weights converge to values that make good predictions on new, unseen data.
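The three-step loop can be written out in plain numpy for a one-weight toy model (learning y = 2x by gradient descent; not a real network, just the skeleton of the loop):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x                  # ground truth the model should discover
w = 0.0                      # the single trainable weight
lr = 0.5                     # learning rate

for step in range(100):
    pred = w * x                          # 1. forward pass
    loss = np.mean((pred - y) ** 2)       # 2. loss: mean squared error
    grad = np.mean(2 * (pred - y) * x)    # 3. backward pass: dLoss/dw ...
    w -= lr * grad                        #    ... and the weight update

print(round(w, 3))   # converges to ~2.0
```

Real frameworks (PyTorch, JAX) automate step 3 with automatic differentiation, but the loop structure is exactly this.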
A neural network specifically designed for images. Instead of connecting every neuron to every pixel (which would be millions of connections), CNNs use small sliding filters (e.g., 3x3 pixels) that scan across the image. The first layers detect simple patterns like edges and corners. Deeper layers combine those into textures, shapes, and finally recognizable objects like faces or cars. This hierarchical feature learning is what makes CNNs so powerful for anything visual.
Imagine a tiny 3x3 grid of numbers (the "filter" or "kernel") sliding across your image pixel by pixel. At each position, it multiplies the 9 overlapping pixels by the filter values and sums the result into a single number. This produces a new, smaller image called a "feature map" that highlights specific patterns the filter is looking for.
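That sliding-window description translates directly to numpy; this sketch uses the classic Sobel vertical-edge filter on a tiny synthetic image:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a k x k kernel over the image; each position yields one number."""
    k = kernel.shape[0]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                       # left half dark, right half bright
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])         # responds to left-to-right brightness changes
fmap = conv2d(image, sobel_x)
print(fmap)                              # strong responses only along the edge columns
```

The feature map is zero everywhere except next to the dark/bright boundary: this filter "looks for" vertical edges, exactly the kind of pattern a CNN's first layer learns.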
LeNet (1998): the original CNN, digit recognition. AlexNet (2012): won ImageNet, launched the deep learning revolution. ResNet (2015): skip connections enabling 100+ layer networks. EfficientNet (2019): smartly scales width, depth, and resolution together. ConvNeXt (2022): modernized CNN that competes with Vision Transformers by borrowing Transformer design ideas.
Neural networks with memory, designed for sequential data (text, time series, audio). They process inputs one step at a time, maintaining a hidden state that acts as a "memory" of what came before. When reading a sentence, each word updates the memory, so by the end, the network has a summary of the entire sequence. Largely replaced by Transformers for NLP, but still relevant for streaming and edge applications.
A basic RNN struggles with long sequences: as information passes through many time steps, the gradients used for learning become smaller and smaller (vanish), so the network "forgets" early inputs. LSTM (Long Short-Term Memory) solves this with a clever gating mechanism: a forget gate decides what to erase from the memory cell, an input gate decides what new information to store, and an output gate decides what part of the memory to expose at each step.
GRU (Gated Recurrent Unit) is a simplified version with only two gates (reset + update), making it faster to train with similar performance.
The architecture behind ChatGPT, BERT, Stable Diffusion, and virtually all modern AI. Introduced in 2017 with the paper "Attention Is All You Need," the Transformer replaced RNNs by allowing every element in the input to interact with every other element simultaneously through a mechanism called "self-attention." This parallelism made training much faster and enabled models to scale to billions of parameters.
Imagine reading the sentence: "The cat sat on the mat because it was tired." What does "it" refer to? You instantly know it's "the cat." Self-attention lets the model learn these connections: each word emits a query vector that is compared against every other word's key vector, and the resulting attention weights decide how much of each word's value vector gets mixed into the new representation, so "it" ends up attending strongly to "the cat."
Multi-head attention: run several attention patterns in parallel (e.g., 12 heads). One head might focus on grammar (subject-verb agreement), another on semantics (word meaning), another on position (nearby words). The outputs are concatenated and projected.
A single Transformer block repeats this pattern: Multi-Head Self-Attention → Add & Layer Norm (residual connection) → Feed-Forward Network (two-layer MLP) → Add & Layer Norm. A model like GPT-4 stacks ~120 of these blocks. The residual connections (skip connections, like ResNet) are critical: they allow gradients to flow directly through the network, enabling very deep models.
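The attention step at the heart of that block can be sketched in numpy for a single head; the weights here are random toys (real models learn Wq/Wk/Wv and add multi-head projection, masking, and layer norm):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one head (no masking)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every token scored against every token
    weights = softmax(scores)                 # each row sums to 1
    return weights @ V                        # mix value vectors by attention weight

rng = np.random.default_rng(0)
d = 8                                   # toy embedding size
X = rng.normal(size=(5, d))             # 5 "tokens"
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                        # (5, 8): one updated vector per token
```

Every output row depends on all five inputs at once, which is exactly the parallelism that lets Transformers dispense with step-by-step recurrence.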
Three families of models that create new content (images, music, video, 3D). Each takes a fundamentally different approach to generation: GANs pit two networks against each other in a competition, VAEs learn a compressed representation and sample from it, and Diffusion models learn to gradually remove noise from a random image until a clean result emerges. Diffusion is the current state-of-the-art (Stable Diffusion, DALL-E 3, Midjourney).
Neural networks designed for data that's naturally a graph: nodes connected by edges. Social networks (people connected by friendships), molecules (atoms connected by bonds), maps (intersections connected by roads). Each node learns a representation by collecting and combining information from its neighbors, propagating knowledge across the graph.
In each layer, every node: (1) collects messages from its neighbors (their current representations), (2) aggregates them (sum, mean, or attention-weighted), (3) combines the result with its own representation to produce an updated one. After several layers, each node's representation encodes information from its extended neighborhood.
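Those three steps fit in a few numpy lines on a toy 4-node graph; the mean aggregation, shared weight matrix, and tanh are illustrative choices, not the only ones:

```python
import numpy as np

# tiny graph: 4 nodes, edges given by an adjacency matrix
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.eye(4)                   # initial node features (one-hot ids)
W = np.full((4, 4), 0.5)        # toy shared weight matrix

def gnn_layer(A, H, W):
    """(1) collect neighbor features, (2) mean-aggregate, (3) combine with self."""
    deg = A.sum(axis=1, keepdims=True)
    neighbor_mean = (A @ H) / deg              # steps 1 + 2 in one matrix product
    return np.tanh((H + neighbor_mean) @ W)    # step 3: combine and transform

H1 = gnn_layer(A, H, W)
H2 = gnn_layer(A, H1, W)        # after 2 layers, each node "sees" 2 hops away
print(H2.shape)
```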
Computer Vision teaches machines to understand images and video. Modern systems combine deep neural networks with massive pretraining to achieve human-level (or better) performance on many visual tasks. The field has evolved from hand-crafted features (HOG, SIFT) to CNNs to Transformers, and now to foundation models that work zero-shot on tasks they've never been explicitly trained for.
ResNet introduced "skip connections" (shortcuts that let information bypass layers), solving the problem of training very deep networks (100+ layers). Without skip connections, deeper networks actually performed worse because gradients disappeared. ConvNeXt is a 2022 modernization: same CNN principles but borrowing design ideas from Transformers (larger kernels, fewer activations, LayerNorm), making it competitive with ViT.
These are backbone networks: the feature extraction engine used by detection, segmentation, and other downstream tasks. When you use YOLO or U-Net, the first half of the model is typically a ResNet or similar backbone that converts raw pixels into meaningful features.
Takes the Transformer architecture from NLP and applies it to images. The trick: cut the image into small patches (e.g., 16x16 pixels), flatten each patch into a vector, and treat them like words in a sentence. Then self-attention lets every patch interact with every other patch from the very first layer, giving the model global understanding of the whole image instantly.
YOLO (You Only Look Once) detects and locates objects in images in a single forward pass, fast enough for real-time video (30-300+ FPS). Unlike older two-stage detectors (Faster R-CNN) that first propose regions then classify them, YOLO predicts bounding boxes and classes simultaneously across the whole image. The most popular and battle-tested choice for production object detection.
The image is divided into a grid. For each cell, the model predicts: (1) bounding box coordinates (x, y, width, height), (2) confidence score (is there an object here?), and (3) class probabilities (what object is it?). Non-Maximum Suppression (NMS) removes duplicate detections. Modern versions (YOLOv8+) are anchor-free, simplifying the pipeline.
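The NMS step is simple enough to write out in plain Python; this toy version uses IoU (intersection-over-union) and greedy suppression, with invented box coordinates:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2]: the near-duplicate box 1 is suppressed
```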
Applies the Transformer to object detection with a clean, elegant design: no anchors, no NMS, no hand-crafted rules. The model uses learned "object queries" that each attend to different parts of the image and directly output detections. RT-DETR (Real-Time DETR) now matches YOLO speed with better accuracy on some benchmarks, making it a serious production alternative.
Assigns a class label to every single pixel in an image (this pixel is "road", this one is "car", this one is "sky"). U-Net has an encoder-decoder architecture with skip connections that preserve fine details, making it excellent on small datasets like medical scans. SegFormer brings Transformer efficiency to dense pixel-level predictions with multi-scale features.
SAM (Segment Anything Model) by Meta is a foundation model for segmentation: click a point, draw a box, or type text on any image and it segments the object with zero training. It was trained on 11 million images with 1 billion masks. Mask2Former unifies all segmentation types (semantic, instance, panoptic) into a single architecture.
Trains an image encoder and a text encoder together so they produce similar vectors for matching image-text pairs. The result: images and text live in the same embedding space. You can search images with text ("a red sports car at sunset"), classify images into any category without training data, or use it as the "eyes" for a multimodal LLM like LLaVA or GPT-4V.
During training, CLIP sees millions of (image, caption) pairs from the internet. For each batch, it computes image embeddings and text embeddings, then pushes matching pairs close together in vector space and non-matching pairs apart. After training, the model can compare any image to any text description by measuring the distance between their vectors.
Choosing the right metric is as important as choosing the right model. A metric tells you what "good" means for your specific task. Using the wrong metric can make a terrible model look great (e.g., 99% accuracy on a dataset where 99% of samples are one class).
The loss function defines what the model optimizes during training. It's the mathematical expression of "how wrong is this prediction?" Different tasks need different loss functions, and choosing the right one directly impacts model quality. Understanding why each loss works helps you debug training issues.
Deploying a CV model is only half the job. In production, image distributions shift (new cameras, lighting changes, seasonal differences), and model performance silently degrades if you're not monitoring. A structured monitoring pipeline catches issues before users do.
Natural Language Processing (NLP) teaches machines to understand, generate, and reason about human language. Today, Large Language Models (LLMs) power most NLP applications, but the ecosystem is much broader: encoder models for understanding, embedding models for search, and retrieval pipelines for grounding answers in real data.
Assign a label to a piece of text: is this email spam? Is this review positive or negative? What topic does this article cover? Text classification is one of the most common NLP tasks in production, with approaches ranging from simple keyword rules to fine-tuned Transformers.
Named Entity Recognition finds and labels key information in text: names of people, companies, places, dates, amounts. It's essential for turning unstructured text (contracts, resumes, news articles, invoices) into structured data you can store in a database.
Embedding models convert text into numerical vectors (arrays of numbers, typically 768-1024 dimensions) that capture meaning. Texts with similar meanings end up as vectors that are close together in this high-dimensional space. This is the foundation of semantic search: instead of matching exact keywords, you find texts that mean the same thing.
Each sentence is transformed into a vector (e.g., 1024 numbers). Think of it as coordinates in a 1024-dimensional space. While we can't visualize 1024D, the principle is the same as 2D/3D: similar texts cluster together, different texts are far apart. You can use UMAP or t-SNE to project these vectors onto a 2D plane and literally see clusters of related documents. When a user asks a question, you embed it into the same space and find the nearest document vectors using cosine similarity or dot product.
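The nearest-vector lookup takes a few lines of numpy; the 3-dimensional vectors below are hand-picked stand-ins for real ~1024-dimensional embeddings:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# pretend these came from an embedding model (real vectors have ~1024 dims)
docs = {
    "refund policy":   np.array([0.9, 0.1, 0.0]),
    "shipping times":  np.array([0.1, 0.8, 0.2]),
    "api rate limits": np.array([0.0, 0.1, 0.9]),
}
query = np.array([0.85, 0.2, 0.05])   # embedding of "how do I get my money back?"

best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)   # refund policy
```

Note there is no keyword overlap between "money back" and "refund": the match works only because the embedding model placed the two phrases near each other. At scale, vector databases replace this linear scan with ANN indexes like HNSW.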
Qdrant (Rust, HNSW+quantization, hybrid search, used in MIRROR). Pinecone (fully managed, serverless). Weaviate (modular, GraphQL). ChromaDB (simple, great for prototyping). pgvector (PostgreSQL extension, use your existing DB).
BERT reads text in both directions simultaneously (bidirectional), giving it deep understanding of context. Unlike LLMs which generate text left-to-right, BERT is designed to understand text: classify it, extract information from it, compare sentences. It's small (110M params), fast (runs on CPU), and still the go-to model for many production NLP tasks where you need understanding, not generation.
BERT is a Transformer encoder pretrained on two tasks: (1) Masked Language Modeling: randomly hide 15% of words and predict them from surrounding context (both left and right). This forces the model to deeply understand language. (2) Next Sentence Prediction: determine if two sentences follow each other naturally. After pretraining, you add a small classification head on top and fine-tune on your specific task with as few as 500 labeled examples.
RoBERTa: same architecture, better pretraining (more data, longer training, no NSP task). DeBERTa: improved attention with disentangled position encoding, often outperforms RoBERTa. CamemBERT: French BERT, pretrained on French web data. DistilBERT: 40% smaller, 60% faster, retains 97% of BERT's accuracy.
Large Language Models (ChatGPT, Claude, Llama, Phi, Qwen) are massive Transformer decoder networks trained to predict the next word. By reading billions of documents, they develop an understanding of language, facts, and reasoning. After pretraining, they're refined with human feedback (RLHF/DPO) so they actually follow instructions and give helpful, safe answers instead of just completing text.
Step 1: Pretraining reads trillions of words from the internet, learning to predict the next word. This gives the model broad knowledge of language, facts, and reasoning patterns. Step 2: Instruction tuning (SFT) fine-tunes on curated instruction/response pairs so the model learns to follow instructions instead of just completing text. Step 3: Alignment (RLHF/DPO) trains with human preference data to be helpful, harmless, and honest. Step 4: Quantization compresses the model (GGUF/GPTQ/AWQ) to run on smaller hardware without significant quality loss.
Adapt a pre-trained LLM to your specific domain by training only a tiny fraction of its parameters (~1%). Instead of updating all billions of weights, LoRA injects small trainable matrices ("adapters") into each layer. QLoRA loads the frozen model in 4-bit, making it possible to fine-tune on a single GPU models that would otherwise need a multi-GPU cluster (a 70B model in roughly 40GB of VRAM instead of 140GB+).
The original weight matrix W (e.g., 4096x4096) is frozen. LoRA adds two small matrices: A (4096x16) and B (16x4096), where 16 is the "rank." The output becomes W*x + A*B*x. Only A and B are trained, which is a tiny fraction of the total parameters. After training, A*B can be merged back into W, so inference has zero additional cost.
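The arithmetic is easy to verify in numpy with the shapes from the text. Note that B starts at zero, so a freshly added adapter leaves the model's output unchanged until training moves it:

```python
import numpy as np

d, r = 4096, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))          # frozen pretrained weight (never updated)
A = rng.normal(size=(d, r)) * 0.01   # trainable down/up pair: A is 4096x16 ...
B = np.zeros((r, d))                 # ... B is 16x4096, initialized to zero

x = rng.normal(size=d)
out = W @ x + A @ (B @ x)            # W*x + A*B*x, without materializing A@B

full, lora = W.size, A.size + B.size
print(f"trainable: {lora:,} of {full:,} params ({100 * lora / full:.2f}%)")
```

With rank 16, the adapter is 131,072 parameters against 16.8M in W: under 1% per layer, which is where the "~1%" figure comes from.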
QLoRA goes further: the frozen W is loaded in 4-bit NF4 precision (instead of 16-bit), cutting base model memory by 4x. The adapters A and B are still trained in FP16/BF16 for precision. This means a 70B model that normally needs 140GB VRAM can be fine-tuned in ~40GB.
1. Prepare 500-10k instruction/response pairs. 2. Choose a base model (Llama, Phi, Qwen). 3. Configure LoRA: rank (8-64), alpha (2x rank), target modules (attention layers). 4. Train 1-3 epochs with QLoRA via Unsloth or PEFT. 5. Evaluate on held-out test set + human review. 6. Merge adapter into base model and quantize (GGUF) for deployment.
RAG (Retrieval Augmented Generation) gives an LLM access to your own documents. When a user asks a question, the system first searches for relevant passages in your document index, then feeds those passages to the LLM along with the question. The LLM writes an answer grounded in your actual data, with citations. This is how most enterprise AI assistants handle company-specific knowledge without retraining the model.
Always use hybrid search (dense + sparse) for better recall on technical queries. Cache embeddings for frequent queries. Cap top-k after reranking (3-5) to control latency and cost. Stream tokens for responsive UX. Monitor retrieval quality (MRR, recall@k) and user feedback. Add fallback to direct LLM chat when no relevant documents are found.
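A deliberately minimal RAG sketch of the retrieve-then-prompt flow: the word-overlap scorer below is a toy stand-in for a real embedding model, and the prompt template is an assumption, not a standard:

```python
from collections import Counter

docs = [
    "Refunds are processed within 5 business days of approval.",
    "Standard shipping takes 3-7 days; express takes 1-2 days.",
    "The API allows 100 requests per minute per key.",
]

def score(query, doc):
    """Toy sparse retrieval: count shared lowercase words (stand-in for embeddings)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def build_prompt(query, k=1):
    top = sorted(docs, key=lambda d: -score(query, d))[:k]
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(top))
    return (f"Answer using only this context, citing passages by number:\n"
            f"{context}\n\nQuestion: {query}")

prompt = build_prompt("how long do refunds take?")
print(prompt)   # the refund passage is retrieved and placed above the question
```

The final step, sending `prompt` to an LLM, is omitted; in a real pipeline the retriever would be a dense (or hybrid) index and the top-k passages would pass through a reranker first.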
Before any model can process text, it must be split into "tokens" (small pieces: words, subwords, or characters). The tokenizer converts text into a sequence of integer IDs that the model understands. How text is tokenized affects everything: cost (more tokens = higher price), speed (longer sequences = slower), and quality (non-English languages often need more tokens per word, reducing effective context).
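A toy word-level tokenizer makes the text-to-IDs mapping concrete; real tokenizers use subword schemes like BPE, so one word often becomes several tokens:

```python
# toy word-level tokenizer with a tiny hand-built vocabulary
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def encode(text):
    """Text -> integer IDs; unknown words fall back to <unk>."""
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

def decode(ids):
    inv = {i: w for w, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

ids = encode("The cat sat on the mat")
print(ids)           # [1, 2, 3, 4, 1, 5]
print(decode(ids))   # the cat sat on the mat
```

The length of `ids` is what API providers bill for; subword tokenizers keep vocabularies manageable but mean token counts vary a lot across languages.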
Evaluating LLMs is notoriously difficult because there's no single metric that captures "quality." You need a combination: standardized benchmarks for model selection, golden test sets for regression testing, LLM-as-judge for scalable scoring, and human evaluation for final validation. In production, user feedback and monitoring complete the picture.
Protecting LLM applications from misuse and failure in production: preventing prompt injection attacks, filtering harmful outputs, detecting hallucinations, and adding guardrails to keep the AI safe, honest, and on-topic.
MLOps bridges the gap between building ML models and running them reliably in production. It borrows practices from DevOps (CI/CD, monitoring, infrastructure as code) and applies them to the unique challenges of machine learning: model versioning, data drift, GPU resource management, and continuous retraining.
Docker packages your application and all its dependencies (Python, CUDA, libraries, model files) into a single portable container that runs identically on any machine. No more "it works on my machine" problems. Every ML service in production should be containerized.
When your application has multiple services (API server, vector database, embedding model, LLM, reverse proxy), Docker Compose lets you define and run them all together in a single YAML file. One command starts everything with the right networking, volumes, and environment variables.
Kubernetes is the industry standard for running containers at scale across multiple servers. It automatically handles load balancing, scaling (up and down based on demand), self-healing (restarts crashed containers), rolling updates (zero-downtime deploys), and GPU scheduling. Essential when you need to serve ML models to hundreds or thousands of concurrent users.
FastAPI is the go-to Python framework for building production ML APIs. It's async (handles many concurrent requests), generates OpenAPI docs automatically, and has built-in data validation with Pydantic. For LLM serving specifically, vLLM and TGI (Text Generation Inference) provide optimized inference servers with OpenAI-compatible APIs.
The industry standard for collecting and storing time-series metrics. Prometheus scrapes /metrics HTTP endpoints from your services at regular intervals, stores the data efficiently, and fires alerts when thresholds are breached. Used by virtually every Kubernetes cluster and production ML system.
Use Prometheus whenever you need to answer "how is my system performing right now?" It's the source of truth for latency, throughput, error rates, and resource utilization. For ML: track tokens/sec, time-to-first-token, GPU memory, queue depth, and model confidence distributions.
rate(http_requests_total[5m]) gives requests/sec over 5 minutes.

The visualization layer for your entire infrastructure. Grafana connects to Prometheus (metrics), Loki (logs), and dozens of other data sources to create beautiful, interactive dashboards. For ML teams: build dashboards showing model latency, GPU utilization, request throughput, error rates, and data drift all in one view.
Use Grafana whenever you need a visual overview of system health. It's the single pane of glass where everyone (engineers, PMs, execs) checks production status. Create team-specific dashboards: one for infra (CPU, memory, disk), one for ML (model performance, drift), one for business (requests, users, cost).
Log aggregation system designed by Grafana Labs. Like Prometheus but for logs: lightweight, label-based indexing (not full-text), and native Grafana integration. Collects logs from all your containers and lets you search, filter, and correlate them with metrics on the same timeline.
Use Loki when you need to debug issues by reading logs across services. "The model returned an error at 14:32 — what happened?" Loki lets you jump from a Grafana alert to the exact logs in the same dashboard. Much cheaper than Elasticsearch for log storage because it only indexes labels (service, level, pod), not the full text.
{job="mirror"} |= "error" finds all error logs from the mirror service.

Specialized observability for LLM applications. While Prometheus monitors infrastructure, Langfuse and LangSmith trace the AI layer: every prompt, context, tool call, response, and latency. Essential for debugging RAG quality, hallucinations, and cost optimization.
Use LLM tracing tools as soon as you deploy any LLM-powered feature. Without them, you're blind to prompt quality, retrieval relevance, hallucination rate, and per-request cost. They let you replay any conversation, see what context was retrieved, and score response quality.
CI/CD built directly into GitHub. Define workflows in YAML that trigger on push, PR, schedule, or manual dispatch. Free for open source. The most popular choice for teams already on GitHub — runs lint, tests, Docker builds, and deployments without any external infrastructure.
Use GitHub Actions as your default CI/CD if your code lives on GitHub. It's zero-setup (no server to manage), has a massive marketplace of reusable actions, and integrates natively with GitHub PRs (status checks, comments, deploy environments). For ML: trigger model evaluation on every PR, build and push Docker images, deploy to staging.
The veteran CI/CD server, self-hosted and extremely customizable. Jenkins has been the backbone of enterprise CI/CD for 15+ years. Groovy-based pipelines, thousands of plugins, and complete control over the build environment. Best for on-prem, air-gapped, or complex multi-team enterprise workflows.
Use Jenkins when you need full control: on-premises infrastructure, strict security requirements (air-gapped networks), complex approval chains, or legacy systems integration. It's also the choice when you need GPU build agents with custom CUDA setups that cloud CI can't provide easily.
GitOps continuous delivery for Kubernetes. The desired state of your cluster (all YAML manifests, Helm charts, Kustomize configs) lives in a Git repo. ArgoCD watches that repo and automatically syncs changes to the cluster. Push to Git = deploy to production. No manual kubectl apply.
Use ArgoCD when you run Kubernetes and want declarative, auditable deployments. Every change goes through a PR, gets reviewed, and is automatically applied. Rollback = git revert. For ML: update model image tag in Git, ArgoCD deploys the new version with a rolling update or canary strategy.
The most popular open-source workflow orchestrator. Define data and ML pipelines as DAGs (directed acyclic graphs) in pure Python. Schedule, retry, monitor, and backfill. Used by Airbnb, Spotify, and thousands of companies to manage ETL, training pipelines, and data quality checks.
Use Airflow when you need to orchestrate multi-step workflows that run on a schedule or on trigger: data ingestion → validation → feature engineering → model training → evaluation → deployment. It's the glue that connects all your tools into a reliable, monitored pipeline.
Prefect: modern Python-native with better DX (decorators, built-in retries, UI). Dagster: asset-centric, strong typing, great for data-aware pipelines. Kubeflow Pipelines: Kubernetes-native, designed for ML workflows with GPU support.
Track every experiment you run: hyperparameters, metrics, artifacts, and model versions. Without experiment tracking, you're lost after a few dozen training runs ("which config gave the best F1?"). MLflow is open-source and self-hosted, Weights & Biases (W&B) is cloud-hosted with richer visualization.
Use experiment tracking from day 1 of any ML project. It pays off immediately: compare runs side-by-side, reproduce results, share with the team. MLflow also includes a model registry (version, stage, approve) and a deployment component. W&B adds real-time dashboards, hyperparameter sweeps, and collaborative reports.
Choosing the right database is one of the most impactful architectural decisions in any project. Each type of database excels at specific access patterns and fails at others. Understanding these tradeoffs helps you pick the right tool instead of forcing one database to do everything.
The most advanced open-source relational database. Supports structured data with strong consistency (ACID transactions), complex queries (JOINs, CTEs, window functions), and extensions for nearly anything: full-text search, JSON, geospatial (PostGIS), and even vector search (pgvector). If you're unsure which database to pick, PostgreSQL is almost always a safe default.
User accounts, application state, metadata, configuration, anything with relationships. For ML: store experiment metadata, model registry entries, user feedback, evaluation results. With pgvector: can even serve as a simple vector database for small-scale RAG.
A serverless, file-based SQL database embedded directly in your application. No separate database server needed: the entire database is a single file on disk. Perfect for local apps, prototypes, mobile apps, and anywhere you need SQL without the overhead of a full database server. Used in MIRROR for local data storage.
Stores data as flexible JSON-like documents (BSON) instead of rigid tables with fixed columns. Each document can have a different structure, making it great for evolving schemas, nested data, and rapid prototyping. Very popular for web applications where the data model changes frequently.
An in-memory key-value store that is incredibly fast (sub-millisecond latency). Used primarily as a cache (store frequently accessed data in memory instead of hitting the database), session store, rate limiter, message broker, and real-time leaderboard. Essential in any high-performance system.
Purpose-built databases for storing and searching high-dimensional vectors (embeddings). Essential for RAG, semantic search, recommendation systems, and any application that needs to find "similar" items. They use approximate nearest neighbor (ANN) algorithms like HNSW to search billions of vectors in milliseconds.
pgvector: <1M vectors, already using PostgreSQL, simple use case. Dedicated: >1M vectors, need advanced features (hybrid search, quantization, sharding), or vector search is your primary access pattern.
When your database can't handle the load, you have two fundamental options: make the machine bigger (vertical) or add more machines (horizontal). This is one of the most important architectural decisions because it affects cost, complexity, consistency guarantees, and which databases you can use.
A distributed database can only guarantee two of three: Consistency (all nodes see the same data), Availability (every request gets a response), Partition tolerance (system works despite network failures). Since network partitions are unavoidable in production, you're really choosing between CP (consistent but may reject requests: PostgreSQL, CockroachDB) and AP (always available but may return stale data: Cassandra, DynamoDB).
A cloud-native data warehouse that separates storage and compute, allowing you to scale each independently. SQL-based, fully managed, supports semi-structured data (JSON, Parquet). Widely used for analytics, feature engineering, and as the data layer feeding ML pipelines. Snowpark lets you run Python ML code directly inside the warehouse.
Use Snowflake (or BigQuery/Redshift) when you need to analyze terabytes of data with SQL, build feature pipelines for ML, or serve analytics dashboards. The separation of storage and compute means you only pay for compute when queries run. Not for low-latency transactional workloads (use PostgreSQL for that).
Specialized databases optimized for specific access patterns. TimescaleDB is a PostgreSQL extension for time-series data (metrics, IoT, logs) with automatic partitioning and compression. Neo4j is a graph database for data with complex relationships (social networks, knowledge graphs, fraud detection) where traversal queries need to be fast.