SKILL 01 · Model Architecture

rwkv-architecture

RNN+Transformer hybrid with O(n) inference: time is linear in sequence length, the recurrent state has a fixed size so there is no KV cache, and context length has no architectural limit. Trains like GPT (in parallel) and infers like an RNN (sequentially). Linux Foundation AI project. In production at Windows, Office, and NeMo. RWKV-7 released March 2025. Models up to 14B parameters.
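
The "train like GPT, infer like RNN" idea can be illustrated with a toy decayed linear-attention layer. This is a minimal sketch, not RWKV's actual time-mixing formula (RWKV-7 uses a more elaborate state update): the scalar `decay`, the function names, and the single-head shapes are all assumptions made for illustration. It shows that an all-pairs causal form and a token-by-token recurrence over a fixed-size state give the same outputs.

```python
import random


def recurrent_form(rs, ks, vs, decay):
    """RNN-style pass: one token at a time, carrying a fixed d x d state."""
    d = len(ks[0])
    state = [[0.0] * d for _ in range(d)]
    outputs = []
    for r, k, v in zip(rs, ks, vs):
        # Decay the old state, then add the outer product of the new key and value.
        for i in range(d):
            for j in range(d):
                state[i][j] = decay * state[i][j] + k[i] * v[j]
        # Read out with the receptance vector: o = r @ state.
        outputs.append([sum(r[i] * state[i][j] for i in range(d)) for j in range(d)])
    return outputs


def parallel_form(rs, ks, vs, decay):
    """GPT-style pass: each position combines all earlier positions directly."""
    d = len(vs[0])
    outputs = []
    for t in range(len(rs)):
        out = [0.0] * d
        for s in range(t + 1):
            # Causal score with exponential decay over the distance t - s.
            score = decay ** (t - s) * sum(a * b for a, b in zip(rs[t], ks[s]))
            for j in range(d):
                out[j] += score * vs[s][j]
        outputs.append(out)
    return outputs


def random_vectors(n, d, rng):
    """Hypothetical helper: n random d-dimensional vectors in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


def max_abs_difference(a, b):
    """Largest elementwise gap between two lists of vectors."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    rng = random.Random(0)
    n_tokens, dim = 8, 4
    rs, ks, vs = (random_vectors(n_tokens, dim, rng) for _ in range(3))
    diff = max_abs_difference(
        recurrent_form(rs, ks, vs, decay=0.9),
        parallel_form(rs, ks, vs, decay=0.9),
    )
    print(f"max difference between the two forms: {diff:.2e}")
```

In the recurrent form the state stays d x d however many tokens have been processed, whereas a Transformer's KV cache grows with sequence length. The all-pairs form is written with plain loops here, but each position's output is independent of the others, which is what allows training to be parallelized across the sequence.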

◆ Works with
Claude Code, Codex, Gemini CLI, Cursor

One-command install. The skill auto-loads when your agent needs it.

◆ Installation

How to use this skill

Path A: CLI

For developers & coding agents

npx ai-research-skills install rwkv

Path B: Upload to Claude

For Claude chat users (no coding required)

  1. Download SKILL.md from GitHub
  2. Upload to claude.ai/customize/skills
  3. Use in chat: /rwkv
More in Model Architecture

Related skills

implementing-llms-litgpt

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

mamba-architecture

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

nanogpt

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).

distributed-llm-pretraining-torchtitan

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
