Skip to content

CI Integration

Add Assay to your CI/CD pipeline for zero-flake AI agent testing.


Why CI Integration?

Traditional approach:

PR opened → Run LLM tests → Wait 3 minutes → Random failure → Retry → Trust erodes

With Assay:

PR opened → Replay traces → 3ms → Deterministic pass/fail → Trust restored

GitHub Actions

# .github/workflows/assay.yml
name: AI Agent Security

on:
  push:
    branches: [main]
  pull_request:

permissions:
  contents: read
  security-events: write
  pull-requests: write

jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run tests with Assay
        run: |
          curl -fsSL https://getassay.dev/install.sh | sh
          assay ci --config ci-eval.yaml --trace-file traces/ci.jsonl --sarif .assay/reports/sarif.json --junit .assay/reports/junit.xml

      - name: Verify AI agent behavior
        uses: Rul1an/assay-action@v2
        with:
          fail_on: error

Canonical public slug: Rul1an/assay-action@v2 (Marketplace). Internal monorepo contract tests may reference ./assay-action only inside this repository.

Action Inputs

Input Description Default
bundles Glob pattern for evidence bundles Auto-detect
fail_on Fail threshold: error, warn, info, none error
sarif Upload to GitHub Security tab true
comment_diff Post PR comment (only if findings) true
baseline_key Key for baseline comparison -
write_baseline Save baseline (main branch only) false

Action Outputs

Output Description
verified true if all bundles verified
findings_error Count of error-level findings
findings_warn Count of warning-level findings

SARIF Integration (Automatic)

The action automatically uploads SARIF results to GitHub Code Scanning. Findings appear in the Security tab and inline in PR diffs.

No manual SARIF upload step needed - just add security-events: write permission.


GitLab CI

# .gitlab-ci.yml
stages:
  - test

assay:
  stage: test
  image: rust:latest
  before_script:
    - cargo install assay
  script:
    - assay ci --config eval.yaml --trace-file traces/golden.jsonl --junit .assay/reports/junit.xml
    - assay ci --config eval.yaml --trace-file traces/golden.jsonl --sarif .assay/reports/sarif.json
  artifacts:
    reports:
      junit: .assay/reports/junit.xml
    when: always

GitLab Security Report (SARIF)

assay:
  script:
    - assay ci --config eval.yaml --trace-file traces/golden.jsonl --sarif .assay/reports/sarif.json
  artifacts:
    paths:
      - .assay/reports/sarif.json

Azure Pipelines

# azure-pipelines.yml
trigger:
  - main

pool:
  vmImage: 'ubuntu-latest'

steps:
  - script: cargo install assay
    displayName: 'Install Assay'

  - script: assay ci --config eval.yaml --trace-file traces/golden.jsonl --strict --junit .assay/reports/junit.xml
    displayName: 'Run Assay Tests'

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: '.assay/reports/junit.xml'
    condition: always()

CircleCI

# .circleci/config.yml
version: 2.1

jobs:
  assay:
    docker:
      - image: rust:latest
    steps:
      - checkout
      - run:
          name: Install Assay
          command: cargo install assay
      - run:
          name: Run Tests
          command: assay ci --config eval.yaml --trace-file traces/golden.jsonl --strict --junit .assay/reports/junit.xml
      - store_test_results:
          path: .assay/reports

workflows:
  version: 2
  test:
    jobs:
      - assay

Jenkins

// Jenkinsfile
pipeline {
    agent any

    stages {
        stage('Install Assay') {
            steps {
                sh 'cargo install assay'
            }
        }

        stage('Run Tests') {
            steps {
                sh 'assay ci --config eval.yaml --trace-file traces/golden.jsonl --junit .assay/reports/junit.xml'
            }
        }
    }

    post {
        always {
            junit '.assay/reports/junit.xml'
        }
    }
}

Docker-Based CI

For environments without Rust:

# Any CI system
steps:
  - run: |
      docker run --rm \
        -v $(pwd):/workspace \
        ghcr.io/rul1an/assay:latest \
        run --config /workspace/eval.yaml --strict

Best Practices

1. Store Golden Traces in Git

your-repo/
├── eval.yaml
├── policies/
│   └── discount.yaml
└── traces/
    └── golden.jsonl  # ← Commit this

2. Use fail_on for Strict Mode

- uses: Rul1an/assay-action@v2
  with:
    fail_on: warn  # Fail on warnings AND errors

3. Cache Cargo Installation

- uses: actions/cache@v3
  with:
    path: ~/.cargo
    key: cargo-${{ runner.os }}-assay

4. Run on Relevant Changes Only

on:
  push:
    paths:
      - 'agents/**'
      - 'prompts/**'
      - 'eval.yaml'

5. Separate Fast and Slow Tests

jobs:
  assay:
    # Evidence verification (fast)
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2

  integration:
    needs: assay
    # Real LLM tests (slow) — only if Assay passes
    steps:
      - run: pytest tests/integration

Debugging CI Failures

View Detailed Output

- run: assay doctor --config eval.yaml --trace-file traces/golden.jsonl

Download Artifacts

- uses: actions/upload-artifact@v3
  with:
    name: assay-reports
    path: .assay/reports/

Local Reproduction

# Same command as CI
assay ci --config eval.yaml --trace-file traces/golden.jsonl --strict --db :memory: --sarif .assay/reports/sarif.json --junit .assay/reports/junit.xml

Performance

Metric GitHub Actions GitLab CI
Install time ~60s (cached: 2s) ~60s
Test time (100 tests) ~50ms ~50ms
Total job time ~70s ~70s

Compare to LLM-based tests: 3-10 minutes, \(0.50-\)5.00 per run.


Next Steps