
AgentLite Comprehensive Test Suite Plan

Context

AgentLite is a lightweight, async-first Agent component library for LLM applications. It provides:

  • Agent: Main agent class with tool calling loop and streaming support
  • OpenAIProvider: OpenAI-compatible provider implementation
  • Tool System: @tool decorator, CallableTool, CallableTool2, SimpleToolset
  • MCPClient: MCP server integration
  • Message Types: ContentPart, Message, ToolCall, etc.
  • Configuration: Pydantic-based config models

Test Location

/home/tcmofashi/proj/general_agent/agentlite/tests/

Task Dependency Graph

Task | Depends On | Reason
1. Test Configuration Setup | None | Foundation for all tests
2. Message Types Unit Tests | Task 1 | Core data structures
3. Tool System Unit Tests | Task 1 | Core tool abstractions
4. Configuration Unit Tests | Task 1 | Config validation
5. Provider Protocol Unit Tests | Task 1 | Provider interface
6. Mock Provider Implementation | Task 1 | Required for integration tests
7. Agent Integration Tests | Tasks 2, 3, 6 | Tests agent with mocked provider
8. Tool Calling Loop Tests | Tasks 3, 6 | Tests tool execution flow
9. Streaming Response Tests | Tasks 2, 6 | Tests streaming functionality
10. Conversation History Tests | Task 7 | Tests history management
11. Real-World Scenario: Data Quality Agent | Tasks 7, 8 | Practical use case
12. Real-World Scenario: Fact-Checking Agent | Tasks 7, 8 | Practical use case
13. Real-World Scenario: Multi-Agent Workflow | Tasks 7, 10 | Practical use case
14. MCP Mock Tests | Tasks 3, 6 | Tests MCP integration with mocks
15. Error Handling Tests | Tasks 6, 7 | Tests error scenarios
16. Test Coverage Analysis | All above | Verify coverage targets

Parallel Execution Graph

Wave 1 (Foundation - Task 1 starts immediately; Tasks 2-6 run in parallel once it lands):
├── Task 1: Test Configuration Setup
├── Task 2: Message Types Unit Tests
├── Task 3: Tool System Unit Tests
├── Task 4: Configuration Unit Tests
├── Task 5: Provider Protocol Unit Tests
└── Task 6: Mock Provider Implementation

Wave 2 (Core Integration - After Wave 1):
├── Task 7: Agent Integration Tests (depends: 1, 2, 3, 6)
├── Task 8: Tool Calling Loop Tests (depends: 3, 6)
└── Task 9: Streaming Response Tests (depends: 2, 6)

Wave 3 (Advanced Features - After Wave 2):
├── Task 10: Conversation History Tests (depends: 7)
├── Task 14: MCP Mock Tests (depends: 3, 6)
└── Task 15: Error Handling Tests (depends: 6, 7)

Wave 4 (Real-World Scenarios - After Wave 3):
├── Task 11: Data Quality Agent Scenario (depends: 7, 8)
├── Task 12: Fact-Checking Agent Scenario (depends: 7, 8)
└── Task 13: Multi-Agent Workflow Scenario (depends: 7, 10)

Wave 5 (Finalization - After Wave 4):
└── Task 16: Test Coverage Analysis (depends: all)

Critical Path: Task 1 → Task 6 → Task 7 → Task 10 → Task 13 → Task 16
Estimated parallel speedup: roughly 60% faster than fully sequential execution

Tasks

Task 1: Test Configuration Setup

Description: Create pytest configuration, conftest.py with shared fixtures, and test utilities.

Delegation Recommendation:

  • Category: quick - Configuration setup is straightforward
  • Skills: [python-programmer] - Python testing infrastructure knowledge

Skills Evaluation:

  • INCLUDED python-programmer: Required for pytest configuration and fixture design
  • OMITTED git-master: No git operations needed for this task
  • OMITTED frontend-ui-ux: No UI work involved

Depends On: None

Acceptance Criteria:

  • pytest.ini configured with asyncio mode
  • conftest.py with shared fixtures (mock_provider, sample_messages, temp_agent)
  • Test utilities module for common assertions
  • All tests can be run with pytest tests/

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/pytest.ini
  • /home/tcmofashi/proj/general_agent/agentlite/tests/conftest.py
  • /home/tcmofashi/proj/general_agent/agentlite/tests/utils.py

Commit: YES

  • Message: test: setup pytest configuration and shared fixtures
  • Files: pytest.ini, tests/conftest.py, tests/utils.py

Task 2: Message Types Unit Tests

Description: Test all message types: ContentPart, TextPart, ImageURLPart, AudioURLPart, ToolCall, ToolCallPart, Message.

Delegation Recommendation:

  • Category: quick - Unit tests for data structures
  • Skills: [python-programmer] - Python testing patterns

Skills Evaluation:

  • INCLUDED python-programmer: Required for writing unit tests
  • OMITTED frontend-ui-ux: No UI involved

Depends On: Task 1

Acceptance Criteria:

  • ContentPart polymorphic validation works correctly
  • TextPart merge_in_place works for streaming
  • ToolCall merge_in_place works with ToolCallPart
  • Message content coercion from string works
  • Message.extract_text() returns correct text
  • Message.has_tool_calls() returns correct boolean
  • All edge cases covered (empty content, None values)

Test Cases:

  1. test_content_part_registry - Verify subclass registration
  2. test_text_part_creation - Basic TextPart instantiation
  3. test_text_part_merge - Streaming text merge
  4. test_image_url_part - ImageURLPart creation and serialization
  5. test_audio_url_part - AudioURLPart creation and serialization
  6. test_tool_call_creation - ToolCall instantiation
  7. test_tool_call_merge - ToolCall merging with ToolCallPart
  8. test_message_string_content - Message with string content coercion
  9. test_message_list_content - Message with list of ContentParts
  10. test_message_extract_text - Text extraction from mixed content
  11. test_message_has_tool_calls - Tool call detection
  12. test_message_serialization - Pydantic model_dump works
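
As a concrete reference for the merge and extraction cases above, here is a minimal sketch of tests 3 and 10. It assumes TextPart.merge_in_place takes another part and Message.extract_text() concatenates textual parts, as implied by the acceptance criteria; the import path follows the module names in the coverage table.

from agentlite.message import Message, TextPart

def test_text_part_merge():
    """Streamed chunks should accumulate into a single part."""
    part = TextPart(text="Hel")
    part.merge_in_place(TextPart(text="lo!"))
    assert part.text == "Hello!"

def test_message_extract_text():
    """extract_text() should concatenate only the textual parts."""
    msg = Message(role="assistant", content=[TextPart(text="Hel"), TextPart(text="lo!")])
    assert msg.extract_text() == "Hello!"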

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_message.py

Commit: YES

  • Message: test: add unit tests for message types
  • Files: tests/unit/test_message.py

Task 3: Tool System Unit Tests

Description: Test tool system: Tool, CallableTool, CallableTool2, SimpleToolset, @tool decorator, ToolResult types.

Delegation Recommendation:

  • Category: unspecified-low - Moderate complexity with async patterns
  • Skills: [python-programmer] - Python async testing

Skills Evaluation:

  • INCLUDED python-programmer: Required for async tool testing
  • OMITTED frontend-ui-ux: No UI involved

Depends On: Task 1

Acceptance Criteria:

  • Tool JSON schema validation works
  • CallableTool validates arguments against schema
  • CallableTool2 uses Pydantic for validation
  • SimpleToolset manages tools correctly
  • @tool decorator creates valid tools
  • Tool execution handles errors gracefully
  • Async tool execution works correctly

Test Cases:

  1. test_tool_schema_validation - Invalid schema raises ValueError
  2. test_tool_ok_result - ToolOk creation and properties
  3. test_tool_error_result - ToolError creation and properties
  4. test_callable_tool_validation - Argument validation against schema
  5. test_callable_tool_execution - Successful tool execution
  6. test_callable_tool_error_handling - Exception handling in tools
  7. test_callable_tool2_pydantic_validation - Pydantic model validation
  8. test_callable_tool2_execution - Type-safe tool execution
  9. test_simple_toolset_add_remove - Tool management
  10. test_simple_toolset_handle - Tool call handling
  11. test_simple_toolset_tool_not_found - Missing tool error
  12. test_tool_decorator_basic - @tool creates valid tool
  13. test_tool_decorator_with_params - @tool with custom name/description
  14. test_tool_decorator_type_hints - Type hint to schema conversion
  15. test_tool_concurrent_execution - Multiple tools execute concurrently
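
A minimal sketch of the decorator case follows. The attribute names (name, description) are assumptions about the Tool interface, not confirmed API; the sketch only shows the intended shape of the assertion.

from agentlite.tool import tool

@tool()
async def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

def test_tool_decorator_basic():
    # Attribute names are assumed; adjust to the real Tool interface.
    assert add.name == "add"
    assert "Add two numbers" in add.description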

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_tool.py

Commit: YES

  • Message: test: add unit tests for tool system
  • Files: tests/unit/test_tool.py

Task 4: Configuration Unit Tests

Description: Test Pydantic configuration models: ProviderConfig, ModelConfig, ToolConfig, AgentConfig.

Delegation Recommendation:

  • Category: quick - Pydantic model validation tests
  • Skills: [python-programmer] - Pydantic testing

Skills Evaluation:

  • INCLUDED python-programmer: Required for Pydantic validation tests

Depends On: Task 1

Acceptance Criteria:

  • ProviderConfig validates base_url format
  • ProviderConfig stores api_key as SecretStr
  • ModelConfig validates temperature range
  • ModelConfig validates provider is not empty
  • AgentConfig validates default_model exists in models
  • AgentConfig validates all model providers exist
  • get_provider_config and get_model_config work correctly

Test Cases:

  1. test_provider_config_validation - Valid config creation
  2. test_provider_config_invalid_url - Invalid base_url raises error
  3. test_provider_config_secret_str - API key is SecretStr
  4. test_model_config_validation - Valid model config
  5. test_model_config_temperature_range - Temperature bounds checking
  6. test_model_config_empty_provider - Empty provider raises error
  7. test_agent_config_validation - Valid agent config
  8. test_agent_config_missing_default_model - Missing default_model raises error
  9. test_agent_config_unknown_provider - Unknown provider raises error
  10. test_agent_config_get_provider - get_provider_config works
  11. test_agent_config_get_model - get_model_config works
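
A minimal sketch of the SecretStr and invalid-URL cases, assuming the base_url and api_key field names from the acceptance criteria (SecretStr masking and get_secret_value are standard Pydantic behavior):

import pytest
from pydantic import ValidationError

from agentlite.config import ProviderConfig

def test_provider_config_secret_str():
    cfg = ProviderConfig(base_url="https://api.example.com/v1", api_key="sk-test")
    # SecretStr must not leak the raw key through repr/str.
    assert "sk-test" not in repr(cfg.api_key)
    assert cfg.api_key.get_secret_value() == "sk-test"

def test_provider_config_invalid_url():
    with pytest.raises(ValidationError):
        ProviderConfig(base_url="not-a-url", api_key="sk-test")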

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_config.py

Commit: YES

  • Message: test: add unit tests for configuration models
  • Files: tests/unit/test_config.py

Task 5: Provider Protocol Unit Tests

Description: Test provider protocol and exception types: ChatProvider, StreamedMessage, TokenUsage, exception hierarchy.

Delegation Recommendation:

  • Category: quick - Protocol and exception testing
  • Skills: [python-programmer] - Python protocol testing

Skills Evaluation:

  • INCLUDED python-programmer: Required for protocol testing

Depends On: Task 1

Acceptance Criteria:

  • TokenUsage calculates total correctly
  • Exception hierarchy is correct
  • APIStatusError stores status_code
  • ChatProvider protocol can be implemented

Test Cases:

  1. test_token_usage_total - Total token calculation
  2. test_token_usage_defaults - Default cached_tokens = 0
  3. test_chat_provider_error_base - Base exception class
  4. test_api_connection_error - APIConnectionError creation
  5. test_api_timeout_error - APITimeoutError creation
  6. test_api_status_error - APIStatusError with status_code
  7. test_api_empty_response_error - APIEmptyResponseError creation
  8. test_chat_provider_protocol - Protocol implementation check
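
A sketch of the TokenUsage and status-error cases. Field, property, and constructor names are assumptions inferred from the test case descriptions above; only cached_tokens defaulting to 0 is stated explicitly.

from agentlite.provider import APIStatusError, TokenUsage

def test_token_usage_total():
    # Field and property names here are assumed; adjust to the real model.
    usage = TokenUsage(prompt_tokens=10, completion_tokens=5)
    assert usage.cached_tokens == 0
    assert usage.total == 15

def test_api_status_error():
    # Constructor shape is an assumption; only status_code storage is asserted.
    err = APIStatusError("rate limited", status_code=429)
    assert err.status_code == 429
    assert isinstance(err, Exception)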

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_provider.py

Commit: YES

  • Message: test: add unit tests for provider protocol
  • Files: tests/unit/test_provider.py

Task 6: Mock Provider Implementation

Description: Create a comprehensive mock provider for testing that simulates OpenAI API responses without real API calls.

Delegation Recommendation:

  • Category: unspecified-low - Requires understanding of streaming and async patterns
  • Skills: [python-programmer] - Async generator implementation

Skills Evaluation:

  • INCLUDED python-programmer: Required for mock provider implementation

Depends On: Task 1

Acceptance Criteria:

  • MockProvider implements ChatProvider protocol
  • Can simulate text responses
  • Can simulate tool calls
  • Can simulate streaming responses
  • Can simulate errors
  • Configurable response sequences
  • Tracks calls for verification

Implementation Details:

class MockProvider:
    """Mock provider for testing.
    
    Usage:
        provider = MockProvider()
        provider.add_text_response("Hello!")
        provider.add_tool_call("add", {"a": 1, "b": 2}, "3")
        
        agent = Agent(provider=provider)
        response = await agent.run("Hi")
        
        assert provider.calls == [...]
    """

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/mocks/provider.py

Commit: YES

  • Message: test: add mock provider for testing
  • Files: tests/mocks/provider.py

Task 7: Agent Integration Tests

Description: Test Agent class with mocked provider: initialization, run(), generate(), history management.

Delegation Recommendation:

  • Category: unspecified-low - Integration testing with async
  • Skills: [python-programmer] - Async integration testing

Skills Evaluation:

  • INCLUDED python-programmer: Required for agent integration testing

Depends On: Tasks 1, 2, 3, 6

Acceptance Criteria:

  • Agent initializes correctly with provider
  • Agent.run() returns string response
  • Agent.run(stream=True) returns async iterator
  • Agent.generate() returns Message
  • Agent adds messages to history
  • Agent.clear_history() clears history
  • Agent respects max_iterations

Test Cases:

  1. test_agent_initialization - Basic agent creation
  2. test_agent_with_tools - Agent with toolset
  3. test_agent_run_simple - Simple non-streaming run
  4. test_agent_run_streaming - Streaming response
  5. test_agent_generate - Generate without tool loop
  6. test_agent_history_tracking - Messages added to history
  7. test_agent_clear_history - History cleared correctly
  8. test_agent_max_iterations - Respects iteration limit
  9. test_agent_system_prompt - System prompt used
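
A minimal sketch of the simple-run case, wiring the agent to the MockProvider from Task 6. The top-level Agent import and the mocks import path are assumptions (pytest's default import mode puts tests/ on sys.path, making the mocks package importable):

from agentlite import Agent
from mocks.provider import MockProvider

async def test_agent_run_simple():
    provider = MockProvider()
    provider.add_text_response("Hello!")
    agent = Agent(provider=provider)

    result = await agent.run("Hi")

    assert result == "Hello!"
    assert len(provider.calls) == 1
    # The user turn should now be recorded in history.
    assert agent.history[0].role == "user"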

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_agent.py

Commit: YES

  • Message: test: add agent integration tests
  • Files: tests/integration/test_agent.py

Task 8: Tool Calling Loop Tests

Description: Test the complete tool calling loop: agent requests tool, tool executes, result returned.

Delegation Recommendation:

  • Category: unspecified-low - Complex async flow testing
  • Skills: [python-programmer] - Async flow testing

Skills Evaluation:

  • INCLUDED python-programmer: Required for tool loop testing

Depends On: Tasks 3, 6

Acceptance Criteria:

  • Agent calls tool when requested by LLM
  • Tool result is added to history
  • Agent continues conversation after tool result
  • Multiple tool calls in one response handled
  • Tool errors are handled gracefully
  • Tool calls are concurrent

Test Cases:

  1. test_single_tool_call - One tool call in conversation
  2. test_multiple_tool_calls - Multiple tools in one response
  3. test_tool_call_chain - Sequential tool calls
  4. test_tool_error_handling - Tool returns error
  5. test_tool_not_found - Unknown tool requested
  6. test_tool_concurrent_execution - Tools execute concurrently
  7. test_tool_result_in_history - Tool results in conversation history
  8. test_tool_call_with_arguments - Arguments passed correctly
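
A sketch of the single-call case: the mock is scripted so the first model turn requests the tool and the second produces the final answer, with the add_tool fixture from conftest.py. The exact shape of the tool-result message in history is an assumption.

from agentlite import Agent
from mocks.provider import MockProvider

async def test_single_tool_call(add_tool):
    provider = MockProvider()
    provider.add_tool_call("add", {"a": 1, "b": 2}, "3")
    provider.add_text_response("The answer is 3.")

    agent = Agent(provider=provider, tools=[add_tool])
    result = await agent.run("What is 1 + 2?")

    assert result == "The answer is 3."
    # The executed tool's result should appear as a tool-role message.
    assert any(m.role == "tool" for m in agent.history)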

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_tool_loop.py

Commit: YES

  • Message: test: add tool calling loop tests
  • Files: tests/integration/test_tool_loop.py

Task 9: Streaming Response Tests

Description: Test streaming responses: text streaming, tool call streaming, mixed content.

Delegation Recommendation:

  • Category: unspecified-low - Async streaming testing
  • Skills: [python-programmer] - Async generator testing

Skills Evaluation:

  • INCLUDED python-programmer: Required for streaming testing

Depends On: Tasks 2, 6

Acceptance Criteria:

  • Text streams in chunks
  • Tool calls stream correctly
  • Mixed content (text + tool) streams correctly
  • Complete response can be reconstructed
  • Streaming works with tool calling loop

Test Cases:

  1. test_stream_text_only - Simple text streaming
  2. test_stream_tool_call - Tool call streaming
  3. test_stream_mixed_content - Text then tool call
  4. test_stream_reconstruction - Rebuild full response
  5. test_stream_with_tool_loop - Streaming in tool loop
  6. test_stream_empty_response - Empty stream handling
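
A sketch of the reconstruction case. It assumes run(stream=True) returns the async iterator directly (per Task 7's acceptance criteria) and that chunks are plain text deltas; adjust the join if chunks are richer objects.

from agentlite import Agent
from mocks.provider import MockProvider

async def test_stream_reconstruction():
    provider = MockProvider()
    provider.add_text_response("Hello, world!")
    agent = Agent(provider=provider)

    # Assumes run(stream=True) yields text deltas without an extra await.
    chunks = [chunk async for chunk in agent.run("Hi", stream=True)]
    assert "".join(chunks) == "Hello, world!"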

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_streaming.py

Commit: YES

  • Message: test: add streaming response tests
  • Files: tests/integration/test_streaming.py

Task 10: Conversation History Tests

Description: Test conversation history management: message ordering, role tracking, history limits.

Delegation Recommendation:

  • Category: quick - History management testing
  • Skills: [python-programmer] - State management testing

Skills Evaluation:

  • INCLUDED python-programmer: Required for history testing

Depends On: Task 7

Acceptance Criteria:

  • Messages added in correct order
  • Roles tracked correctly (user, assistant, tool)
  • Tool call IDs preserved
  • History can be inspected
  • History can be cleared
  • History persists across multiple runs

Test Cases:

  1. test_history_message_order - Messages in correct order
  2. test_history_roles - Correct role tracking
  3. test_history_tool_responses - Tool call IDs preserved
  4. test_history_persistence - History across multiple runs
  5. test_history_clear - Clear history works
  6. test_history_manual_add - Manually add messages
  7. test_history_copy - history property returns copy
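
A minimal sketch of the persistence and clear cases, using the history and clear_history surface named in the acceptance criteria:

from agentlite import Agent
from mocks.provider import MockProvider

async def test_history_persistence():
    provider = MockProvider()
    provider.add_text_response("First.")
    provider.add_text_response("Second.")
    agent = Agent(provider=provider)

    await agent.run("one")
    await agent.run("two")

    # Two full user/assistant exchanges should be on record, in order.
    assert [m.role for m in agent.history] == ["user", "assistant", "user", "assistant"]

    agent.clear_history()
    assert agent.history == []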

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_history.py

Commit: YES

  • Message: test: add conversation history tests
  • Files: tests/integration/test_history.py

Task 11: Real-World Scenario - Data Quality Agent

Description: Test a realistic data quality improvement agent that validates and cleans data.

Delegation Recommendation:

  • Category: unspecified-high - Complex scenario testing
  • Skills: [python-programmer] - Complex test scenario design

Skills Evaluation:

  • INCLUDED python-programmer: Required for scenario implementation

Depends On: Tasks 7, 8

Acceptance Criteria:

  • Agent validates data format
  • Agent identifies data quality issues
  • Agent suggests corrections
  • Uses multiple tools (validate, clean, analyze)
  • Handles edge cases (empty data, invalid format)

Scenario:

# Data Quality Agent validates CSV data
# Tools: validate_csv, detect_anomalies, suggest_fixes
# Test with sample data containing errors

Test Cases:

  1. test_data_quality_valid_data - Clean data passes validation
  2. test_data_quality_detects_errors - Errors detected and reported
  3. test_data_quality_suggests_fixes - Corrections suggested
  4. test_data_quality_empty_data - Handles empty input
  5. test_data_quality_invalid_format - Handles format errors

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_data_quality.py

Commit: YES

  • Message: test: add data quality agent scenario tests
  • Files: tests/scenarios/test_data_quality.py

Task 12: Real-World Scenario - Fact-Checking Agent

Description: Test a fact-checking agent that verifies claims using tools.

Delegation Recommendation:

  • Category: unspecified-high - Complex scenario testing
  • Skills: [python-programmer] - Complex test scenario design

Skills Evaluation:

  • INCLUDED python-programmer: Required for scenario implementation

Depends On: Tasks 7, 8

Acceptance Criteria:

  • Agent extracts claims from text
  • Agent uses search tool to verify
  • Agent provides verdict with evidence
  • Handles uncertain claims appropriately
  • Multiple claims in one text handled

Scenario:

# Fact-Checking Agent verifies statements
# Tools: search_facts, calculate_statistics, check_date
# Test with verifiable and unverifiable claims

Test Cases:

  1. test_fact_check_true_claim - Correctly identifies true claim
  2. test_fact_check_false_claim - Correctly identifies false claim
  3. test_fact_check_multiple_claims - Multiple claims in one text
  4. test_fact_check_uncertain - Handles uncertain claims
  5. test_fact_check_with_evidence - Provides supporting evidence

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_fact_checking.py

Commit: YES

  • Message: test: add fact-checking agent scenario tests
  • Files: tests/scenarios/test_fact_checking.py

Task 13: Real-World Scenario - Multi-Agent Workflow

Description: Test multiple agents collaborating on a complex task.

Delegation Recommendation:

  • Category: unspecified-high - Complex multi-agent testing
  • Skills: [python-programmer] - Complex scenario design

Skills Evaluation:

  • INCLUDED python-programmer: Required for multi-agent testing

Depends On: Tasks 7, 10

Acceptance Criteria:

  • Multiple agents can share a provider
  • Agents maintain separate histories
  • Workflow stages execute in order
  • Output from one agent feeds into next
  • Each agent has specialized role

Scenario:

# Research → Write → Edit workflow
# Researcher gathers facts
# Writer creates content
# Editor reviews and improves

Test Cases:

  1. test_multi_agent_research_write - Research to writer flow
  2. test_multi_agent_with_editor - Three-agent workflow
  3. test_multi_agent_isolated_histories - Histories don't leak
  4. test_multi_agent_shared_provider - Provider shared correctly
  5. test_multi_agent_error_handling - Errors don't break workflow
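
A sketch of the isolated-histories case. The system_prompt parameter name is an assumption (see Task 7's test_agent_system_prompt); the key point is that two agents share one provider yet keep separate conversations.

from agentlite import Agent
from mocks.provider import MockProvider

async def test_multi_agent_isolated_histories():
    provider = MockProvider()
    provider.add_text_response("Facts: water boils at 100 C at sea level.")
    provider.add_text_response("Draft: At sea level, water boils at 100 C.")

    researcher = Agent(provider=provider, system_prompt="You gather facts.")
    writer = Agent(provider=provider, system_prompt="You write prose.")

    facts = await researcher.run("Research: boiling point of water")
    await writer.run(f"Write a paragraph using: {facts}")

    # Each agent sees only its own exchange, even though the provider is shared.
    assert len(researcher.history) == 2
    assert len(writer.history) == 2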

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_multi_agent.py

Commit: YES

  • Message: test: add multi-agent workflow scenario tests
  • Files: tests/scenarios/test_multi_agent.py

Task 14: MCP Mock Tests

Description: Test MCP integration with mocked MCP server.

Delegation Recommendation:

  • Category: unspecified-low - MCP protocol mocking
  • Skills: [python-programmer] - Protocol mocking

Skills Evaluation:

  • INCLUDED python-programmer: Required for MCP mocking

Depends On: Tasks 3, 6

Acceptance Criteria:

  • MCPClient connects to mock server
  • Tools load from mock server
  • MCP tools execute correctly
  • MCP errors handled gracefully
  • Connection cleanup works

Test Cases:

  1. test_mcp_connect_stdio - STDIO connection mock
  2. test_mcp_connect_sse - SSE connection mock
  3. test_mcp_load_tools - Load tools from mock
  4. test_mcp_tool_execution - Execute MCP tool
  5. test_mcp_error_handling - MCP errors handled
  6. test_mcp_context_manager - Async context manager works

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/mocks/mcp_server.py
  • /home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_mcp.py

Commit: YES

  • Message: test: add MCP integration tests with mocks
  • Files: tests/mocks/mcp_server.py, tests/integration/test_mcp.py

Task 15: Error Handling Tests

Description: Test error scenarios: provider errors, tool errors, timeout, connection issues.

Delegation Recommendation:

  • Category: unspecified-low - Error scenario testing
  • Skills: [python-programmer] - Error testing patterns

Skills Evaluation:

  • INCLUDED python-programmer: Required for error testing

Depends On: Tasks 6, 7

Acceptance Criteria:

  • APIConnectionError handled correctly
  • APITimeoutError handled correctly
  • APIStatusError handled correctly
  • Tool execution errors don't crash agent
  • Invalid tool arguments handled
  • Max iterations prevents infinite loops

Test Cases:

  1. test_provider_connection_error - Connection failure
  2. test_provider_timeout_error - Request timeout
  3. test_provider_status_error - HTTP error status
  4. test_provider_empty_response - Empty response handling
  5. test_tool_execution_error - Tool raises exception
  6. test_tool_invalid_arguments - Invalid args to tool
  7. test_tool_not_found_error - Unknown tool called
  8. test_max_iterations_reached - Loop prevention
  9. test_json_decode_error - Invalid JSON in tool args
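
A sketch of the timeout case, driven through MockProvider.add_error. Whether the agent re-raises provider errors unchanged or wraps them is an assumption; if it wraps, assert on the wrapper type instead.

import pytest

from agentlite import Agent
from agentlite.provider import APITimeoutError
from mocks.provider import MockProvider

async def test_provider_timeout_error():
    provider = MockProvider()
    provider.add_error(APITimeoutError("request timed out"))
    agent = Agent(provider=provider)

    with pytest.raises(APITimeoutError):
        await agent.run("Hi")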

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_errors.py

Commit: YES

  • Message: test: add error handling tests
  • Files: tests/integration/test_errors.py

Task 16: Test Coverage Analysis

Description: Analyze test coverage and ensure targets are met.

Delegation Recommendation:

  • Category: quick - Coverage analysis
  • Skills: [python-programmer] - Coverage tooling

Skills Evaluation:

  • INCLUDED python-programmer: Required for coverage analysis

Depends On: All previous tasks

Acceptance Criteria:

  • Overall coverage >= 80%
  • Core modules (message, tool, agent) >= 90%
  • Provider module >= 70%
  • MCP module >= 60%
  • Coverage report generated
  • Missing coverage documented

Coverage Targets:

Module | Target | Priority
agentlite.message | 95% | P0
agentlite.tool | 95% | P0
agentlite.agent | 90% | P0
agentlite.config | 90% | P0
agentlite.provider | 80% | P1
agentlite.providers.openai | 70% | P1
agentlite.mcp | 60% | P2
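
A plausible starting point for the .coveragerc, using standard coverage.py options. Note that fail_under is global, so the per-module targets above still need to be checked by reading the report.

[run]
source = agentlite
branch = True

[report]
fail_under = 80
show_missing = True
exclude_lines =
    pragma: no cover
    if TYPE_CHECKING: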

Files to Create:

  • /home/tcmofashi/proj/general_agent/agentlite/tests/.coveragerc

Commit: YES

  • Message: test: add coverage configuration and analysis
  • Files: tests/.coveragerc

Test File Structure

/home/tcmofashi/proj/general_agent/agentlite/tests/
├── conftest.py                    # Shared fixtures and configuration
├── utils.py                       # Test utilities and helpers
├── .coveragerc                    # Coverage configuration
├── unit/                          # Unit tests
│   ├── __init__.py
│   ├── test_message.py           # Message types tests
│   ├── test_tool.py              # Tool system tests
│   ├── test_config.py            # Configuration tests
│   └── test_provider.py          # Provider protocol tests
├── integration/                   # Integration tests
│   ├── __init__.py
│   ├── test_agent.py             # Agent integration tests
│   ├── test_tool_loop.py         # Tool calling loop tests
│   ├── test_streaming.py         # Streaming tests
│   ├── test_history.py           # History management tests
│   ├── test_mcp.py               # MCP integration tests
│   └── test_errors.py            # Error handling tests
├── scenarios/                     # Real-world scenario tests
│   ├── __init__.py
│   ├── test_data_quality.py      # Data quality agent
│   ├── test_fact_checking.py     # Fact-checking agent
│   └── test_multi_agent.py       # Multi-agent workflow
└── mocks/                         # Mock implementations
    ├── __init__.py
    ├── provider.py               # Mock OpenAI provider
    └── mcp_server.py             # Mock MCP server

Test Fixtures (conftest.py)

Core Fixtures

# Import paths are assumptions based on the layout sketched in this plan;
# pytest's default import mode puts tests/ on sys.path, so the mocks
# package (which has an __init__.py) is importable from conftest.
import pytest

from agentlite import Agent
from agentlite.message import Message, ToolCall
from agentlite.tool import tool
from mocks.provider import MockProvider

# Mock provider fixtures
@pytest.fixture
def mock_provider():
    """Create a mock provider with no responses configured."""
    return MockProvider()

@pytest.fixture
def mock_provider_with_response():
    """Create a mock provider that returns a simple text response."""
    provider = MockProvider()
    provider.add_text_response("Hello!")
    return provider

# Sample message fixtures
@pytest.fixture
def sample_text_message():
    """Create a sample text message."""
    return Message(role="user", content="Hello!")

@pytest.fixture
def sample_tool_call():
    """Create a sample tool call."""
    return ToolCall(
        id="call_123",
        function=ToolCall.FunctionBody(
            name="add",
            arguments='{"a": 1, "b": 2}'
        )
    )

# Tool fixtures
@pytest.fixture
def add_tool():
    """Create a simple add tool."""
    @tool()
    async def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b
    return add

@pytest.fixture
def error_tool():
    """Create a tool that raises an error."""
    @tool()
    async def error() -> str:
        """Always raises an error."""
        raise ValueError("Test error")
    return error

# Agent fixtures
@pytest.fixture
async def simple_agent(mock_provider):
    """Create a simple agent with mocked provider."""
    return Agent(provider=mock_provider)

@pytest.fixture
async def agent_with_tools(mock_provider, add_tool):
    """Create an agent with tools."""
    return Agent(provider=mock_provider, tools=[add_tool])

Mock Implementations

MockProvider

import json
from dataclasses import dataclass

# Import path assumed from the module names in the coverage table.
from agentlite.message import TextPart, ToolCall

@dataclass
class MockCall:
    """Record of one generate() invocation, kept so tests can assert on it."""

    system_prompt: str
    tools: list
    history: list

class MockStreamedMessage:
    """Minimal stand-in for the provider's StreamedMessage.

    The real interface is assumed to be an async iterable of content parts;
    adjust if StreamedMessage exposes more than iteration.
    """

    def __init__(self, parts):
        self.parts = parts

    def __aiter__(self):
        # Stream each pre-built part as a single chunk.
        async def _gen():
            for part in self.parts:
                yield part

        return _gen()

class MockProvider:
    """Mock provider for testing AgentLite without real API calls.
    
    This provider simulates OpenAI API responses and allows:
    - Configuring response sequences
    - Simulating tool calls
    - Simulating errors
    - Tracking all calls for verification
    
    Example:
        provider = MockProvider()
        provider.add_text_response("Hello!")
        provider.add_tool_call("add", {"a": 1, "b": 2}, "3")
        
        agent = Agent(provider=provider, system_prompt="You are helpful.")
        response = await agent.run("Hi")
        
        # Verify calls
        assert len(provider.calls) == 1
        assert provider.calls[0].system_prompt == "You are helpful."
    """
    
    def __init__(self):
        self.responses = []
        self.calls = []
        self.model = "mock-model"
    
    def add_text_response(self, text: str):
        """Queue a plain text response."""
        self.responses.append({"type": "text", "content": text})
    
    def add_tool_call(self, name: str, arguments: dict, result: str):
        """Queue a response in which the model requests a tool call."""
        self.responses.append({
            "type": "tool_call",
            "name": name,
            "arguments": arguments,
            "result": result,
        })
    
    def add_error(self, error: Exception):
        """Queue an exception to raise on the next generate() call."""
        self.responses.append({"type": "error", "error": error})
    
    async def generate(self, system_prompt, tools, history):
        """Return the next queued response, recording the call."""
        self.calls.append(MockCall(
            system_prompt=system_prompt,
            tools=tools,
            history=list(history),
        ))
        
        if not self.responses:
            return MockStreamedMessage([TextPart(text="Mock response")])
        
        response = self.responses.pop(0)
        
        if response["type"] == "error":
            raise response["error"]
        if response["type"] == "text":
            return MockStreamedMessage([TextPart(text=response["content"])])
        if response["type"] == "tool_call":
            return MockStreamedMessage([
                ToolCall(
                    id="call_123",
                    function=ToolCall.FunctionBody(
                        name=response["name"],
                        arguments=json.dumps(response["arguments"]),
                    )
                )
            ])
        raise ValueError(f"Unknown mock response type: {response['type']}")

Test Configuration (pytest.ini)

[pytest]
testpaths = tests
asyncio_mode = auto
asyncio_default_fixture_loop_scope = function
pythonpath = src
addopts = -v --tb=short --strict-markers
markers =
    unit: Unit tests
    integration: Integration tests
    scenario: Real-world scenario tests
    slow: Slow tests

Running Tests

# Run all tests
cd /home/tcmofashi/proj/general_agent/agentlite
pytest tests/

# Run with coverage
pytest tests/ --cov=agentlite --cov-report=html --cov-report=term

# Run specific test categories
pytest tests/unit/ -v
pytest tests/integration/ -v
pytest tests/scenarios/ -v

# Run with markers
pytest -m unit
pytest -m integration
pytest -m "not slow"

# Run specific test file
pytest tests/unit/test_message.py -v

# Run with debugging
pytest tests/ -v --pdb

Commit Strategy

After Task | Commit Message | Files
Task 1 | test: setup pytest configuration and shared fixtures | pytest.ini, tests/conftest.py, tests/utils.py
Task 2 | test: add unit tests for message types | tests/unit/test_message.py
Task 3 | test: add unit tests for tool system | tests/unit/test_tool.py
Task 4 | test: add unit tests for configuration models | tests/unit/test_config.py
Task 5 | test: add unit tests for provider protocol | tests/unit/test_provider.py
Task 6 | test: add mock provider for testing | tests/mocks/provider.py
Task 7 | test: add agent integration tests | tests/integration/test_agent.py
Task 8 | test: add tool calling loop tests | tests/integration/test_tool_loop.py
Task 9 | test: add streaming response tests | tests/integration/test_streaming.py
Task 10 | test: add conversation history tests | tests/integration/test_history.py
Task 11 | test: add data quality agent scenario tests | tests/scenarios/test_data_quality.py
Task 12 | test: add fact-checking agent scenario tests | tests/scenarios/test_fact_checking.py
Task 13 | test: add multi-agent workflow scenario tests | tests/scenarios/test_multi_agent.py
Task 14 | test: add MCP integration tests with mocks | tests/mocks/mcp_server.py, tests/integration/test_mcp.py
Task 15 | test: add error handling tests | tests/integration/test_errors.py
Task 16 | test: add coverage configuration and analysis | tests/.coveragerc

Success Criteria

Verification Commands

# All tests pass
pytest tests/ -v

# Coverage meets targets
pytest tests/ --cov=agentlite --cov-report=term-missing

# No import errors
python -c "import agentlite; print('OK')"

# Type checking passes (if mypy configured)
mypy src/agentlite/

Final Checklist

  • All unit tests pass
  • All integration tests pass
  • All scenario tests pass
  • Coverage >= 80% overall
  • Core modules >= 90% coverage
  • All mocks work correctly
  • Tests run without real API keys
  • Tests are deterministic
  • Tests are well-documented
  • Test files follow naming convention

Notes

  1. No Real API Calls: All tests must work without real API keys using mocks
  2. Deterministic: Tests should produce consistent results
  3. Fast: Unit tests should complete in < 1 second each
  4. Isolated: Tests should not depend on each other
  5. Documented: Complex scenarios should have docstrings explaining the use case