# AgentLite Comprehensive Test Suite Plan

## Context

AgentLite is a lightweight, async-first Agent component library for LLM applications. It provides:

- **Agent**: Main agent class with tool calling loop and streaming support
- **OpenAIProvider**: OpenAI-compatible provider implementation
- **Tool System**: @tool decorator, CallableTool, CallableTool2, SimpleToolset
- **MCPClient**: MCP server integration
- **Message Types**: ContentPart, Message, ToolCall, etc.
- **Configuration**: Pydantic-based config models

## Test Location

`/home/tcmofashi/proj/general_agent/agentlite/tests/`

## Task Dependency Graph

| Task | Depends On | Reason |
|------|------------|--------|
| 1. Test Configuration Setup | None | Foundation for all tests |
| 2. Message Types Unit Tests | Task 1 | Core data structures |
| 3. Tool System Unit Tests | Task 1 | Core tool abstractions |
| 4. Configuration Unit Tests | Task 1 | Config validation |
| 5. Provider Protocol Unit Tests | Task 1 | Provider interface |
| 6. Mock Provider Implementation | Task 1 | Required for integration tests |
| 7. Agent Integration Tests | Tasks 2, 3, 6 | Tests agent with mocked provider |
| 8. Tool Calling Loop Tests | Tasks 3, 6 | Tests tool execution flow |
| 9. Streaming Response Tests | Tasks 2, 6 | Tests streaming functionality |
| 10. Conversation History Tests | Task 7 | Tests history management |
| 11. Real-World Scenario: Data Quality Agent | Tasks 7, 8 | Practical use case |
| 12. Real-World Scenario: Fact-Checking Agent | Tasks 7, 8 | Practical use case |
| 13. Real-World Scenario: Multi-Agent Workflow | Tasks 7, 10 | Practical use case |
| 14. MCP Mock Tests | Tasks 3, 6 | Tests MCP integration with mocks |
| 15. Error Handling Tests | Tasks 6, 7 | Tests error scenarios |
| 16. Test Coverage Analysis | All above | Verify coverage targets |

## Parallel Execution Graph

```
Wave 1 (Foundation - Start immediately):
├── Task 1: Test Configuration Setup
├── Task 2: Message Types Unit Tests
├── Task 3: Tool System Unit Tests
├── Task 4: Configuration Unit Tests
├── Task 5: Provider Protocol Unit Tests
└── Task 6: Mock Provider Implementation

Wave 2 (Core Integration - After Wave 1):
├── Task 7: Agent Integration Tests (depends: 1, 2, 3, 6)
├── Task 8: Tool Calling Loop Tests (depends: 3, 6)
└── Task 9: Streaming Response Tests (depends: 2, 6)

Wave 3 (Advanced Features - After Wave 2):
├── Task 10: Conversation History Tests (depends: 7)
├── Task 14: MCP Mock Tests (depends: 3, 6)
└── Task 15: Error Handling Tests (depends: 6, 7)

Wave 4 (Real-World Scenarios - After Wave 3):
├── Task 11: Data Quality Agent Scenario (depends: 7, 8)
├── Task 12: Fact-Checking Agent Scenario (depends: 7, 8)
└── Task 13: Multi-Agent Workflow Scenario (depends: 7, 10)

Wave 5 (Finalization - After Wave 4):
└── Task 16: Test Coverage Analysis (depends: all)

Critical Path: Task 1 → Task 6 → Task 7 → Task 10 → Task 13 → Task 16
Parallel Speedup: ~60% faster than sequential execution
```

## Tasks

### Task 1: Test Configuration Setup

**Description**: Create pytest configuration, conftest.py with shared fixtures, and test utilities.

**Delegation Recommendation**:
- Category: `quick` - Configuration setup is straightforward
- Skills: [`python-programmer`] - Python testing infrastructure knowledge

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for pytest configuration and fixture design
- OMITTED `git-master`: No git operations needed for this task
- OMITTED `frontend-ui-ux`: No UI work involved

**Depends On**: None

**Acceptance Criteria**:
- [ ] `pytest.ini` configured with asyncio mode
- [ ] `conftest.py` with shared fixtures (mock_provider, sample_messages, temp_agent)
- [ ] Test utilities module for common assertions (see the sketch below)
- [ ] All tests can be run with `pytest tests/`
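
For example, `tests/utils.py` could hold small assertion helpers like the sketch below (`Message.extract_text()` is the accessor described in Task 2; the helper name is illustrative):

```python
from agentlite.message import Message


def assert_text_message(message: Message, role: str, expected_text: str) -> None:
    """Assert that a message has the given role and extracted text."""
    assert message.role == role, f"expected role {role!r}, got {message.role!r}"
    assert message.extract_text() == expected_text
```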

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/conftest.py`
- `/home/tcmofashi/proj/general_agent/agentlite/tests/utils.py`

**Commit**: YES
- Message: `test: setup pytest configuration and shared fixtures`
- Files: `tests/conftest.py`, `tests/utils.py`

---

### Task 2: Message Types Unit Tests

**Description**: Test all message types: ContentPart, TextPart, ImageURLPart, AudioURLPart, ToolCall, ToolCallPart, Message.

**Delegation Recommendation**:
- Category: `quick` - Unit tests for data structures
- Skills: [`python-programmer`] - Python testing patterns

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for writing unit tests
- OMITTED `frontend-ui-ux`: No UI involved

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] ContentPart polymorphic validation works correctly
- [ ] TextPart merge_in_place works for streaming
- [ ] ToolCall merge_in_place works with ToolCallPart
- [ ] Message content coercion from string works
- [ ] Message.extract_text() returns correct text
- [ ] Message.has_tool_calls() returns correct boolean
- [ ] All edge cases covered (empty content, None values)

**Test Cases**:
1. `test_content_part_registry` - Verify subclass registration
2. `test_text_part_creation` - Basic TextPart instantiation
3. `test_text_part_merge` - Streaming text merge
4. `test_image_url_part` - ImageURLPart creation and serialization
5. `test_audio_url_part` - AudioURLPart creation and serialization
6. `test_tool_call_creation` - ToolCall instantiation
7. `test_tool_call_merge` - ToolCall merging with ToolCallPart
8. `test_message_string_content` - Message with string content coercion
9. `test_message_list_content` - Message with list of ContentParts
10. `test_message_extract_text` - Text extraction from mixed content
11. `test_message_has_tool_calls` - Tool call detection
12. `test_message_serialization` - Pydantic model_dump works
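
A minimal sketch of two of these cases, assuming `TextPart.merge_in_place` mutates the receiver and `Message.extract_text()` concatenates the text parts:

```python
from agentlite.message import Message, TextPart


def test_text_part_merge():
    # Streaming deltas merge into a single part (in-place semantics assumed).
    part = TextPart(text="Hel")
    part.merge_in_place(TextPart(text="lo!"))
    assert part.text == "Hello!"


def test_message_extract_text():
    # Text extraction concatenates the text parts of mixed content.
    msg = Message(role="assistant", content=[TextPart(text="Hi "), TextPart(text="there")])
    assert msg.extract_text() == "Hi there"
```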

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_message.py`

**Commit**: YES
- Message: `test: add unit tests for message types`
- Files: `tests/unit/test_message.py`

---

### Task 3: Tool System Unit Tests

**Description**: Test tool system: Tool, CallableTool, CallableTool2, SimpleToolset, @tool decorator, ToolResult types.

**Delegation Recommendation**:
- Category: `unspecified-low` - Moderate complexity with async patterns
- Skills: [`python-programmer`] - Python async testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for async tool testing
- OMITTED `frontend-ui-ux`: No UI involved

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] Tool JSON schema validation works
- [ ] CallableTool validates arguments against schema
- [ ] CallableTool2 uses Pydantic for validation
- [ ] SimpleToolset manages tools correctly
- [ ] @tool decorator creates valid tools
- [ ] Tool execution handles errors gracefully
- [ ] Async tool execution works correctly

**Test Cases**:
1. `test_tool_schema_validation` - Invalid schema raises ValueError
2. `test_tool_ok_result` - ToolOk creation and properties
3. `test_tool_error_result` - ToolError creation and properties
4. `test_callable_tool_validation` - Argument validation against schema
5. `test_callable_tool_execution` - Successful tool execution
6. `test_callable_tool_error_handling` - Exception handling in tools
7. `test_callable_tool2_pydantic_validation` - Pydantic model validation
8. `test_callable_tool2_execution` - Type-safe tool execution
9. `test_simple_toolset_add_remove` - Tool management
10. `test_simple_toolset_handle` - Tool call handling
11. `test_simple_toolset_tool_not_found` - Missing tool error
12. `test_tool_decorator_basic` - @tool creates valid tool
13. `test_tool_decorator_with_params` - @tool with custom name/description
14. `test_tool_decorator_type_hints` - Type hint to schema conversion
15. `test_tool_concurrent_execution` - Multiple tools execute concurrently
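
A sketch of the basic decorator case; the attribute names on the created tool (`name`, `description`) are assumptions about the decorator's output:

```python
from agentlite.tool import tool


async def test_tool_decorator_basic():
    @tool()
    async def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    # The decorator is expected to derive the tool's name and description
    # from the function; exact attribute names are assumptions.
    assert add.name == "add"
    assert add.description == "Add two numbers."
```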

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_tool.py`

**Commit**: YES
- Message: `test: add unit tests for tool system`
- Files: `tests/unit/test_tool.py`

---

### Task 4: Configuration Unit Tests

**Description**: Test Pydantic configuration models: ProviderConfig, ModelConfig, ToolConfig, AgentConfig.

**Delegation Recommendation**:
- Category: `quick` - Pydantic model validation tests
- Skills: [`python-programmer`] - Pydantic testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for Pydantic validation tests

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] ProviderConfig validates base_url format
- [ ] ProviderConfig stores api_key as SecretStr
- [ ] ModelConfig validates temperature range
- [ ] ModelConfig validates provider is not empty
- [ ] AgentConfig validates default_model exists in models
- [ ] AgentConfig validates all model providers exist
- [ ] get_provider_config and get_model_config work correctly

**Test Cases**:
1. `test_provider_config_validation` - Valid config creation
2. `test_provider_config_invalid_url` - Invalid base_url raises error
3. `test_provider_config_secret_str` - API key is SecretStr
4. `test_model_config_validation` - Valid model config
5. `test_model_config_temperature_range` - Temperature bounds checking
6. `test_model_config_empty_provider` - Empty provider raises error
7. `test_agent_config_validation` - Valid agent config
8. `test_agent_config_missing_default_model` - Missing default_model raises error
9. `test_agent_config_unknown_provider` - Unknown provider raises error
10. `test_agent_config_get_provider` - get_provider_config works
11. `test_agent_config_get_model` - get_model_config works
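
A sketch of the temperature-range case; the field names (`provider`, `model`, `temperature`) and the upper bound are assumptions about `ModelConfig`:

```python
import pytest
from pydantic import ValidationError

from agentlite.config import ModelConfig


def test_model_config_temperature_range():
    # An out-of-range temperature should be rejected by the Pydantic validator.
    with pytest.raises(ValidationError):
        ModelConfig(provider="openai", model="gpt-4o", temperature=3.5)
```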

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_config.py`

**Commit**: YES
- Message: `test: add unit tests for configuration models`
- Files: `tests/unit/test_config.py`

---

### Task 5: Provider Protocol Unit Tests

**Description**: Test provider protocol and exception types: ChatProvider, StreamedMessage, TokenUsage, exception hierarchy.

**Delegation Recommendation**:
- Category: `quick` - Protocol and exception testing
- Skills: [`python-programmer`] - Python protocol testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for protocol testing

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] TokenUsage calculates total correctly
- [ ] Exception hierarchy is correct
- [ ] APIStatusError stores status_code
- [ ] ChatProvider protocol can be implemented

**Test Cases**:
1. `test_token_usage_total` - Total token calculation
2. `test_token_usage_defaults` - Default cached_tokens = 0
3. `test_chat_provider_error_base` - Base exception class
4. `test_api_connection_error` - APIConnectionError creation
5. `test_api_timeout_error` - APITimeoutError creation
6. `test_api_status_error` - APIStatusError with status_code
7. `test_api_empty_response_error` - APIEmptyResponseError creation
8. `test_chat_provider_protocol` - Protocol implementation check
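
A sketch of the TokenUsage cases, assuming `prompt_tokens`/`completion_tokens` fields and a `total` that sums them (exact field and accessor names are assumptions):

```python
from agentlite.provider import TokenUsage


def test_token_usage_total():
    usage = TokenUsage(prompt_tokens=10, completion_tokens=5)
    assert usage.total == 15


def test_token_usage_defaults():
    usage = TokenUsage(prompt_tokens=1, completion_tokens=1)
    assert usage.cached_tokens == 0
```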

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_provider.py`

**Commit**: YES
- Message: `test: add unit tests for provider protocol`
- Files: `tests/unit/test_provider.py`

---

### Task 6: Mock Provider Implementation

**Description**: Create a comprehensive mock provider for testing that simulates OpenAI API responses without real API calls.

**Delegation Recommendation**:
- Category: `unspecified-low` - Requires understanding of streaming and async patterns
- Skills: [`python-programmer`] - Async generator implementation

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for mock provider implementation

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] MockProvider implements ChatProvider protocol
- [ ] Can simulate text responses
- [ ] Can simulate tool calls
- [ ] Can simulate streaming responses
- [ ] Can simulate errors
- [ ] Configurable response sequences
- [ ] Tracks calls for verification

**Implementation Details**:

```python
class MockProvider:
    """Mock provider for testing.

    Usage:
        provider = MockProvider()
        provider.add_text_response("Hello!")
        provider.add_tool_call("add", {"a": 1, "b": 2}, "3")

        agent = Agent(provider=provider)
        response = await agent.run("Hi")

        assert provider.calls == [...]
    """
```

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/mocks/provider.py`

**Commit**: YES
- Message: `test: add mock provider for testing`
- Files: `tests/mocks/provider.py`

---

### Task 7: Agent Integration Tests

**Description**: Test Agent class with mocked provider: initialization, run(), generate(), history management.

**Delegation Recommendation**:
- Category: `unspecified-low` - Integration testing with async
- Skills: [`python-programmer`] - Async integration testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for agent integration testing

**Depends On**: Tasks 1, 2, 3, 6

**Acceptance Criteria**:
- [ ] Agent initializes correctly with provider
- [ ] Agent.run() returns string response
- [ ] Agent.run(stream=True) returns async iterator
- [ ] Agent.generate() returns Message
- [ ] Agent adds messages to history
- [ ] Agent.clear_history() clears history
- [ ] Agent respects max_iterations

**Test Cases**:
1. `test_agent_initialization` - Basic agent creation
2. `test_agent_with_tools` - Agent with toolset
3. `test_agent_run_simple` - Simple non-streaming run
4. `test_agent_run_streaming` - Streaming response
5. `test_agent_generate` - Generate without tool loop
6. `test_agent_history_tracking` - Messages added to history
7. `test_agent_clear_history` - History cleared correctly
8. `test_agent_max_iterations` - Respects iteration limit
9. `test_agent_system_prompt` - System prompt used
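
A sketch of the simple run case using the fixtures from Task 1 (the `Agent` import path and the `history` property are assumptions):

```python
from agentlite import Agent


async def test_agent_run_simple(mock_provider):
    mock_provider.add_text_response("Hello!")
    agent = Agent(provider=mock_provider)

    result = await agent.run("Hi")

    assert result == "Hello!"
    # One user message plus one assistant reply.
    assert len(agent.history) == 2
```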

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_agent.py`

**Commit**: YES
- Message: `test: add agent integration tests`
- Files: `tests/integration/test_agent.py`

---

### Task 8: Tool Calling Loop Tests

**Description**: Test the complete tool calling loop: agent requests tool, tool executes, result returned.

**Delegation Recommendation**:
- Category: `unspecified-low` - Complex async flow testing
- Skills: [`python-programmer`] - Async flow testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for tool loop testing

**Depends On**: Tasks 3, 6

**Acceptance Criteria**:
- [ ] Agent calls tool when requested by LLM
- [ ] Tool result is added to history
- [ ] Agent continues conversation after tool result
- [ ] Multiple tool calls in one response handled
- [ ] Tool errors are handled gracefully
- [ ] Tool calls are concurrent

**Test Cases**:
1. `test_single_tool_call` - One tool call in conversation
2. `test_multiple_tool_calls` - Multiple tools in one response
3. `test_tool_call_chain` - Sequential tool calls
4. `test_tool_error_handling` - Tool returns error
5. `test_tool_not_found` - Unknown tool requested
6. `test_tool_concurrent_execution` - Tools execute concurrently
7. `test_tool_result_in_history` - Tool results in conversation history
8. `test_tool_call_with_arguments` - Arguments passed correctly
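
A sketch of the single-tool-call case: the MockProvider queues a tool-call turn followed by a final text turn, so the loop should make exactly two provider round-trips:

```python
from agentlite import Agent


async def test_single_tool_call(mock_provider, add_tool):
    # Turn 1: the "LLM" requests the tool; turn 2: it answers with the result.
    mock_provider.add_tool_call("add", {"a": 1, "b": 2}, "3")
    mock_provider.add_text_response("The sum is 3.")
    agent = Agent(provider=mock_provider, tools=[add_tool])

    result = await agent.run("What is 1 + 2?")

    assert result == "The sum is 3."
    # Two provider round-trips: tool request, then final answer.
    assert len(mock_provider.calls) == 2
```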

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_tool_loop.py`

**Commit**: YES
- Message: `test: add tool calling loop tests`
- Files: `tests/integration/test_tool_loop.py`

---

### Task 9: Streaming Response Tests

**Description**: Test streaming responses: text streaming, tool call streaming, mixed content.

**Delegation Recommendation**:
- Category: `unspecified-low` - Async streaming testing
- Skills: [`python-programmer`] - Async generator testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for streaming testing

**Depends On**: Tasks 2, 6

**Acceptance Criteria**:
- [ ] Text streams in chunks
- [ ] Tool calls stream correctly
- [ ] Mixed content (text + tool) streams correctly
- [ ] Complete response can be reconstructed
- [ ] Streaming works with tool calling loop

**Test Cases**:
1. `test_stream_text_only` - Simple text streaming
2. `test_stream_tool_call` - Tool call streaming
3. `test_stream_mixed_content` - Text then tool call
4. `test_stream_reconstruction` - Rebuild full response
5. `test_stream_with_tool_loop` - Streaming in tool loop
6. `test_stream_empty_response` - Empty stream handling
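
A sketch of the reconstruction case; the exact shape of the streaming API is an assumption based on Task 7's acceptance criteria (`run(stream=True)` returning an async iterator of text chunks):

```python
from agentlite import Agent


async def test_stream_reconstruction(mock_provider):
    mock_provider.add_text_response("Hello, world!")
    agent = Agent(provider=mock_provider)

    chunks = []
    # Assumed: run(stream=True) yields text chunks as they arrive.
    async for chunk in agent.run("Hi", stream=True):
        chunks.append(chunk)

    # The concatenated chunks must equal the complete response.
    assert "".join(chunks) == "Hello, world!"
```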

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_streaming.py`

**Commit**: YES
- Message: `test: add streaming response tests`
- Files: `tests/integration/test_streaming.py`

---

### Task 10: Conversation History Tests

**Description**: Test conversation history management: message ordering, role tracking, history limits.

**Delegation Recommendation**:
- Category: `quick` - History management testing
- Skills: [`python-programmer`] - State management testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for history testing

**Depends On**: Task 7

**Acceptance Criteria**:
- [ ] Messages added in correct order
- [ ] Roles tracked correctly (user, assistant, tool)
- [ ] Tool call IDs preserved
- [ ] History can be inspected
- [ ] History can be cleared
- [ ] History persists across multiple runs

**Test Cases**:
1. `test_history_message_order` - Messages in correct order
2. `test_history_roles` - Correct role tracking
3. `test_history_tool_responses` - Tool call IDs preserved
4. `test_history_persistence` - History across multiple runs
5. `test_history_clear` - Clear history works
6. `test_history_manual_add` - Manually add messages
7. `test_history_copy` - history property returns copy
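
A sketch of the copy-semantics case, assuming the `history` property returns a defensive copy and a fresh agent starts with an empty history:

```python
async def test_history_copy(simple_agent, sample_text_message):
    snapshot = simple_agent.history
    snapshot.append(sample_text_message)

    # Mutating the snapshot must not affect the agent's real history.
    assert len(simple_agent.history) == 0
```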

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_history.py`

**Commit**: YES
- Message: `test: add conversation history tests`
- Files: `tests/integration/test_history.py`

---

### Task 11: Real-World Scenario - Data Quality Agent

**Description**: Test a realistic data quality improvement agent that validates and cleans data.

**Delegation Recommendation**:
- Category: `unspecified-high` - Complex scenario testing
- Skills: [`python-programmer`] - Complex test scenario design

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for scenario implementation

**Depends On**: Tasks 7, 8

**Acceptance Criteria**:
- [ ] Agent validates data format
- [ ] Agent identifies data quality issues
- [ ] Agent suggests corrections
- [ ] Uses multiple tools (validate, clean, analyze)
- [ ] Handles edge cases (empty data, invalid format)

**Scenario**:

```python
# Data Quality Agent validates CSV data
# Tools: validate_csv, detect_anomalies, suggest_fixes
# Test with sample data containing errors
```
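
One of the scenario tools could be sketched like this; the validation logic is purely illustrative, and the `@tool` import path is an assumption:

```python
import csv
import io

from agentlite.tool import tool


@tool()
async def validate_csv(data: str) -> str:
    """Validate CSV text and report structural problems."""
    rows = list(csv.reader(io.StringIO(data)))
    if not rows:
        return "error: empty input"
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        return f"error: inconsistent column counts {sorted(widths)}"
    return f"ok: {len(rows)} rows, {widths.pop()} columns"
```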

**Test Cases**:
1. `test_data_quality_valid_data` - Clean data passes validation
2. `test_data_quality_detects_errors` - Errors detected and reported
3. `test_data_quality_suggests_fixes` - Corrections suggested
4. `test_data_quality_empty_data` - Handles empty input
5. `test_data_quality_invalid_format` - Handles format errors

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_data_quality.py`

**Commit**: YES
- Message: `test: add data quality agent scenario tests`
- Files: `tests/scenarios/test_data_quality.py`

---

### Task 12: Real-World Scenario - Fact-Checking Agent

**Description**: Test a fact-checking agent that verifies claims using tools.

**Delegation Recommendation**:
- Category: `unspecified-high` - Complex scenario testing
- Skills: [`python-programmer`] - Complex test scenario design

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for scenario implementation

**Depends On**: Tasks 7, 8

**Acceptance Criteria**:
- [ ] Agent extracts claims from text
- [ ] Agent uses search tool to verify
- [ ] Agent provides verdict with evidence
- [ ] Handles uncertain claims appropriately
- [ ] Multiple claims in one text handled

**Scenario**:

```python
# Fact-Checking Agent verifies statements
# Tools: search_facts, calculate_statistics, check_date
# Test with verifiable and unverifiable claims
```
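
The search tool can be backed by a canned fact table so the scenario stays deterministic; this is an illustrative sketch, with the `@tool` import path assumed:

```python
from agentlite.tool import tool

# Canned fact table; contents are illustrative only.
FACTS = {
    "water boils at 100 c at sea level": True,
    "the moon is made of cheese": False,
}


@tool()
async def search_facts(claim: str) -> str:
    """Look a claim up in the canned fact table."""
    verdict = FACTS.get(claim.strip().lower())
    if verdict is None:
        return "unknown"
    return "supported" if verdict else "refuted"
```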

**Test Cases**:
1. `test_fact_check_true_claim` - Correctly identifies true claim
2. `test_fact_check_false_claim` - Correctly identifies false claim
3. `test_fact_check_multiple_claims` - Multiple claims in one text
4. `test_fact_check_uncertain` - Handles uncertain claims
5. `test_fact_check_with_evidence` - Provides supporting evidence

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_fact_checking.py`

**Commit**: YES
- Message: `test: add fact-checking agent scenario tests`
- Files: `tests/scenarios/test_fact_checking.py`

---

### Task 13: Real-World Scenario - Multi-Agent Workflow

**Description**: Test multiple agents collaborating on a complex task.

**Delegation Recommendation**:
- Category: `unspecified-high` - Complex multi-agent testing
- Skills: [`python-programmer`] - Complex scenario design

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for multi-agent testing

**Depends On**: Tasks 7, 10

**Acceptance Criteria**:
- [ ] Multiple agents can share a provider
- [ ] Agents maintain separate histories
- [ ] Workflow stages execute in order
- [ ] Output from one agent feeds into next
- [ ] Each agent has specialized role

**Scenario**:

```python
# Research → Write → Edit workflow
# Researcher gathers facts
# Writer creates content
# Editor reviews and improves
```
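
A sketch of the pipeline under test; the `system_prompt` keyword is an assumption about the Agent constructor:

```python
from agentlite import Agent


async def run_pipeline(provider, topic: str) -> str:
    """Research → Write → Edit: each stage is a separate agent with its own history."""
    researcher = Agent(provider=provider, system_prompt="Gather facts.")
    writer = Agent(provider=provider, system_prompt="Write from notes.")
    editor = Agent(provider=provider, system_prompt="Review and improve drafts.")

    notes = await researcher.run(f"Collect facts about {topic}.")
    draft = await writer.run(f"Write a short article from these notes:\n{notes}")
    return await editor.run(f"Edit this draft for clarity:\n{draft}")
```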

**Test Cases**:
1. `test_multi_agent_research_write` - Research to writer flow
2. `test_multi_agent_with_editor` - Three-agent workflow
3. `test_multi_agent_isolated_histories` - Histories don't leak
4. `test_multi_agent_shared_provider` - Provider shared correctly
5. `test_multi_agent_error_handling` - Errors don't break workflow

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_multi_agent.py`

**Commit**: YES
- Message: `test: add multi-agent workflow scenario tests`
- Files: `tests/scenarios/test_multi_agent.py`

---

### Task 14: MCP Mock Tests

**Description**: Test MCP integration with a mocked MCP server.

**Delegation Recommendation**:
- Category: `unspecified-low` - MCP protocol mocking
- Skills: [`python-programmer`] - Protocol mocking

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for MCP mocking

**Depends On**: Tasks 3, 6

**Acceptance Criteria**:
- [ ] MCPClient connects to mock server
- [ ] Tools load from mock server
- [ ] MCP tools execute correctly
- [ ] MCP errors handled gracefully
- [ ] Connection cleanup works

**Test Cases**:
1. `test_mcp_connect_stdio` - STDIO connection mock
2. `test_mcp_connect_sse` - SSE connection mock
3. `test_mcp_load_tools` - Load tools from mock
4. `test_mcp_tool_execution` - Execute MCP tool
5. `test_mcp_error_handling` - MCP errors handled
6. `test_mcp_context_manager` - Async context manager works

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/mocks/mcp_server.py`
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_mcp.py`

**Commit**: YES
- Message: `test: add MCP integration tests with mocks`
- Files: `tests/mocks/mcp_server.py`, `tests/integration/test_mcp.py`

---

### Task 15: Error Handling Tests

**Description**: Test error scenarios: provider errors, tool errors, timeouts, connection issues.

**Delegation Recommendation**:
- Category: `unspecified-low` - Error scenario testing
- Skills: [`python-programmer`] - Error testing patterns

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for error testing

**Depends On**: Tasks 6, 7

**Acceptance Criteria**:
- [ ] APIConnectionError handled correctly
- [ ] APITimeoutError handled correctly
- [ ] APIStatusError handled correctly
- [ ] Tool execution errors don't crash agent
- [ ] Invalid tool arguments handled
- [ ] Max iterations prevents infinite loops

**Test Cases**:
1. `test_provider_connection_error` - Connection failure
2. `test_provider_timeout_error` - Request timeout
3. `test_provider_status_error` - HTTP error status
4. `test_provider_empty_response` - Empty response handling
5. `test_tool_execution_error` - Tool raises exception
6. `test_tool_invalid_arguments` - Invalid args to tool
7. `test_tool_not_found_error` - Unknown tool called
8. `test_max_iterations_reached` - Loop prevention
9. `test_json_decode_error` - Invalid JSON in tool args
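
A sketch of the timeout case; that `agent.run` propagates provider errors unchanged (rather than wrapping them) is an assumption this test would pin down:

```python
import pytest

from agentlite import Agent
from agentlite.provider import APITimeoutError


async def test_provider_timeout_error(mock_provider):
    mock_provider.add_error(APITimeoutError("request timed out"))
    agent = Agent(provider=mock_provider)

    # Assumed behavior: the provider error surfaces from run().
    with pytest.raises(APITimeoutError):
        await agent.run("Hi")
```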

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_errors.py`

**Commit**: YES
- Message: `test: add error handling tests`
- Files: `tests/integration/test_errors.py`

---

### Task 16: Test Coverage Analysis

**Description**: Analyze test coverage and ensure targets are met.

**Delegation Recommendation**:
- Category: `quick` - Coverage analysis
- Skills: [`python-programmer`] - Coverage tooling

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for coverage analysis

**Depends On**: All previous tasks

**Acceptance Criteria**:
- [ ] Overall coverage >= 80%
- [ ] Core modules (message, tool, agent) >= 90%
- [ ] Provider modules meet their targets (protocol >= 80%, OpenAI provider >= 70%)
- [ ] MCP module >= 60%
- [ ] Coverage report generated
- [ ] Missing coverage documented

**Coverage Targets**:

| Module | Target | Priority |
|--------|--------|----------|
| agentlite.message | 95% | P0 |
| agentlite.tool | 95% | P0 |
| agentlite.agent | 90% | P0 |
| agentlite.config | 90% | P0 |
| agentlite.provider | 80% | P1 |
| agentlite.providers.openai | 70% | P1 |
| agentlite.mcp | 60% | P2 |
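
A possible starting point for `tests/.coveragerc`, with `fail_under` mirroring the overall 80% target (source path assumed):

```ini
[run]
source = agentlite
branch = True

[report]
show_missing = True
fail_under = 80
exclude_lines =
    pragma: no cover
    if TYPE_CHECKING:
    raise NotImplementedError
```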

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/.coveragerc`

**Commit**: YES
- Message: `test: add coverage configuration and analysis`
- Files: `tests/.coveragerc`

---

## Test File Structure

```
/home/tcmofashi/proj/general_agent/agentlite/tests/
├── conftest.py                  # Shared fixtures and configuration
├── utils.py                     # Test utilities and helpers
├── .coveragerc                  # Coverage configuration
├── unit/                        # Unit tests
│   ├── __init__.py
│   ├── test_message.py          # Message types tests
│   ├── test_tool.py             # Tool system tests
│   ├── test_config.py           # Configuration tests
│   └── test_provider.py         # Provider protocol tests
├── integration/                 # Integration tests
│   ├── __init__.py
│   ├── test_agent.py            # Agent integration tests
│   ├── test_tool_loop.py        # Tool calling loop tests
│   ├── test_streaming.py        # Streaming tests
│   ├── test_history.py          # History management tests
│   ├── test_mcp.py              # MCP integration tests
│   └── test_errors.py           # Error handling tests
├── scenarios/                   # Real-world scenario tests
│   ├── __init__.py
│   ├── test_data_quality.py     # Data quality agent
│   ├── test_fact_checking.py    # Fact-checking agent
│   └── test_multi_agent.py      # Multi-agent workflow
└── mocks/                       # Mock implementations
    ├── __init__.py
    ├── provider.py              # Mock OpenAI provider
    └── mcp_server.py            # Mock MCP server
```

## Test Fixtures (conftest.py)

### Core Fixtures

```python
import pytest

# Import paths below are assumptions based on the package layout in this plan.
from agentlite import Agent
from agentlite.message import Message, ToolCall
from agentlite.tool import tool

from tests.mocks.provider import MockProvider


# Mock provider fixtures
@pytest.fixture
def mock_provider():
    """Create a mock provider with no responses configured."""
    return MockProvider()


@pytest.fixture
def mock_provider_with_response():
    """Create a mock provider that returns a simple text response."""
    provider = MockProvider()
    provider.add_text_response("Hello!")
    return provider


# Sample message fixtures
@pytest.fixture
def sample_text_message():
    """Create a sample text message."""
    return Message(role="user", content="Hello!")


@pytest.fixture
def sample_tool_call():
    """Create a sample tool call."""
    return ToolCall(
        id="call_123",
        function=ToolCall.FunctionBody(
            name="add",
            arguments='{"a": 1, "b": 2}'
        )
    )


# Tool fixtures
@pytest.fixture
def add_tool():
    """Create a simple add tool."""
    @tool()
    async def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b
    return add


@pytest.fixture
def error_tool():
    """Create a tool that raises an error."""
    @tool()
    async def error() -> str:
        """Always raises an error."""
        raise ValueError("Test error")
    return error


# Agent fixtures
@pytest.fixture
async def simple_agent(mock_provider):
    """Create a simple agent with mocked provider."""
    return Agent(provider=mock_provider)


@pytest.fixture
async def agent_with_tools(mock_provider, add_tool):
    """Create an agent with tools."""
    return Agent(provider=mock_provider, tools=[add_tool])
```

## Mock Implementations

### MockProvider

```python
import json
from dataclasses import dataclass

# Import paths are assumptions based on the package layout in this plan.
from agentlite.message import Message, TextPart, ToolCall


@dataclass
class MockCall:
    """Records one provider invocation for later verification."""

    system_prompt: str
    tools: list
    history: list


class MockStreamedMessage:
    """Minimal stand-in for StreamedMessage (the real interface may differ):
    yields its parts when iterated, then exposes the assembled message."""

    def __init__(self, parts):
        self._parts = parts

    def __aiter__(self):
        return self._iter_parts()

    async def _iter_parts(self):
        for part in self._parts:
            yield part

    def message(self) -> Message:
        return Message(role="assistant", content=list(self._parts))


class MockProvider:
    """Mock provider for testing AgentLite without real API calls.

    This provider simulates OpenAI API responses and allows:
    - Configuring response sequences
    - Simulating tool calls
    - Simulating errors
    - Tracking all calls for verification

    Example:
        provider = MockProvider()
        provider.add_text_response("Hello!")
        provider.add_tool_call("add", {"a": 1, "b": 2}, "3")

        agent = Agent(provider=provider)
        response = await agent.run("Hi")

        # Verify calls
        assert len(provider.calls) == 1
        assert provider.calls[0].system_prompt == "You are helpful."
    """

    def __init__(self):
        self.responses = []  # queued responses, consumed in FIFO order
        self.calls = []      # MockCall records, one per generate() invocation
        self.model = "mock-model"

    def add_text_response(self, text: str):
        """Add a text response to the queue."""
        self.responses.append({"type": "text", "content": text})

    def add_tool_call(self, name: str, arguments: dict, result: str):
        """Add a tool call response to the queue."""
        self.responses.append({
            "type": "tool_call",
            "name": name,
            "arguments": arguments,
            # Stored for test assertions; the actual tool computes the value.
            "result": result,
        })

    def add_error(self, error: Exception):
        """Add an error response to the queue."""
        self.responses.append({"type": "error", "error": error})

    async def generate(self, system_prompt, tools, history):
        """Generate a mock response, recording the call."""
        self.calls.append(MockCall(
            system_prompt=system_prompt,
            tools=tools,
            history=list(history),
        ))

        if not self.responses:
            return MockStreamedMessage([TextPart(text="Mock response")])

        response = self.responses.pop(0)

        if response["type"] == "error":
            raise response["error"]
        elif response["type"] == "text":
            return MockStreamedMessage([TextPart(text=response["content"])])
        elif response["type"] == "tool_call":
            return MockStreamedMessage([
                ToolCall(
                    id="call_123",
                    function=ToolCall.FunctionBody(
                        name=response["name"],
                        arguments=json.dumps(response["arguments"]),
                    )
                )
            ])
```

## Test Configuration (pytest.ini)

```ini
[pytest]
testpaths = tests
asyncio_mode = auto
asyncio_default_fixture_loop_scope = function
pythonpath = src
addopts = -v --tb=short --strict-markers
markers =
    unit: Unit tests
    integration: Integration tests
    scenario: Real-world scenario tests
    slow: Slow tests
```

## Running Tests

```bash
# Run all tests
cd /home/tcmofashi/proj/general_agent/agentlite
pytest tests/

# Run with coverage
pytest tests/ --cov=agentlite --cov-report=html --cov-report=term

# Run specific test categories
pytest tests/unit/ -v
pytest tests/integration/ -v
pytest tests/scenarios/ -v

# Run with markers
pytest -m unit
pytest -m integration
pytest -m "not slow"

# Run specific test file
pytest tests/unit/test_message.py -v

# Run with debugging
pytest tests/ -v --pdb
```

## Commit Strategy

| After Task | Commit Message | Files |
|------------|----------------|-------|
| Task 1 | `test: setup pytest configuration and shared fixtures` | `tests/conftest.py`, `tests/utils.py` |
| Task 2 | `test: add unit tests for message types` | `tests/unit/test_message.py` |
| Task 3 | `test: add unit tests for tool system` | `tests/unit/test_tool.py` |
| Task 4 | `test: add unit tests for configuration models` | `tests/unit/test_config.py` |
| Task 5 | `test: add unit tests for provider protocol` | `tests/unit/test_provider.py` |
| Task 6 | `test: add mock provider for testing` | `tests/mocks/provider.py` |
| Task 7 | `test: add agent integration tests` | `tests/integration/test_agent.py` |
| Task 8 | `test: add tool calling loop tests` | `tests/integration/test_tool_loop.py` |
| Task 9 | `test: add streaming response tests` | `tests/integration/test_streaming.py` |
| Task 10 | `test: add conversation history tests` | `tests/integration/test_history.py` |
| Task 11 | `test: add data quality agent scenario tests` | `tests/scenarios/test_data_quality.py` |
| Task 12 | `test: add fact-checking agent scenario tests` | `tests/scenarios/test_fact_checking.py` |
| Task 13 | `test: add multi-agent workflow scenario tests` | `tests/scenarios/test_multi_agent.py` |
| Task 14 | `test: add MCP integration tests with mocks` | `tests/mocks/mcp_server.py`, `tests/integration/test_mcp.py` |
| Task 15 | `test: add error handling tests` | `tests/integration/test_errors.py` |
| Task 16 | `test: add coverage configuration and analysis` | `tests/.coveragerc` |

## Success Criteria

### Verification Commands

```bash
# All tests pass
pytest tests/ -v

# Coverage meets targets
pytest tests/ --cov=agentlite --cov-report=term-missing

# No import errors
python -c "import agentlite; print('OK')"

# Type checking passes (if mypy configured)
mypy src/agentlite/
```

### Final Checklist

- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] All scenario tests pass
- [ ] Coverage >= 80% overall
- [ ] Core modules >= 90% coverage
- [ ] All mocks work correctly
- [ ] Tests run without real API keys
- [ ] Tests are deterministic
- [ ] Tests are well-documented
- [ ] Test files follow naming convention

## Notes

1. **No Real API Calls**: All tests must work without real API keys using mocks
2. **Deterministic**: Tests should produce consistent results
3. **Fast**: Unit tests should complete in < 1 second each
4. **Isolated**: Tests should not depend on each other
5. **Documented**: Complex scenarios should have docstrings explaining the use case