Feature toggles
The content here is under the Attribution 4.0 International (CC BY 4.0) license
Feature flags have become a standard practice for teams that want to deliver software quickly into users’ hands without disruption. With this approach, new challenges and ways of working come into play when writing code.
Pete Hodgson’s article, published on Martin Fowler’s blog, provides an introduction to feature toggle types and implementation patterns. We’ll use this taxonomy alongside the Feature Flag Best Practices handbook by Hodgson and Echagüe to explore the subject (Hodgson, 2017).
Why flags?
The first and foremost benefit of flags is the decoupling of deployment and release of a feature. For many software development teams, releasing new functionality means not only opening it up to users; it is also a stressful moment for the entire team, from the moment the code is pushed until it reaches users’ hands.
This model is all or nothing. If the released feature needs to be reverted, a new version has to be deployed. If a version contains an error or causes an unplanned outage in production, the team needs to roll back the deployed code. If data is involved, it becomes even more complex. Kent Beck described this scenario as a “D-day”.
In my experience, D-day is often associated with git-flow as a standard practice. With git-flow, the cycle to release changes to users goes like this:
- branch off from the stable branch
- apply the changes
- spin up an environment with just this code
- do the manual testing
- generate a tag for this release
- integrate the work back into the development branch
- integrate the work into any branch that requires it
At first, it sounds like a good plan, but teams that work this way often end up in integration hell. More recently, practitioners have scrutinized this approach, with many advocating for simpler workflows such as trunk-based development (Hammant, 2017). Multiple practitioners and organizations have documented challenges with Git Flow in favor of simpler branching strategies (GitFlow considered harmful, Please stop recommending Git Flow!). At the other end of the spectrum there are simpler approaches to delivering software, but adopting them requires certain disciplines to be in place first.
Software development teams often overlook that technical practices matter. For example, it is important to have a continuous delivery pipeline with tests that developers trust and an environment that replicates what production looks like. The feedback loop is one of the most important things while developing software. This becomes crucial for developers who are onboarding onto a new team and exploring unknown areas of the code base.
The right feedback loop, and tests that developers trust, are the first step towards reducing this complexity and exploring trunk-based development. Trunk-based development abstracts away the complexities of many versions, live-syncing patches, and environment differences by using a single branch and keeping everyone on the same page at all times.
It sounds simple, and it actually is. However, the complexity now shifts to the team and the technical practices the team uses. This becomes even more important in the AI era.
Team maturity and practices
With the advent of AI, agents, and LLMs, discussion about the difficulties of extensive AI-tool usage has reached social media. In a post on X, Joven asks: “How do you deal with colleagues that are using AI tools to generate code that they can’t fully explain how it works?”. This is a common problem teams face when they start using AI tools to generate code. The code generated by AI tools can be complex and difficult to understand, especially if the developers are not familiar with the underlying algorithms and techniques used by the AI. My answer to that was based on eXtreme Programming practices:
That’s what I value most about eXtreme Programming: the values and the principles lead to the practices, not the opposite.
You use a tool based on the principles and values that you have. Using AI is a reality and there is no going back, but the fundamentals, at the time I am writing this, are not yet automated.
One big shift is starting to use the feature flag concept. Flags are switches that someone can turn on or off to expose new functionality to specific users, specific environments, or a percentage of users in an application. Note that the complexity now lies in the code and no longer in the version control system. One positive side effect is that there is no need to keep merging and syncing different branches with different code. The only worry is to make changes compatible with what already exists and avoid introducing breaking changes.
In the realm of trunk-based development and feature toggles, code duplication is a particular issue worth highlighting. Maintaining proper feedback loops and comprehensive testing ensures code remains clean, effectively managing code duplication. In the open-source community, multiple projects use feature flags to control when specific features are enabled or disabled.
ReactJS, for example, uses feature flags to experiment with new features. In their documentation, there is a reference to their style of approaching it: How to Contribute to React (accessed 14 Apr, 2024).
How does it work?
At its core, a feature flag is a conditional statement that wraps code, allowing you to enable or disable functionality at runtime without deploying new code. The mechanism is conceptually simple but powerful in practice.
Basic Implementation Pattern
The most fundamental pattern looks like this:
if (featureFlags.isEnabled('new-checkout-flow')) {
// new implementation
return newCheckoutProcess(cart);
} else {
// old implementation
return legacyCheckoutProcess(cart);
}
This simple conditional allows the team to deploy both implementations and switch between them dynamically. When the new checkout flow is ready for production, you enable the flag. If issues arise, you disable it instantly—no deployment needed.
Storage Mechanisms
Feature flags need to be stored somewhere accessible at runtime. Common approaches include:
- Environment Variables: Simple boolean flags stored in your deployment configuration. Best for basic on/off switches that rarely change.
- Configuration Files: JSON or YAML files that can be updated without code changes. Suitable for flags that need occasional updates.
- Database: Flags stored in your application database, allowing runtime updates through admin interfaces.
- Dedicated Services: Tools like LaunchDarkly, ConfigCat, or AWS AppConfig provide specialized infrastructure for flag management with advanced features like percentage rollouts and user targeting.
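Whichever backend you start with, it helps to hide the storage choice behind a thin abstraction so call sites do not change when the mechanism does. A minimal sketch, assuming nothing beyond Node’s process.env; the FlagProvider and EnvFlagProvider names are illustrative, not from any particular library:
interface FlagProvider {
  isEnabled(flagName: string): boolean;
}

// Simplest backend: environment variables such as FLAG_NEW_CHECKOUT_FLOW=true
class EnvFlagProvider implements FlagProvider {
  isEnabled(flagName: string): boolean {
    // "new-checkout-flow" -> FLAG_NEW_CHECKOUT_FLOW
    const envName = 'FLAG_' + flagName.toUpperCase().replace(/-/g, '_');
    return process.env[envName] === 'true';
  }
}

// Call sites depend only on the interface, so moving to a config file,
// database, or SaaS SDK later means writing another implementation.
const flags: FlagProvider = new EnvFlagProvider();
if (flags.isEnabled('new-checkout-flow')) {
  // route to the new implementation
}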
Evaluation Context
The power of feature flags extends beyond simple booleans. Modern implementations evaluate flags based on context:
function isFeatureEnabled(flagName: string, userContext: UserContext): boolean {
const flag = getFlagConfig(flagName);
// Environment-based
if (flag.enabledEnvironments && !flag.enabledEnvironments.includes(currentEnv)) {
return false;
}
// User-based targeting
if (flag.targetUsers && flag.targetUsers.includes(userContext.id)) {
return true;
}
// Percentage rollout
if (flag.percentageRollout) {
const userHash = hashCode(userContext.id) % 100;
return userHash < flag.percentageRollout;
}
return flag.defaultEnabled;
}
This allows for sophisticated rollout strategies: start with 5% of users, monitor metrics, gradually increase to 25%, then 50%, and finally 100% once confidence is high.
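The snippet above leans on a hashCode helper that is not shown. For percentage rollouts the only hard requirement is that it is deterministic and non-negative, so the same user always lands in the same bucket as the rollout grows. A minimal sketch, assuming a simple FNV-1a-style string hash:
// One possible hashCode for the percentage rollout above: an FNV-1a-style
// string hash, forced to an unsigned 32-bit value so `% 100` always yields
// a stable bucket between 0 and 99 for the same user id.
function hashCode(input: string): number {
  let hash = 2166136261;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0;
}

// The same user always falls in the same bucket, so raising percentageRollout
// from 5 to 25 only adds users; it never flips someone who already had the feature.
const bucket = hashCode('user-42') % 100;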
The Lifecycle
A typical feature flag lifecycle involves:
- Configuration: Define the flag in your system with initial state (usually disabled in production)
- Implementation: Wrap new code with flag conditionals, keeping old code as fallback
- Testing: Validate both code paths work correctly
- Deployment: Ship code with flag disabled (or enabled for internal users only)
- Gradual Rollout: Enable for progressively larger user segments
- Monitoring: Track metrics and user feedback
- Full Release: Enable for 100% of users
- Cleanup: After confidence is established, remove the flag and old code path
A Concrete Example
Let’s walk through implementing a new payment provider:
Step 1: Add the feature flag configuration:
payment-provider-v2:
enabled: false
environments: [staging, production]
percentage: 0
Step 2: Implement with the flag:
function processPayment(order: Order, customer: Customer): Promise<PaymentResult> {
if (featureEnabled('payment-provider-v2', customer)) {
return NewPaymentProvider.process(order);
} else {
return LegacyPaymentProvider.process(order);
}
}
Step 3: Test both paths in your test suite:
describe('PaymentProcessor', () => {
describe('with payment-provider-v2 enabled', () => {
beforeEach(() => enableFeature('payment-provider-v2'));
it('uses new provider', () => {
// test new provider
});
});
describe('with payment-provider-v2 disabled', () => {
beforeEach(() => disableFeature('payment-provider-v2'));
it('uses legacy provider', () => {
// test legacy provider
});
});
});
Step 4: Deploy to staging with flag enabled, validate thoroughly, then deploy to production with flag disabled.
Step 5: Enable for 5% of production traffic, monitor error rates and payment success metrics.
Step 6: If metrics look good after 24 hours, increase to 25%, then 50%, then 100% over the following days.
Step 7: Once at 100% for a week with no issues, remove the flag and legacy code in a cleanup PR.
Development Workflow Changes
When working with feature flags, the first thing developers write is a test verifying the flag system works correctly. This includes testing that the flag properly enables/disables the feature and that both code paths function as expected. The testing strategy becomes more comprehensive as you need to validate all possible flag states.
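The enableFeature and disableFeature helpers used in the earlier test example are not from any particular framework. One possible sketch is an in-memory override map that the flag lookup consults first (simplified here to ignore the user-context argument):
// In-memory overrides consulted before the real flag provider. Names and
// shape are assumptions, not a specific testing framework's API.
const overrides = new Map<string, boolean>();

export function enableFeature(flagName: string): void {
  overrides.set(flagName, true);
}

export function disableFeature(flagName: string): void {
  overrides.set(flagName, false);
}

// Application code checks overrides first so tests can force either path.
export function featureEnabled(flagName: string): boolean {
  const override = overrides.get(flagName);
  if (override !== undefined) {
    return override;
  }
  return false; // in real code, fall back to the actual flag provider here
}

// Call from afterEach so flag state never leaks between tests.
export function resetFeatures(): void {
  overrides.clear();
}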
Tools
The feature flag ecosystem ranges from simple configuration files to sophisticated SaaS platforms. The right choice depends on your team’s needs, scale, and budget. Let’s explore the spectrum of options.
The Spectrum: From Simple to Enterprise
At the simplest level, feature flags can be environment variables or boolean values in configuration files. As your needs grow—requiring dynamic updates, percentage rollouts, or A/B testing—dedicated tools become valuable. Understanding where your requirements fall on this spectrum guides your decision.
Open Source Solutions
Unleash
An open-source feature flag service that can be self-hosted or used as a managed service. Unleash provides a robust API, SDKs for multiple languages, and a web interface for managing flags. It’s particularly strong for teams that want control over their infrastructure while avoiding building from scratch.
- Primary use case: Teams wanting self-hosted solutions with enterprise features
- Key differentiator: Strong community, battle-tested in production environments
- Pricing: Open source with optional commercial support
Flagsmith
Another popular open-source option offering both self-hosted and cloud variants. Flagsmith emphasizes ease of use and provides features like remote config, A/B testing, and multi-environment support out of the box.
- Primary use case: Startups and mid-size companies wanting simple deployment
- Key differentiator: User-friendly interface, quick setup
- Pricing: Open source, with paid cloud hosting options
SaaS Platforms
ConfigCat
A developer-friendly service with a free tier that includes unlimited flags. Known for fast flag evaluation and simple APIs.
- Primary use case: Startups to mid-size companies wanting managed services affordably
- Key differentiator: Unlimited flags on free tier (up to 1,000 requests/month), transparent pricing, easy integration
- Pricing: Free tier available, usage-based pricing beyond free limits
GrowthBook
Open-source platform specializing in A/B testing and experimentation, with feature flags as a core component.
- Primary use case: Product teams focused on experimentation and metrics
- Key differentiator: Built-in statistical engine for experiment analysis
- Pricing: Open source with optional cloud hosting
PostHog
An all-in-one product analytics platform that includes feature flags as part of a broader toolkit.
- Primary use case: Teams wanting combined analytics and feature flag management
- Key differentiator: Single platform for analytics, session replay, and flags
- Pricing: Open source and cloud options, usage-based
Cloud Provider Solutions
AWS AppConfig
Part of AWS Systems Manager, AppConfig provides feature flag and configuration management integrated with AWS services. If you’re already on AWS, it offers native integration with other AWS tools.
- Primary use case: Teams heavily invested in AWS ecosystem
- Key differentiator: Native AWS integration, no additional vendor
- Pricing: Pay-per-use, typically lower cost for AWS-centric architectures
- Best practices for validating AWS AppConfig feature flags
Azure App Configuration and Google Cloud Config offer similar capabilities for their respective ecosystems.
Simple Solutions (Start Here)
For teams just adopting feature flags, don’t over-engineer:
Environment Variables
const ENABLE_NEW_DASHBOARD = process.env.ENABLE_NEW_DASHBOARD === 'true';
const ENABLE_BETA_FEATURES = process.env.ENABLE_BETA_FEATURES === 'true';
Works for simple on/off switches. Requires deployment to change but costs nothing and has zero complexity.
Configuration Files
interface FeatureConfig {
features: {
newCheckout: boolean;
experimentalUI: boolean;
rolloutPercentage: number;
};
}
const config: FeatureConfig = {
features: {
newCheckout: true,
experimentalUI: false,
rolloutPercentage: 50
}
};
Allows runtime updates if you implement config reloading. Good for early-stage feature flag adoption.
Database Flags
A simple feature_flags table with admin UI for toggling. You’re building basic infrastructure but maintain full control.
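As a rough sketch of that option, the lookup can be a single query against an assumed feature_flags table with name and enabled columns; db.query stands in for whatever client you already use (pg, knex, Prisma, and so on):
interface FlagRow {
  name: string;
  enabled: boolean;
}

// `db` is any client exposing a parameterized query method.
async function isFlagEnabled(
  db: { query(sql: string, params: unknown[]): Promise<FlagRow[]> },
  flagName: string,
): Promise<boolean> {
  const rows = await db.query(
    'SELECT name, enabled FROM feature_flags WHERE name = $1',
    [flagName],
  );
  // Unknown flags default to disabled: the safer failure mode.
  return rows.length > 0 && rows[0].enabled;
}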
Decision Framework
Starting out or small team (<10 developers)
- Environment variables or config files
- Graduate to simple database-backed solution
- Consider ConfigCat’s free tier if you want managed service
Mid-size team (10-50 developers)
- Unleash or Flagsmith for self-hosted
- ConfigCat or GrowthBook for managed service
- AWS/Azure/GCP native if committed to one cloud
Enterprise or heavy experimentation focus
- LaunchDarkly for maximum features and support
- Split.io if experimentation is core to your process
- GrowthBook if you want open-source with enterprise features
Critical considerations:
- API latency: Local evaluation (SDKs cache flags) vs. remote evaluation (every check hits API)
- Data residency: Some industries require self-hosted for compliance
- Lock-in risk: Open source solutions reduce vendor lock-in
- Team expertise: Managed services reduce operational burden
Types of feature toggles
Pete Hodgson’s taxonomy, as outlined in the Martin Fowler article on feature toggles, categorizes flags into four distinct types based on their purpose and longevity. Understanding these categories is crucial because conflating them leads to technical debt and maintenance headaches.
Release Toggles
Purpose: Enable trunk-based development by allowing incomplete features to be merged into the main branch without being visible to users.
Lifespan: Short-lived (days to weeks). Should be removed immediately after full rollout.
Dynamism: Can be changed without redeployment, typically toggled through configuration.
Who controls: Engineering team, sometimes product managers during rollout.
Example: You’re building a new dashboard layout. The work takes three weeks across multiple developers. Rather than
maintaining a long-lived feature branch, you merge incrementally to main with the feature behind a new-dashboard-layout
flag. Once complete and validated, you enable it for all users and remove the flag within days.
Key characteristic: These flags have an expiration date. If a release toggle lives for months, it’s a code smell indicating either over-engineering or poor cleanup discipline.
Experiment Toggles (A/B Testing)
Purpose: Run controlled experiments to measure the impact of different implementations on user behavior and business metrics.
Lifespan: Medium-lived (weeks to months), lasting for the duration of the experiment plus analysis period.
Dynamism: Highly dynamic, often requiring sophisticated distribution logic to ensure proper experimental cohorts.
Who controls: Product managers, data scientists, or growth teams.
Example: You want to test whether a green “Buy Now” button converts better than the current blue one. You create an experiment flag that shows green to 50% of users and blue to the other 50%, tracking conversion rates for each cohort. After two weeks of data collection, you analyze results and make the winning variant permanent.
Key characteristic: These flags require careful user bucketing to avoid contaminating experimental results. A user must consistently see the same variant throughout the experiment.
Ops Toggles (Circuit Breakers)
Purpose: Control operational aspects of the system, often for managing system behavior during incidents or peak load.
Lifespan: Long-lived, potentially permanent. They’re part of your operational toolkit.
Dynamism: Must be changeable without deployment—often in real-time during incidents.
Who controls: Operations team, SREs, or on-call engineers.
Example: Your application integrates with a third-party recommendation engine. You implement an enable-recommendations
ops toggle. During Black Friday, the recommendation service becomes overloaded and slow, dragging down your page load
times. You disable the toggle instantly, falling back to showing popular items instead. Your site stays fast while the
third party recovers.
Key characteristic: These flags are defensive mechanisms. They should gracefully degrade functionality rather than cause failures.
Permission Toggles (Feature Access)
Purpose: Control which users or accounts have access to specific features, often for premium tiers or beta programs.
Lifespan: Long-lived to permanent. These often evolve into your product’s access control system.
Dynamism: Changed through administrative interfaces, not requiring deployments.
Who controls: Customer success teams, account managers, or automated based on billing system.
Example: You offer a “Teams” feature only available to users on your Pro plan. The teams-feature permission toggle
checks the user’s subscription tier and enables/disables accordingly. This might also be used for invite-only beta
programs: “AI Assistant is available to users in the beta-testers group.”
Key characteristic: These flags are tied to your business model and user access patterns. They’re features themselves, not temporary development aids.
Comparison Table
| Type | Lifespan | Dynamic? | Who Controls? | Primary Purpose |
|---|---|---|---|---|
| Release | Days-weeks | Yes | Engineering | Safe integration |
| Experiment | Weeks-months | Yes | Product/Data | Measure impact |
| Ops | Long-lived | Yes (critical) | Operations | System stability |
| Permission | Long-lived/Permanent | Yes | Business teams | Access control |
Why This Matters
Teams that treat all flags the same accumulate technical debt rapidly. A common anti-pattern is keeping release toggles around “just in case,” turning them into permanent code complexity. The flag that was meant to ease a two-week feature merge becomes a permanent conditional that developers must understand forever.
Equally problematic: using release toggles for permissions. This creates specific security risks:
- Authorization bypass risks: If flag evaluation happens client-side or without proper authentication checks, users may manipulate flags to access features they shouldn’t
- Scattered authorization logic: Permission checks become distributed across feature flag code rather than centralized in access control systems, making security audits difficult
- Compliance gaps: Traditional IAM systems provide audit logging, role management, and compliance reporting that ad-hoc flag-based permissions lack
- Inconsistent enforcement: Different parts of the system might check flags differently, creating authorization gaps
The solution is ruthless cleanup of short-lived flags and proper infrastructure for long-lived ones. Release and experiment toggles should have automated reminders or CI checks that alert teams when they’ve overstayed their welcome.
Trade-offs and Challenges
While feature flags enable powerful capabilities, they introduce complexity and risks that teams must actively manage. Understanding these trade-offs is essential for successful adoption.
Increased Code Complexity
Every feature flag creates multiple code paths. What appears as a simple if statement doubles the possible execution
paths through your code:
// One flag: 2 paths
if (flagA) {
path1();
} else {
path2();
}
// Two flags: 4 paths
if (flagA) {
if (flagB) {
path1();
} else {
path2();
}
} else {
if (flagB) {
path3();
} else {
path4();
}
}
With n independent flags in a code path, you have 2^n possible combinations. This exponential growth makes reasoning about system behavior increasingly difficult. While you won’t typically have many independent flags in a single code path, even a few create substantial complexity.
Testing Burden
Each flag combination should ideally be tested, but exhaustive testing becomes impractical quickly. Teams must balance thorough testing with pragmatism:
- Test the most critical paths (flag enabled and disabled)
- Test interactions between related flags
- Use property-based testing to explore flag combinations
- Accept that some edge cases won’t be tested until production (with careful monitoring)
A common pattern is testing each flag independently while ensuring good observability to catch unexpected interactions in production.
Technical Debt Accumulation
The most insidious challenge is flag proliferation. Feature flags that aren’t removed become permanent complexity. Teams need discipline and processes to prevent this:
Anti-patterns that cause debt:
- “We might need to toggle this again someday” (YAGNI violation)
- Forgetting flags exist after successful rollouts
- Lacking ownership or process for flag cleanup
- Using release toggles for permanent configuration
Mitigation strategies:
- Set expiration dates for release and experiment toggles
- Create Jira tickets or reminders for flag removal
- Add CI warnings for flags older than expected lifespan
- Make flag cleanup part of definition-of-done
- Regular “flag debt” cleanup sprints
Some teams implement automated flag removal: if a release toggle has been 100% enabled for 30 days without issues, an automated PR removes it.
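One lightweight step in that direction is a CI check that reads a flag registry and fails the build when a short-lived toggle exceeds its expected lifespan. A minimal sketch, assuming a flags.json registry whose format and field names are made up for illustration:
import { readFileSync } from 'fs';

interface FlagEntry {
  name: string;
  type: 'release' | 'experiment' | 'ops' | 'permission';
  createdAt: string;   // ISO date, e.g. "2025-02-01"
  maxAgeDays: number;  // expected lifespan for short-lived toggles
}

const flags: FlagEntry[] = JSON.parse(readFileSync('flags.json', 'utf8'));
const now = Date.now();

const stale = flags.filter((flag) => {
  // Only short-lived toggle types are expected to expire.
  if (flag.type !== 'release' && flag.type !== 'experiment') return false;
  const ageDays = (now - Date.parse(flag.createdAt)) / (1000 * 60 * 60 * 24);
  return ageDays > flag.maxAgeDays;
});

if (stale.length > 0) {
  console.error('Stale feature flags:', stale.map((f) => f.name).join(', '));
  process.exit(1); // fail the pipeline so cleanup cannot be silently forgotten
}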
Configuration Sprawl
Managing flag states across multiple environments (development, staging, production) becomes complex:
- Which flags should be enabled where?
- How do you keep configuration synchronized?
- What happens when staging differs from production?
- How do you test production-like configurations locally?
Configuration management discipline becomes critical. Many teams adopt “configuration as code” approaches, storing flag states in version control and using CI/CD to deploy them consistently.
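A minimal sketch of that configuration-as-code idea, with per-environment flag states kept in a typed, version-controlled file (structure and names are illustrative):
type Environment = 'development' | 'staging' | 'production';

// Flag states per environment, reviewed and versioned like any other code,
// so drift between staging and production shows up in code review.
const flagStates: Record<Environment, Record<string, boolean>> = {
  development: { 'new-checkout-flow': true, 'payment-provider-v2': true },
  staging: { 'new-checkout-flow': true, 'payment-provider-v2': true },
  production: { 'new-checkout-flow': false, 'payment-provider-v2': false },
};

// CI/CD reads this file and pushes the desired states to the flag store,
// so any environment's configuration is reproducible from the repository.
export function desiredState(env: Environment, flag: string): boolean {
  return flagStates[env][flag] ?? false;
}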
Performance Considerations
Every flag evaluation has a cost, even if small:
- Local evaluation: Minimal cost (microseconds) but requires SDK integration and cache updates
- Remote evaluation: Network latency on each check (milliseconds to seconds)
- Complex targeting: Sophisticated rules increase evaluation time
Best practices:
- Cache flag values when possible
- Use local SDKs that periodically sync from central service
- Avoid flag evaluation in hot code paths
- Pre-compute flag values at request start rather than checking repeatedly
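The last point deserves a sketch: evaluate the flags a request needs once, up front, and pass the frozen snapshot down instead of re-querying in hot paths. The function names here are assumptions:
interface FlagSnapshot {
  readonly [flag: string]: boolean;
}

// Evaluate each needed flag exactly once at the start of the request.
function snapshotFlags(
  userId: string,
  names: string[],
  evaluate: (flag: string, userId: string) => boolean,
): FlagSnapshot {
  const snapshot: Record<string, boolean> = {};
  for (const name of names) {
    snapshot[name] = evaluate(name, userId);
  }
  return Object.freeze(snapshot);
}

// At request start: const flags = snapshotFlags(user.id, ['new-checkout-flow'], isFeatureEnabledForUser);
// Downstream code reads flags['new-checkout-flow'] with no extra lookups and
// sees a consistent value even if the flag is toggled mid-request.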
Operational Risks
Incorrect flag configuration can cause production incidents:
- Accidentally disabling a feature for all users
- Enabling incomplete features
- Targeting rules that match wrong users
- Flag dependencies creating unexpected behavior
Mitigation requires:
- Audit logging of all flag changes
- Gradual rollout strategies (start small, expand gradually)
- Rollback procedures (know how to instantly disable a flag)
- Monitoring and alerting on flag changes
- Testing flag configurations in staging before production
Visibility and Discovery
As flag count grows, teams lose track of what flags exist and their purposes:
- What does enable_exp_v2 actually control?
- Is this flag still in use?
- Who owns this flag?
- What’s the flag’s business purpose?
Documentation becomes essential:
- Descriptive flag names (new-checkout-flow, not flag123)
- Metadata: owner, purpose, creation date, expected expiration
- Centralized flag registry or dashboard
- Regular audits of flag inventory
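As an illustration of that metadata, a registry entry could look like the following; the field names are illustrative, not a standard:
interface FlagMetadata {
  name: string;             // descriptive, e.g. 'new-checkout-flow'
  owner: string;            // team or person accountable for cleanup
  purpose: string;          // one-line business or technical purpose
  createdAt: string;        // ISO date
  expectedRemoval?: string; // omitted only for ops/permission toggles
}

const registryEntry: FlagMetadata = {
  name: 'new-checkout-flow',
  owner: 'checkout-team',
  purpose: 'Release toggle for the rebuilt checkout flow',
  createdAt: '2025-02-01',
  expectedRemoval: '2025-03-01',
};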
The 2^n Problem in Production
The combination explosion isn’t just a testing problem—it’s a production reality. With multiple flags, you cannot exhaustively test all combinations, meaning some paths only execute in production. This requires:
- Comprehensive observability (logging, metrics, tracing)
- Error tracking that captures flag states during incidents
- Gradual rollouts to limit blast radius of unexpected interactions
- Quick rollback capabilities
When Not to Use Feature Flags
Sometimes feature flags are the wrong tool:
- Simple, low-risk changes: Adding a logging statement doesn’t need a flag
- Database migrations: Flags can’t help with irreversible schema changes
- Security patches: These need immediate, consistent deployment
- When technical debt is already high: Adding flags to legacy code might worsen maintainability
Feature flags are powerful but not free. Teams must weigh the benefits (deployment flexibility, safe rollouts, A/B testing) against the costs (complexity, testing burden, technical debt risk). For teams practicing continuous delivery with robust testing and operational discipline, the trade-offs favor flag adoption. For teams lacking these foundations, investing in testing and deployment automation first may be more valuable.
Resources
- Feature Toggles (aka Feature Flags)
- Feature Flag Best Practices - Pete Hodgson, Patricio EchagĂĽe
- Trunk-Based Development: the ins and outs
Changelog
- Feb 02, 2025 - Major content expansion: added comprehensive “How does it work?” section with code examples, completed “Types of feature toggles” taxonomy, expanded “Tools” section with detailed comparisons, added “Trade-offs and Challenges” section, fixed grammar issues, added proper citations
- Jan 30, 2026 - Added AWS AppConfig
- May 08, 2024 - Initial version
References
- Hodgson, P. (2017). Feature Toggles (aka Feature Flags). https://martinfowler.com/articles/feature-toggles.html
- Hammant, P. (2017). Trunk Based Development. https://trunkbaseddevelopment.com/