How it works

Rigorous science. Calibrated to you.

Why this exists

Race results already acknowledge that age and gender matter. Age groups exist for a reason. Nobody compares a Boston time to a Berlin time without context.

But the same results publish an Overall ranking — a single list, sorted by clock time, presented as a measure of performance. A 25-year-old man finishing ahead of a 50-year-old woman becomes a “better performance” by virtue of appearing higher on the list. The physiology of who showed up — none of it counted, even though everyone watching the race knows it counted.

No one would publish a single Overall ranking that intermingled results from a flat marathon and a hilly one and called it a fair measure of effort. The structural unfairness would be obvious. The same unfairness exists when results intermingle athletes of different ages and genders — it’s just less visible. The Overall ranking, as currently published, isn’t a fair measure of performance. It’s a sorted list that pretends to be.

LevelField makes the unfairness visible, and the ranking fair.

Four ways to see your time.

                         Age   Gender   Conditions   Course
    Clock                 –       –         –           –
    LevelField            ✓       ✓         ✓           –
    LevelField Base       –       –         ✓           ✓
    LevelField Global     ✓       ✓         ✓           ✓

Illustrative.
  • Clock

    Raw

    Your finish time, as recorded. The raw input from which everything else is derived.

  • LevelField

    Within a race

    Your time at this race, leveled for age, gender, and the day’s conditions. Lets you see how you performed against the entire field, not just your bracket.

  • LevelField Base

    Across races

    Your time normalized for course and conditions, no demographics. Lets you compare your own performances across races, courses, and years on equal terms.

  • LevelField Global

    Everyone vs. everyone

    All four normalizations stacked. Lets any athlete be ranked against any other, regardless of which race they ran. The foundation for global rankings and series standings.

Race Equivalents — your time projected to a named reference race like the Boston Marathon or Kona — are a related concept covered later on this page.

How your time is leveled

This section is longer than the others. Leveling a race result honestly takes more than a multiplier.

The engine, both directions.

From its foundation, FitLogic has been engineered to validate its own outputs: the predictions it makes about race outcomes, and the efficacy of the training prescriptions it produces. Forecasts are made, training is prescribed, outcomes are observed, and the comparison improves the model, continuously, for over a decade. That validation only works if the engine can run in both directions: forward, to predict what an athlete will do at a race, and in reverse, to derive what an athlete’s body must have been doing to produce a given result.

Leveling race results uses that same reverse-direction capability, applied to a different question. The principles are familiar. The use case is new, and it’s why every LevelField result is grounded in the same physics and physiology that have been refining FitLogic’s predictions since long before LevelField existed.

Three of FitLogic’s named technologies do most of the work. PersonAlign™ calibrates physiology to the athlete’s specific age and gender, recognizing that the differences are profound and multi-dimensional rather than incremental. EnviroNorm® translates physiological response to environment, capturing how the body itself reacts to temperature, humidity, elevation, and similar factors. RaceX® simulates how that physiology expresses itself on an actual course, accounting for terrain, wind, air pressure, and the specific demands of the route. Together they let LevelField separate what an athlete did from the conditions in which they did it.

Read more about FitLogic’s foundational principles →

Running, then differences for swim and bike.

The leveling process is easiest to follow one discipline at a time, so we’ll walk through running first. Triathlon adds swim and bike legs, handled by the same engine with the same logic plus the additional factors each discipline requires. Differences are noted at the end.

Step 1. Start with what actually happened.

Every result begins with the actual race: the actual finish time, the actual measured course, the actual day’s weather during the athlete’s specific finish window. Not a nominal distance assumed from the race name, since two marathons can differ by tens of meters and that difference matters. Not an average condition for the race date, since wave starts an hour apart can run different races on the same course.

This is the first place LevelField’s approach diverges from simpler methods. Adjusting a finish time by a single course difficulty rating, or by a generic typical-day condition, smooths over real variance that affected real athletes differently.

Step 2. Reverse the race to derive a base FTP.

Using the actual course profile (from a GPX file with elevation and bearings), the actual conditions (temperature, humidity, elevation, wind, and similar factors), and physiological inputs about the athlete (such as weight, age, gender, and performance level derived from clock time), the engine works backward from the observed finish time to derive the athlete’s underlying FTP (Functional Threshold Power): the threshold capacity necessary to produce the observed outcome.

This is not a simple division. The engine simulates the race over the course profile, accounting for the energy cost of hills, the impact of weather on physiology, and the way an athlete’s sustainable pace changes with duration (the stamina curve: the longer the effort, the lower the sustainable percentage of FTP). FTP is solved for iteratively until the simulation matches what actually happened.

The output is a base FTP: a measure of the threshold capacity required for the result, with the noise of course, conditions, and duration accounted for.
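In code, the reverse solve can be sketched as a root-finding loop. This is a toy, not FitLogic’s model: `simulate_race` and every constant in it are invented stand-ins, and only the structure (simulate, compare, adjust, repeat) mirrors the step described above.

```python
# Illustrative only. `simulate_race` stands in for the real engine; its
# scaling constants are invented. The point is the shape of the algorithm:
# simulate forward, compare to the observed time, adjust FTP, repeat.

def simulate_race(ftp_watts, course_gain_m, temp_c, distance_km):
    """Toy forward model: finish time in seconds for a given FTP."""
    base_speed_kmh = ftp_watts / 20.0                    # invented scaling
    hill_penalty = 1.0 + course_gain_m / 10000.0         # climbing costs time
    heat_penalty = 1.0 + max(0.0, temp_c - 15.0) * 0.01  # heat above 15 °C costs time
    speed_kmh = base_speed_kmh / (hill_penalty * heat_penalty)
    return distance_km / speed_kmh * 3600.0

def derive_base_ftp(observed_s, course_gain_m, temp_c, distance_km,
                    lo=100.0, hi=500.0, tol_s=0.5):
    """Bisect on FTP until the simulated time matches the observed time.
    Valid because simulated time falls monotonically as FTP rises."""
    while hi - lo > 0.01:
        mid = (lo + hi) / 2.0
        simulated = simulate_race(mid, course_gain_m, temp_c, distance_km)
        if abs(simulated - observed_s) < tol_s:
            return mid
        if simulated > observed_s:   # simulation too slow: FTP must be higher
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

The round trip is the check: feeding the derived FTP back into the same forward model reproduces the observed finish time.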

Step 3. Normalize the FTP for age and gender.

This is where PersonAlign™ does its work. Age and gender affect raw FTP capacity and the shape of the stamina curve, and the relationships are not linear or one-dimensional. They vary across the performance and duration spectrum, among other things. A 30-year-old elite and a 50-year-old age-grouper aren’t separated by a single percentage; the relationship shifts depending on where each athlete sits on that spectrum.

The normalization step applies physiologically derived adjustments calibrated to this individual athlete to translate the base FTP and the stamina curve into a normalized form. The output is a leveled physiological foundation for everything that follows.

Step 4. Re-simulate the race.

The engine now runs forward again, using the normalized FTP and stamina curve, and projects the result onto whichever scenario the leveled time is meant to express:

  • For a LevelField result (the within-race leveled time), the engine re-simulates over the same course, with conditions normalized to the typical conditions for that course. Same race, same hills, same distance, but age and gender have been leveled and any unusual heat, wind, or humidity has been replaced with the course’s typical baseline.
  • For a LevelField Base result (comparable across races), the engine simulates over a defined base course under typical conditions. Course and conditions are held constant; demographics are not normalized.
  • For a LevelField Global result (used for global rankings and series standings), both normalizations are applied. The engine re-simulates over the base course under typical conditions, with the athlete’s normalized FTP and stamina curve. Every athlete’s performance, projected to the same hypothetical race.
  • For a Race Equivalent (your time projected to a named reference race like the Boston Marathon or Kona), the engine simulates over that race’s course profile, under that race’s typical conditions or a specific year’s actual conditions.

Every output is a re-simulation. Nothing is multiplied or scaled. The engine reconstructs what the athlete would have done under different inputs, using the same physics and physiology that produced the original result.
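The four scenarios above differ only in which course, conditions, and physiology feed the forward simulation. A toy dispatch, in which every object and attribute name is hypothetical rather than the engine’s actual API, might look like:

```python
# Illustrative scenario dispatch for Step 4. All names here are invented
# stand-ins. The invariant it demonstrates: every leveled output is a
# forward re-simulation, never a multiplier applied to the clock time.

def resimulate(simulate, athlete, race, base_course, mode, ref_race=None):
    """Select the (course, conditions, physiology) triple for a leveled
    output, then run the forward simulation."""
    scenarios = {
        # Same course, that course's typical conditions, demographics leveled.
        "levelfield": (race.course, race.course.typical_conditions,
                       athlete.normalized_ftp),
        # Base course, typical conditions; demographics NOT leveled.
        "base": (base_course.profile, base_course.typical_conditions,
                 athlete.base_ftp),
        # Base course, typical conditions, demographics leveled.
        "global": (base_course.profile, base_course.typical_conditions,
                   athlete.normalized_ftp),
    }
    if mode == "race_equivalent":
        # Named reference race under its typical conditions (demographics kept).
        course, conditions, ftp = (ref_race.profile,
                                   ref_race.typical_conditions,
                                   athlete.base_ftp)
    else:
        course, conditions, ftp = scenarios[mode]
    return simulate(ftp, course, conditions)
```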

Differences for cycling.

Bike legs use the same four-step structure with additional factors. Cycling is more aerodynamically sensitive than running, so RaceX® accounts for factors including the rider’s body weight and CdA (aerodynamic drag profile), wind speed and direction against the rider’s bearings throughout the course, and air pressure (athletes ride faster at higher elevations because the air is thinner, producing lower drag at the same power output).

The bike’s target output is also adjusted for what comes next. In a triathlon, an athlete’s optimal bike effort isn’t the maximum the body could produce in isolation. It’s the effort that leaves the right physiological state for the run. The engine accounts for this brick effect using factors including the athlete’s performance level, age, gender, and body composition.

Differences for swimming.

Swim legs are calibrated to factors including water type (salt or fresh, for buoyancy), wetsuit eligibility, current, and water conditions. The same reverse-derivation logic applies: the engine works backward from the observed swim split to a normalized output, then forward again under the leveled conditions.

Transitions.

T1 and T2 are handled differently from the disciplines they connect. Within a single race, the transition time is what the athlete actually recorded. For LevelField Base and cross-race comparisons, transition areas vary substantially in distance and setup, far more than weather varies, and that variance has to be normalized statistically rather than physiologically. The engine uses the distribution of T1 and T2 durations across each race’s field, with age and gender accounted for, to translate transition time at one race into a comparable measure at another.
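One way to picture a statistical translation like this is a percentile mapping between two races’ transition distributions. The sketch below is an assumption about the general shape of such a method, not the engine’s actual procedure, and it omits the age and gender conditioning described above:

```python
# Minimal percentile-mapping sketch (illustrative, not LevelField's actual
# method): find where the athlete's T1 sits in their own race's field, then
# read off the time at the same percentile in the target race's field.

from bisect import bisect_left

def percentile_of(value, sorted_field):
    """Fraction of the field strictly below `value` (0..1)."""
    return bisect_left(sorted_field, value) / len(sorted_field)

def translate_transition(t1_seconds, home_field, target_field):
    """Map a transition time from one race's distribution to another's."""
    home = sorted(home_field)
    target = sorted(target_field)
    p = percentile_of(t1_seconds, home)
    idx = min(int(p * len(target)), len(target) - 1)
    return target[idx]
```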

Why this works even with imperfect data.

A reasonable question: how can the engine derive your FTP without knowing your exact body weight or stamina curve?

The architecture is designed to handle this. Same-course leveling — a LevelField result at the race you ran — is robust to imperfect inputs because the system’s internal consistency mitigates assumption error. Cross-course comparisons (LevelField Base, LevelField Global, Race Equivalents) are more sensitive to input accuracy, which is why those experiences require verified data through a connected TriDot or RunDot account.

This is why LevelField uses progressive verification. Same-race leveling produces accurate results for everyone, with no friction and no required action. Competitive recognition across races, in series, or in global rankings requires more rigorous inputs, and the engine uses your connected device data to derive verified physiological values directly. The architecture is calibrated to the question being asked: more is required where results carry more weight.

The point.

Other approaches to leveling start with the finish time and adjust it. LevelField starts with the finish time, reverses through the actual race to derive the underlying physiological output, normalizes that output for the factors being leveled, then re-simulates the race forward with the new inputs. The output is a finish time, but it’s a finish time produced by physics and physiology, not by multipliers.

This matters because finish times are downstream of conditions, course, and field. Adjusting them after the fact can’t separate what an athlete did from what was done to them by the day’s race. The reverse-then-forward architecture can.

Why simpler approaches fall short

No leveling approach is perfect. Every model, including this one, makes assumptions about athletes whose data is incomplete, about conditions that can’t be measured directly, about physiology that varies in ways no equation captures fully. The right question isn’t whether a method is flawless. It’s whether it’s more accurate than the alternatives, and whether the alternatives are even trying to solve the same problem.

The simpler approaches commonly used to adjust race results aren’t trying to.

Demographic coarseness.

The most common approach to adjusting race results for age and gender is age-grading: a published table that assigns a single percentage adjustment to each five-year demographic bracket. A 50-year-old woman gets one factor, every other woman in F50–54 gets the same factor, and a 30-year-old man gets a different one. Apply the factor to the finish time and you have a “graded” result.

The mechanics fail in two opposite directions at once. Within a bracket, the table treats a 50-year-old and a 54-year-old as physiologically equivalent, even though four years of aging meaningfully affects threshold capacity and stamina-curve shape. At a bracket boundary, the table treats a 49-year-old and a 50-year-old as discontinuously different, even when the two are separated by days or weeks rather than years. The same continuous physiology — aging — is alternately ignored within a bracket and exaggerated at its edges. Neither answer is right.
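To make the two failure modes concrete, here is a toy bracket table with invented factors (no published standard uses these exact numbers):

```python
# Hypothetical single-factor age-grading, for illustration only. The factors
# are invented. It exhibits both failure modes at once: identical treatment
# inside a bracket, and a discontinuous jump at its edge.

BRACKET_FACTORS = {"M45-49": 0.92, "M50-54": 0.89}   # invented numbers

def bracket(age, sex="M"):
    lo = age - age % 5
    return f"{sex}{lo}-{lo + 4}"

def graded_seconds(finish_seconds, age):
    return finish_seconds * BRACKET_FACTORS[bracket(age)]

raw = 3 * 3600  # everyone runs 3:00:00
# A 49-year-old and a 50-year-old, possibly days apart in age, jump by
# a full factor step...
jump = graded_seconds(raw, 49) - graded_seconds(raw, 50)
# ...while a 50-year-old and a 54-year-old, years apart, grade identically.
same = graded_seconds(raw, 50) == graded_seconds(raw, 54)
```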

The deeper failure is that the relationships these tables encode aren’t constant. The penalty an older athlete faces grows with the duration of the effort, varies with their performance level, and interacts with their underlying physiology in ways a single percentage can’t capture. A 50-year-old elite has aged differently than a 50-year-old recreational runner. Applying the same multiplier to both flattens that difference into noise.

Course factors don’t reflect the athlete.

A second common approach is to assign each course a “difficulty rating,” then adjust finish times by that rating. Hilly courses get one factor; flat courses get another. Apply the factor and the result is supposedly comparable across courses.

This treats every athlete on the course as if hills affected them the same way. They don’t. A hill imposes a non-linear cost that scales with body weight, with power-to-weight ratio, with the duration the athlete spends on the climb. A 4-hour marathoner spends nearly twice as long on the same incline as a 2:10 marathoner. The slower athlete accumulates more thermal load, more glycogen depletion, more time at elevated heart rate. The cost of the hill, expressed as a percentage of the athlete’s capacity, is much higher.

Age and gender amplify this. Power-to-weight ratios and stamina curves vary across demographics, and they interact with course profile. A hilly course is disproportionately harder for older athletes on long efforts. A flat course favors raw aerodynamic profile over climbing economy. None of this is captured by a single course factor that gets applied uniformly to every finisher.
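The time-on-climb arithmetic above can be checked with a toy even-pacing model (a deliberate simplification; real pacing varies over a course):

```python
# Toy even-pacing model: both runners cover the same climb, but the slower
# runner spends proportionally longer on it. Even pacing is a simplification;
# it is enough to show the scaling.

def time_on_climb_min(finish_min, climb_km, race_km=42.195):
    """Minutes spent on a climb of `climb_km`, assuming perfectly even pacing."""
    return finish_min * climb_km / race_km

elite = time_on_climb_min(130, 2.0)    # 2:10 marathoner
steady = time_on_climb_min(240, 2.0)   # 4:00 marathoner
ratio = steady / elite                 # ≈ 1.85: "nearly twice as long"
```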

Course factors don’t reflect the day.

A single course rating also assumes that the course has one difficulty, fixed across years and across the race day itself. Neither is true.

Year-over-year, conditions vary. Boston in 2018’s nor’easter and Boston in a calm year are different races on the same course. A factor calibrated against an average of past years’ conditions can’t correct for the current year’s actual weather. Worse, the factor itself was likely derived from data already averaged across years of varying conditions, so the baseline it adjusts toward isn’t a real day. It’s a statistical fiction.

Within a single race, conditions also shift. Wave starts an hour or two apart can finish under meaningfully different temperatures, wind directions, and humidity. A morning wave and an afternoon wave didn’t run the same race. A single rating applied to all of them treats their experiences as identical, when athletes who actually ran know they weren’t.

Output-only architecture.

The deeper limitation of these approaches isn’t any single mechanic. It’s the architecture: they all start with the finish time and adjust it. Nothing earlier in the chain is reconsidered.

This means they can’t separate the strength of the field from the performance of the individual. If the field at one race was unusually fast, all finish times are pulled down with the pace, and an output-only adjustment can’t distinguish a strong individual day from a strong collective day. It also means they can’t validate themselves. There’s no prediction step to test against, no feedback loop, no way to ask whether a given adjustment was right or wrong. The system produces an output and the output is final.

What output-only approaches cannot do is what LevelField’s architecture is built around: reverse the result back through the conditions that produced it, separate the athlete’s underlying performance from the field and the day, and re-simulate the race with leveled inputs. The point isn’t that adjustments are bad. The point is that adjusting outputs alone can’t ask the question that matters.

Placement-points ranking systems.

A different category of approach skips finish-time adjustment entirely and ranks athletes across multiple races by placement points. Win a race, get the most points. Finish second, get fewer. Add up the points across a season; the athlete with the most points wins the series.

The flaw is built into the structure. The strength of the field at any given race varies, often for reasons unrelated to the athletes’ performance. Geography matters: races in some regions consistently attract deeper fields than others. Timing matters: a series race held the same weekend as a major event will see top competitors choose the major, leaving the series race open to cherry-picking a top placement against a softer field. An athlete who wins that race earns the same maximum points as an athlete who wins a stacked field at full strength, even though the leveled performances were not remotely comparable.

Placement isn’t performance. It’s a ratio between an athlete and whoever happened to show up that day. Rankings built on placement reward strategic race selection as much as they reward fitness.

This is what LevelField Global addresses. By leveling each athlete’s performance to the same hypothetical race, the system can rank athletes across different events on the strength of what they did, not on the strength of who they ran against.

Progressive data verification

A reasonable concern: if the engine works by reverse-deriving an athlete’s underlying physiology from their finish time, what happens when the inputs to that derivation are unverified or estimated?

The answer depends on what kind of leveled time is being produced. Same-course leveling (a LevelField result at the race the athlete actually ran) is largely insensitive to imperfect inputs, because the same assumptions used to reverse the race are used to re-simulate it, so their errors largely cancel. Cross-course leveling (LevelField Base, LevelField Global, and Race Equivalents) depends on inputs being close to actual values, not just internally consistent.

LevelField handles this with progressive verification. Each tier of verification unlocks a different layer of the experience. Same-race results are available to everyone with no friction. Cross-race comparisons, competitive formats, and global rankings require more.

Four tiers.

Default assumptions.

If you haven’t created a LevelField account, the engine uses defaults based on factors including age, gender, and performance level. Defaults are calibrated conservatively, biased in directions that prevent the kind of cross-course inflation that imprecise inputs could otherwise produce. You get a leveled result at the race you ran, with no friction and no required action. Cross-course results and rankings are not available at this tier.

Self-reported data.

Creating a LevelField account adds a layer of personal context: you supply your own basic data and indicate whether you’re a First-Timer or Novice, which makes you eligible for those experience-based categories. Self-reported data improves the precision of your same-race results.

Connected data.

If you connect a TriDot or RunDot account, you provide your training and race data directly. The engine uses that data to derive verified physiological values (factors such as body weight, sustainable power, stamina-curve shape, and aerodynamic profile) from what your own training and racing actually produced. Connected data unlocks the cross-course capabilities: LevelField Base, LevelField Global, Race Equivalents, and access to competitive formats that depend on cross-athlete fairness, including Clubs, Quad Squads, Series, and Most Improved.

Manual proof submission.

The highest tier applies only at top performance levels, or when an athlete’s data falls outside normal ranges or shows other indicators worth confirming. Manual verification is rare by design.

Why the burden scales with the stakes.

Verification isn’t a barrier to participation. Every athlete at every race gets a LevelField result; the default-assumption tier produces accurate same-race leveling for everyone. What verification enables is access to the parts of the experience that depend on cross-athlete and cross-course fairness.

This is deliberate. If you care about the precision of your results, or want access to cross-course comparisons, series rankings, and competitive formats like Quad Squads, Clubs, and Most Improved, you provide more data, because the integrity of those results and comparisons depends on the quality of every participant’s inputs. If all you want is your same-race leveled result, the default tier is enough: conservative and approximate, but sufficiently precise, and entirely your own. Verified data often produces results that better reflect what you actually did, since defaults are calibrated to be conservative. You’re welcome and encouraged to verify at whatever depth fits.

To whom much is given, much is required.

The path forward.

The competitive formats, the cross-race rankings, the recognition that depends on real comparison: these are accessible to any athlete who connects their data. The same connection also opens the door to FitLogic’s broader capabilities, the training intelligence that the engine behind TriDot and RunDot was built to deliver long before LevelField existed. Athletes who level their results and athletes who train with FitLogic are using the same engine, on the same data, for two different questions.

The base course

Cross-course comparisons require a defined reference. To say that one race performance is comparable to another, the system has to project both performances onto the same hypothetical course under the same hypothetical conditions. Without that shared reference, course profiles and weather variations make every comparison apples-to-oranges.

The base course is that reference. LevelField defines a base course for each distance and discipline. For running, there’s a base marathon, a base half-marathon, a base 5K, and so on. For triathlon, there’s a base sprint, a base standard, a base full distance, and so on. A marathon performance is only ever compared against the marathon base course. A sprint triathlon performance is only ever compared against the sprint triathlon base course. Cross-race comparison happens within a distance and discipline, never across them.

For each base course, LevelField uses a defined course profile and a defined set of typical conditions, factors such as terrain, temperature, humidity, elevation, and wind. When the system produces a LevelField Base result, a LevelField Global result, or any cross-race ranking, every athlete is being projected onto the same underlying course in the same underlying conditions. Differences in finish time at that hypothetical race reflect differences in performance, not differences in where each athlete actually raced.

The specific definition of each base course is held internally. A base course doesn’t need to be an actual race anyone has run. It’s a reference profile, calibrated to be representative of typical conditions for its distance and discipline, and held constant so that every athlete’s leveled time can be compared against every other athlete’s on equal terms.

What matters publicly is the principle: the same base course, applied identically to every result within a distance and discipline, makes cross-race comparison possible. The internal definition is the methodology infrastructure that supports the principle.

Race Equivalents

A leveled time tells you how you performed. A Race Equivalent tells you what that performance would have looked like at a race you’ve heard of.

Boston Marathon. Kona. The races that anchor the imagination of many endurance athletes. Race Equivalents project your performance onto these benchmarks, so the question becomes not just how did I do, but what did I do, in terms anyone in the sport understands.

A Boston Equivalent of 3:36 means: if you had run the Boston Marathon course, under typical Boston conditions, your performance translates to a 3:36 finish. The number is leveled for course and conditions but not for demographics; Race Equivalents are designed to tell you what the named course would have given you, not what you’d have run as a different athlete.

Two ways to ask the question.

The default Race Equivalent projects your performance onto the named race’s typical conditions: Boston’s typical April weather, Kona’s typical October heat. This is the version that makes a result comparable to other Race Equivalents calculated the same way; your 3:36 Boston Equivalent today is on the same terms as someone else’s 3:36 Boston Equivalent a year ago.

The enhanced Race Equivalent projects your performance onto a specific year’s actual conditions. What would your time have been if you’d run Boston in 2024’s actual weather? Or 2018’s nor’easter? Or any other year on record? The enhanced version answers a different kind of question, narrower, more specific, more useful for athletes who want to know how their performance compares to a particular race they witnessed or remember.

Beyond Boston and Kona.

Race Equivalents work for any course in the system with a defined profile and conditions history. Boston and Kona are the canonical examples because they’re recognized across the sport, but the same architecture supports equivalents for other recognized races, and the set continues to grow as more races are added.

What Race Equivalents are not: a way to compare across distances or formats. A 5K Race Equivalent is calculated against a 5K reference race. A marathon Race Equivalent is calculated against a marathon reference race. The same constraint that governs the base course governs reference races: comparison happens within a distance and discipline, never across them.

Validation

Most race-result adjustment systems can’t validate themselves. They take a finish time, apply a formula, produce a number. There’s no prediction step to test against, no closed loop, no way to ask whether the adjustment was right or wrong.

No other leveling system is even attempting to close that loop.

FitLogic was built around the opposite premise. From its foundation, the engine has produced predictions about what athletes will do at races, given their training, their physiology, the course, the conditions, and then compared those predictions against what actually happened. Forecasts, training prescriptions, race outcomes: every one of them creates an opportunity to test whether the model was right and refine it where it wasn’t. That closed loop has been running, continuously, for over a decade, across hundreds of thousands of athletes.

This is why LevelField’s leveling can be trusted. It’s the same engine, the same physics and physiology, the same calibration that has been refining itself against real-world outcomes for years. The reverse-direction logic that turns a finish time into a leveled result is the inverse of the forward-direction logic that has, all along, been making predictions about what athletes can do, and being tested on those predictions against what they actually did.

Validation isn’t a feature LevelField added. It’s the architecture LevelField was built on. And no other leveling system has it, because no other leveling system was designed to.

A foundation for a global standard

Endurance sports have never had a real performance standard. There are clock times, age-group rankings, course difficulty discussions, generations of debate over which races count and which courses are fast and which years had hard weather. What’s been missing is a way to say, with precision: this athlete performed at this level, comparable to that athlete at that race, on terms anyone can recognize.

LevelField is the architecture for that standard. The same physics and physiology, applied to every result, projected onto reference points held constant for everyone, validated continuously by a closed-loop engine that’s been refining itself for over a decade. The technology is available to serve as a global standard for endurance performance, wherever the sport chooses to adopt one.

The model will keep improving. More data refines it. Edge cases reveal places it can be sharpened. New distances, new disciplines, new races extend it. The point isn’t to be perfect. The point is to be more honest than the alternatives, to give every athlete a result that means what it claims to mean, and a ranking earned against the same standard as everyone else’s.

A finish time, leveled. A performance, made comparable. A result, measured rather than sorted.