I have a rough outline of what I am calling the Real Won-Lost on teams 21-40 and am already seeing them move up or down from their ranking. I'm part way into 41-60, and have acquired bits about 1-20 without trying. I don't think I will go farther than 60; I'll just run some numbers on the teams that actually make the tournament.
The next step, figuring in strength of schedule, has also hit a snag. I have a listing of the average ranking of each team's opponents. I was going to fool around with that by hand according to some linear factor, such as removing an adjusted loss from the record for each 5 or 10 points below 100 on average ranking and adding one for each 5 or 10 above. One plays with such things and settles on a number that seems to make things come out about right in some known situations, then applies it across the board.
But some teams with very few losses have them against a very weak average ranking of opponents, and my possible added losses result in a ridiculous number. Noticing that, I could see that this isn't linear. That is, there isn't that much difference between playing teams ranked around 110 and ranked around 150, and very little difference between playing teams ranked around 210 and 250. But there is a lot of difference between playing the teams ranked at 10 and those at 50. So I can't just pop in "add a win or subtract a loss for every seven points of stronger ranking."
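To see how quickly a straight-line rule gets out of hand, here is a minimal Python sketch of the adjustment I had in mind. The function name, the baseline of 100, and the step of 10 are illustrative picks from the options I mentioned, not settled values.

```python
def linear_loss_adjustment(avg_opp_rank, baseline=100.0, step=10.0):
    """Adjusted losses to add (positive) or remove (negative).

    Assumption: one adjusted loss per `step` ranking points away from
    `baseline`; both defaults are illustrative picks, not settled values.
    """
    return (avg_opp_rank - baseline) / step


# Average opponent rankings taken from the ranges discussed above.
for rank in (50, 110, 150, 250):
    print(rank, linear_loss_adjustment(rank))
```

With these settings a schedule averaging 250 picks up 15 extra losses, which is the sort of ridiculous number I mean.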
Example: Georgetown's opponents' average ranking is 69. Belmont's is 247. I'm seeing that most tournament teams will have average opponents' ranking of around 120. Belmont is more than twice as far south of average as Georgetown is north. Yet intuitively, I know that Georgetown's 69 represents a schedule about as much harder than average as Belmont's 247 is softer.
So the line bends, which reveals itself at the extremes. Difficulty increases as one moves toward 1 faster than it decreases as one moves away. The graph will have to be some power of x (say, x to the 1.3 power) rather than linear. I would need enormous amounts of data, perhaps a decade's worth, to calculate that.
So I'll just estimate - 1.3 looks pretty good, and at least it's better than 1.0.
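As a rough check on the bend, here is one possible way to put a power into the formula, sketched in Python. Everything about the shape is an assumption on my part: I measure strength as the distance up from the bottom of the rankings raised to an exponent, I use a hypothetical field of 350 teams, and I take 120 as the typical tournament team's figure. It is a sketch of the idea, not a fitted curve.

```python
def schedule_strength(avg_opp_rank, exponent=1.3, n_teams=350):
    """Strength score that climbs faster as the ranking nears 1.

    Assumptions: n_teams=350 is a hypothetical field size, and the
    (n_teams - rank) ** exponent shape is just one way to bend the line.
    """
    return (n_teams - avg_opp_rank) ** exponent


def gap_ratio(hard_rank, soft_rank, typical_rank=120, exponent=1.3):
    """How many times larger the soft-side gap is than the hard-side gap."""
    typical = schedule_strength(typical_rank, exponent)
    hard_gap = schedule_strength(hard_rank, exponent) - typical
    soft_gap = typical - schedule_strength(soft_rank, exponent)
    return soft_gap / hard_gap


# Georgetown (69) against Belmont (247), straight line and then bent line.
for exponent in (1.0, 1.3):
    print(exponent, round(gap_ratio(69, 247, exponent=exponent), 2))
```

An exponent of 1.0 reproduces the straight-line ratio of 127 to 51. Any exponent above 1.0 shrinks that ratio, which is the direction my intuition about Georgetown and Belmont calls for, though how far it should shrink is exactly what I can't calculate without the data.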
I may just look at the rank of the average team they lost to and the average team they beat instead, as that would likely hew closer to a linear equation.
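That alternative would be simple to compute. A minimal sketch, assuming each game is recorded as an opponent's ranking plus a won/lost flag; the function name and the sample games are made up for illustration and are not any real team's results.

```python
def average_ranks_by_result(games):
    """Return (avg rank of teams beaten, avg rank of teams lost to).

    `games` is a list of (opponent_rank, won) pairs. A side with no
    games comes back as None instead of raising a division error.
    """
    beaten = [rank for rank, won in games if won]
    lost_to = [rank for rank, won in games if not won]

    def mean(ranks):
        return sum(ranks) / len(ranks) if ranks else None

    return mean(beaten), mean(lost_to)


# Hypothetical schedule, invented purely to show the call.
sample_games = [(15, False), (40, True), (120, True), (200, True)]
print(average_ranks_by_result(sample_games))
```

The two averages could then be compared team to team without any single opponents' average having to carry the whole schedule.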