Models are point clouds

Models

Do you ever think about point clouds?

To make a point, conceptually, you shoot a ray from a source with a known location and direction, and measure how far it travels before it strikes something. Do that many times and you get a point cloud: an outline of a thing, from some perspectives.

[Image: point cloud GIF, public domain, from Wikipedia]

That's not a torus. But you can see how the points could conceivably have been made by recording a torus. It's just a bunch of observations.
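As a rough illustration of the "shoot a ray, record where it lands" idea, here is a minimal Python sketch. The torus dimensions, the sampling box, and the function names (`torus_distance`, `cast_ray`, `sample_cloud`) are all my own assumptions for the sake of the example, and the ray stepping uses sphere tracing against a distance function rather than anything a real scanner does.

```python
import math
import random

# Hypothetical torus (not from the article): a ring of radius MAJOR_R
# around the z-axis with a tube of radius MINOR_R.
MAJOR_R = 1.0
MINOR_R = 0.25


def torus_distance(p):
    """Signed distance from point p to the torus surface (negative inside)."""
    x, y, z = p
    ring_offset = math.hypot(x, y) - MAJOR_R
    return math.hypot(ring_offset, z) - MINOR_R


def cast_ray(origin, direction, max_dist=20.0, eps=1e-6, max_steps=256):
    """Sphere-trace one ray; return distance to the first hit, or None."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        return None
    unit = tuple(c / norm for c in direction)
    travelled = 0.0
    for _ in range(max_steps):
        p = tuple(o + travelled * u for o, u in zip(origin, unit))
        gap = torus_distance(p)
        if gap < eps:
            return travelled
        travelled += gap  # safe step: nothing is closer than `gap`
        if travelled > max_dist:
            return None
    return None


def sample_cloud(n_rays, seed=0):
    """Fire rays from random places in random directions; keep the hits."""
    rng = random.Random(seed)
    cloud = []
    for _ in range(n_rays):
        origin = tuple(rng.uniform(-3.0, 3.0) for _ in range(3))
        if torus_distance(origin) <= 0.0:
            continue  # skip sources that start inside the torus
        direction = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        hit = cast_ray(origin, direction)
        if hit is None:
            continue  # most rays miss entirely
        norm = math.sqrt(sum(c * c for c in direction))
        cloud.append(tuple(o + hit * c / norm
                           for o, c in zip(origin, direction)))
    return cloud


if __name__ == "__main__":
    cloud = sample_cloud(2000)
    print(f"{len(cloud)} of 2000 rays landed on the torus")
```

Most of the rays miss, and the ones that land only ever give you scattered points on the surface, which is the whole point of the analogy.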

Models are not unlike this, except that instead of modeling an object, they purport to model a reality.

When you're training a model on some data, it does not have access to the original perspective. There's no context or meaning. To compensate for lacking a perspective, models in effect pretend that all data represents all perspectives on all data. That is plainly nonsense, and it probably gives a not-too-inaccurate mental model for why models hallucinate, but it does let you trace a form.

All models are wrong, but some are useful.

This saying from statistics is apt for ML models... which are just statistics anyway.

If you were to spray a laser array 400 billion times and record what it measured, you would be able to pick out the torus and the room housing it. That's not particularly magical, and taking those 400 billion measurements is incredibly expensive. Having performed those measurements, how accurate is your torus model?

Let's make some simplifying assumptions:

Your torus occupies 1 cubic foot in the center of the room
Your room is 12'x12'x12'
Your laser can be positioned with 1/16" precision in all 3 directions (and it is of size 0'x0'x0' for convenience)
Your laser can be rotated in 1/8 degree increments, both horizontally and vertically

By my napkin math you have (12*16)^3 * (360*8)^2, or nearly 59 trillion possible measurements of a torus in a room. I'd need to run a simulation to figure out how many of these even hit the torus, but even for something as simple as this you can see that we'll only account for 0.15% of the perspectives in the small training room.
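If you want to check the napkin, a few lines of Python will do it. This is only a sketch of the counting, not a simulation: it doesn't know which rays hit the torus. The "napkin" count multiplies 12 by 16 positions per axis and 360 by 8 angles per rotation, which reproduces the nearly-59-trillion figure. The "literal" count is my own assumption about how to read the setup (12 feet is 144 inches, so 144 times 16 positions per axis), and it only makes the room bigger. The variable names are mine.

```python
# Quantities taken from the simplifying assumptions in the text.
ROOM_SIDE_FEET = 12
STEPS_PER_INCH = 16      # 1/16" positioning precision
STEPS_PER_DEGREE = 8     # 1/8 degree rotation increments
MEASUREMENTS = 400_000_000_000

# Horizontal and vertical rotation, each over a full 360 degrees.
angle_settings = (360 * STEPS_PER_DEGREE) ** 2

# Napkin reading: 12 * 16 positions along each of the 3 axes.
napkin_poses = (ROOM_SIDE_FEET * STEPS_PER_INCH) ** 3 * angle_settings

# Literal reading (an assumption): 12 feet = 144 inches per axis.
literal_poses = (ROOM_SIDE_FEET * 12 * STEPS_PER_INCH) ** 3 * angle_settings

for label, poses in (("napkin", napkin_poses), ("literal", literal_poses)):
    share = MEASUREMENTS / poses
    print(f"{label:>7}: {poses:,} poses, raw share measured = {share:.6%}")
```

Under either reading, 400 billion measurements cover well under 1% of the possible laser poses, before you even ask how many of those poses can see the torus.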

What you get will not be a torus. It will be a point cloud containing the trace of a torus. However legible it might be, it will not be the true representation.

Models train by trying to fit these points ever closer to the target "shape thought," which differs from the point cloud analogy. Imagine all 400 billion points randomly dispersed throughout the whole room space. Each new observation ostensibly shifts the points overall a little closer to the torus. Through this strategy you might get many points on the torus, particularly if you put more weight or attention toward heuristics like curved surfaces.
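Here is a toy Python sketch of that "nudge the points closer" picture. It is emphatically not how real training works: it cheats by knowing the exact torus and moving each point a fixed fraction of the way toward the nearest spot on its surface. The torus dimensions, the step rate, and the function names are assumptions I made up for the illustration.

```python
import math
import random

MAJOR_R = 1.0    # hypothetical ring radius
MINOR_R = 0.25   # hypothetical tube radius


def nearest_on_torus(p):
    """Closest point on the torus surface to p (torus around the z-axis)."""
    x, y, z = p
    rho = math.hypot(x, y)
    if rho == 0.0:
        cx, cy = MAJOR_R, 0.0  # on the axis: any ring point is equally near
    else:
        cx, cy = MAJOR_R * x / rho, MAJOR_R * y / rho
    dx, dy, dz = x - cx, y - cy, z
    d = math.sqrt(dx * dx + dy * dy + dz * dz)
    if d == 0.0:
        return (cx, cy, MINOR_R)  # on the ring centerline: pick straight up
    s = MINOR_R / d
    return (cx + dx * s, cy + dy * s, dz * s)


def mean_distance(points):
    """Average distance from the points to the torus surface."""
    return sum(math.dist(p, nearest_on_torus(p)) for p in points) / len(points)


def nudge(points, rate):
    """Move every point a fraction `rate` of the way toward the surface."""
    moved = []
    for p in points:
        target = nearest_on_torus(p)
        moved.append(tuple(a + rate * (b - a) for a, b in zip(p, target)))
    return moved


def fit_demo(n_points=1000, steps=50, rate=0.1, seed=0):
    """Scatter points through a 12-unit room, nudge them, report distances."""
    rng = random.Random(seed)
    points = [tuple(rng.uniform(-6.0, 6.0) for _ in range(3))
              for _ in range(n_points)]
    before = mean_distance(points)
    for _ in range(steps):
        points = nudge(points, rate)
    return before, mean_distance(points)


if __name__ == "__main__":
    before, after = fit_demo()
    print(f"mean distance to torus: {before:.4f} before, {after:.6f} after")
```

The points end up hugging the surface, but they are still just points. Nothing in the list knows it is a torus.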

Okay, so a 400 billion parameter model can collapse points tightly around a torus. But what about... everything? How many things are there, and how many essential elements does each have? Are "local minima" actually smaller, but still important, facets even in the presence of a "global minimum"? By skipping over local minima, do you lose fidelity of the torus of reality?

For popular entities, there may be many examples around which one might fit many points. But for less popular entities, how can a point cloud capture their shape? And particularly for those less popular entities, how many popular facets have an improper or non-causal modality with respect to the less popular entity? How do you tell the difference between a coincidental modality and an essential modality when you can't even reason about what the entities are?

0.15% perspective coverage in a small toy room. Throw trillions of parameters at it, but increase the size of the room to a city, or a country, or a world, and parameter chasing is utterly hopeless in the pursuit of making a real map of reality. But it can possibly produce some nifty and useful point clouds!

The map is not the territory.

I think it's helpful to remember that models are not reality, no matter what they may look like. You can paint the prettiest picture, but it will never be a waterfall, because the map is not the territory. You can add all the parameters that modern science and all the computers in Texas can possibly count, and you still just have a map, a point cloud, a model. It might well be quite useful, but never mistake it for the territory!
