Spatial Thinking

A research overview of spatial thinking to augment thought in framed, two-dimensional and spatial, three-dimensional environments, for Author visionOS.

Internal & External Maps

It has been shown that information which relates to what we already know is more easily absorbed (van Kesteren, Ruiter, Fernández, Henson 2012*) (Bein, Reggev, Tompary 2018*). In other words, if you already have the knowledge structure, or ‘schema’, that new information fits, your brain doesn't have to work nearly as hard to build the synaptic connections, the Long-Term Potentiation (LTP), that memorization latches onto. The question then becomes how we can present and re-present knowledge structures outside our brains so that they connect both to our internal schema and to new information, helping us ‘make connections’.

Framed Maps

Framed, two-dimensional depictions of knowledge structures have been studied and found to help learning, particularly when the user creates the structures rather than merely consulting them (Schroeder, Nesbit, Anguiano, Adesope 2017*). This is why Author features a ‘Map’ view where the user can construct their own concept map/knowledge network. The theoretical backbone for this was my work on Liquid Information and my interactions with my mentor Doug Engelbart.

A mechanism for this is suggested by Allan Paivio's ‘dual coding’ theory, which holds that the brain processes verbal information and visual/spatial information through two separate but interconnected systems, and that encoding something through both systems simultaneously creates more retrievable memory traces. This is a compelling example of the notion of the Extended Mind, where our tools and minds merge.

Spatial Maps

The research and implementation work I am doing now is to augment our ability to learn and think spatially. Our species did not evolve looking at, and thinking in, framed rectangles; the world was quite literally our oyster.

Though we have benefited from portable text, in books and on screens, framed media, because it abstracts away what is around you, necessarily features different affordances than spatial media, and we are only beginning to understand what that means and what it should do. With the invention of printing, relationships had to be described rather than positioned in space. We can now extend this. We can return to the cognitive architecture our brains were built for by providing environments more like those our brains evolved in.

Liquid Information

Spatial knowledge work is different from working in framed environments, but the underlying information needs to be available in both environments to make such work viable.

Currently, importing user information within the same ecosystem (Apple macOS and visionOS) is trivial, and it is not overly complicated across ecosystems, such as from Windows to Meta, either, but the information format remains the same.

I have therefore been experimenting with how information from framed environments can be spatialized, by implementing an ‘Ask AI’ function called ‘Define Concepts’ which analyses the user's text in Author, such as student notes, to extract all the concepts the document defines. The concepts are only as defined by the notes; this process does not gather external information. They are then displayed in the Map view for the user to arrange as they work through them. When the user puts on the Apple Vision Pro, the map first appears as ‘nodes’ on a virtual wall, where they can easily and smoothly be moved along the X, Y and Z axes.
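The shipping ‘Define Concepts’ function delegates the analysis to AI; as a rough illustration of the pipeline's shape only, a minimal heuristic sketch in Python (the pattern, class and function names are assumptions, not Author's actual code) might scan notes for definition-like lines and emit positioned nodes for the virtual wall:

```python
import re
from dataclasses import dataclass

@dataclass
class ConceptNode:
    """A spatial node for one concept defined in the user's own notes."""
    name: str
    definition: str
    position: tuple = (0.0, 0.0, 0.0)  # X, Y, Z on the virtual wall

def define_concepts(notes: str) -> list[ConceptNode]:
    """Extract only concepts the notes themselves define (no external lookup).

    Illustrative heuristic: a line of the form 'Term: definition' or
    'Term is/are definition' counts as a definition.
    """
    pattern = re.compile(r"^(?P<term>[A-Z][\w ]{1,40}?)(?:\s+(?:is|are)\s+|:\s*)(?P<defn>.+)$")
    nodes = []
    for line in notes.splitlines():
        m = pattern.match(line.strip())
        if m:
            # Lay new nodes out in a row along X, flat on the wall (Z = 0)
            nodes.append(ConceptNode(m["term"].strip(), m["defn"].strip(),
                                     position=(0.3 * len(nodes), 0.0, 0.0)))
    return nodes
```

The user then rearranges these default positions freely in space.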

Scaffolding

This freedom to place nodes of knowledge in space is what produces the “wow” with new users, as though an ancient capability has been made available to them. The limit of this freedom quickly becomes apparent, however, when they make a ‘mess’. This is why the system features many commands for the user to structure their space at will: layouts, sorting, and choosing what to Focus on, what to Hide, and so on.
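Such structuring commands can be thought of as operations over the node collection. A minimal sketch in Python, assuming a simple node model (the command names, grid layout and spacing are illustrative, not Author's actual API):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Node:
    name: str
    position: tuple = (0.0, 0.0, 0.0)  # X, Y, Z
    hidden: bool = False

def layout_grid(nodes, columns=4, spacing=0.3):
    """Tidy a 'mess': arrange visible nodes in a grid on the wall (Z = 0)."""
    visible = [n for n in nodes if not n.hidden]
    placed = [replace(n, position=(spacing * (i % columns),
                                   -spacing * (i // columns), 0.0))
              for i, n in enumerate(visible)]
    return placed + [n for n in nodes if n.hidden]

def focus(nodes, names):
    """Focus: hide everything except the named nodes."""
    return [replace(n, hidden=(n.name not in names)) for n in nodes]
```

Sorting and Hide commands would follow the same pattern: pure transformations of node positions and visibility flags.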

It is also the reason I have started to experiment with the notion of ‘Context’ nodes. These are, technically, exactly the same as any other node, with the expectation that the user imports them into new work to provide context. Contextual nodes therefore appear as expected, but they can be excluded from interactions such as ‘Select all Persons’, since persons who are context should remain where they are, easily found and seen in relation to others. This relates to our hippocampus, which is central to LTP: it is also the brain's spatial navigation engine, containing place cells, and it has been repurposed for higher memory and learning.
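The exclusion of Context nodes from bulk selection reduces to a filter over an otherwise uniform node type. A minimal sketch, assuming a context flag and a selection helper that are hypothetical names, not Author's actual code:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    kind: str                 # e.g. "Person", "Concept"
    is_context: bool = False  # context nodes stay put during bulk actions

def select_all(nodes, kind, include_context=False):
    """'Select all Persons'-style selection that skips context nodes by
    default, so imported context stays where the user placed it."""
    return [n for n in nodes
            if n.kind == kind and (include_context or not n.is_context)]
```

Because context is just a flag on an ordinary node, the user can promote a context node into current work, or demote one, without any conversion step.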

The Research

The research question is a vector from the work of Doug Engelbart, simply asking: How deeply can we augment thought, learning and communication with richly interactive information in spatial environments?

Research Questions for Interactions

  • Basic interactions for selecting one or more objects, moving them and organizing them by different criteria, using common and custom gestures, the toolbar, speech, and on-node controls (where possible), will continue to be vital work to let users become proficient in the spatial environment.

  • More advanced interactions for Focusing on specific nodes, showing & hiding them as groups or individually, and ‘favoriting’ or ‘liking’ some in a way which will be useful later, in and out of XR, will require continuous polish.

  • Nesting & Connecting information to provide access to large knowledge structures without overwhelming the user is a key aspect of managing the space.

  • Context versus current knowledge has proven to be an important aspect of this work: the work space is so large that it can easily become overwhelming, so the user needs to be able to separate what is being worked on at the moment from the context they need to have at hand.

  • Generation of spatial-environment knowledge objects from various sources, textual, media-based and LLM-based, will also be a continuous effort.

  • In addition to this core research, looking into, and supporting, completely different ways of interacting with knowledge in XR will continue to be a focus of the Future of Text Initiative, of which Author visionOS is a key project.
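Of these questions, nesting & connecting is the most directly structural: large knowledge structures stay manageable if nodes can contain other nodes and reveal them on demand. A minimal sketch of that idea, assuming a hypothetical node class that is not Author's actual implementation:

```python
class Node:
    """A knowledge node that can nest children and connect to peers."""
    def __init__(self, name):
        self.name = name
        self.children = []    # nested nodes, hidden while collapsed
        self.links = []       # lateral connections to other nodes
        self.collapsed = True

    def visible_nodes(self):
        """Nodes currently shown: this node, plus the subtrees of any
        expanded children, so the space never shows more than the user
        has chosen to open."""
        shown = [self]
        if not self.collapsed:
            for child in self.children:
                shown.extend(child.visible_nodes())
        return shown
```

Expanding a node then grows the visible structure one level at a time, rather than flooding the space with the whole network at once.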

Research Question for Information Flow

We cannot afford to have knowledge locked up only in proprietary formats, though they will sometimes be necessary to unlock the potential of specialized interactions. It will be imperative to be able to share spatial knowledge structures as robustly, simply and openly as we can share linear framed knowledge in ‘plain text’ documents today, with a ‘plain text for XR’ approach, using https://visual-meta.info or similar as a starting point.
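The core requirement is that spatial structure survive a round trip through human-readable plain text. As a sketch only, with an entirely hypothetical line format rather than the actual Visual-Meta syntax, node positions could be written and recovered like this:

```python
def serialize(nodes):
    """Write (name, (x, y, z)) nodes as plain-text lines: 'name @ x,y,z'.
    The format here is hypothetical, for illustration only."""
    return "\n".join(f"{name} @ {x},{y},{z}" for name, (x, y, z) in nodes)

def parse(text):
    """Recover (name, (x, y, z)) tuples from the plain-text form."""
    result = []
    for line in text.splitlines():
        name, _, coords = line.partition(" @ ")
        x, y, z = (float(v) for v in coords.split(","))
        result.append((name, (x, y, z)))
    return result
```

Anything a plain-text editor can open, a spatial environment can rebuild, which is the openness property the framed world already enjoys.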



Frode A. Hegland, PhD
augmentedtext.info
thefutureoftext.org