November 2020’s release of Visual Assist had some significant memory usage improvements, great for those with large projects. Here’s how we did it.
Written by David Millington (Product Manager), Goran Mitrovic (Senior Software Engineer) and Christopher Gardner (Engineering Lead)
Visual Assist is an extension which lives inside Visual Studio, and Visual Studio is a 32-bit process, meaning that its address space, even on a modern 64-bit version of Windows, is limited to a maximum of 4GB. Today 4GB doesn’t always go very far. A typical game developer, for example, may have a large solution, plus perhaps the Xbox and Playstation SDKs, plus other tools – already using a lot of memory – plus Visual Assist. A lot of Visual Assist is built around analysis of your solution, and that requires storing data about your source code, what we call the symbol database. Sometimes, with all of these things squashed into one process’ memory space together, some users with large and complex projects start running out of memory.
The latest version of Visual Assist (build 2393, 28 Oct 2020) reduces in-process memory consumption significantly. It’s a change we think will help many of you who are reading this, because those with large solutions who run into memory issues should find those memory issues alleviated or disappearing completely; and those with small or medium solutions may find Visual Assist runs a little faster.
Plus, it’s just part of being a good member of the Visual Studio ecosystem: as an extension relied on by thousands of developers, yet sitting in the same shared process as other tools, we should have as little impact as we can.
The chart above shows that the new build of Visual Assist reduces VA’s memory usage by about 50% – that is, half as much memory is used. You can see more charts below, and we consistently see 30%, 40% and 50% memory savings.
We’d like to share some technical information about the approaches we took to achieve this, and we hope that as C++ or Windows developers you’ll find the work interesting. This is a fairly long blog, and starts off light and gets more technical as it goes on. It covers:
- Concepts
- Background – Where Visual Assist Uses Memory
- String Storage Optimizations
- Moving Out Of Process: Entire Parser
- Moving Just Memory
- Chunk / Block Structure and Allocation / Freeing
- Optimizations
- Results
Table of Contents
Concepts
Many readers may know this, but to follow this post here’s a quick primer on some concepts. Feel free to skip to the next section if you’re already familiar.
Address space refers to the total amount of memory an application can use. It’s usually limited by the size of a pointer (because a pointer points to memory): a 32-bit pointer can address 232 (about 4 billion) bytes, which is 4GB. So we say that a 32-bit process has a 32-bit address space, or an address space of 4GB. (This is hand-wavy – although true, on 32-bit versions of Windows this 4GB was split between kernel and user mode, and a normal app can only access usermode memory; depending on settings, this actually meant your app could access only 2GB or 3GB of memory before it could not allocate any more. On a 64-bit version of Windows, a 32-bit app has the entire 4GB available. Plus, there are techniques to access more memory than this even on 32-bit systems, which is getting ahead of where this blog post is going.)
This address space is a virtual address space; the application uses virtual memory. This means that the 4GB is not actually a sequential block of memory in RAM. Pieces of it can be stored anywhere, including on disk (swapped out) and loaded in when needed. The operating system looks after mapping the logical address, the address your application uses, to the actual location in RAM. This is important because it means that any address you use has what backs it, where it actually points to, under Window’s control.
A process is how your application is represented by the operating system: it is what has the virtual address space above, and contains one or more threads which are the code that runs. It is isolated from other processes, both in permissions and memory addresses: that is, two processes do not share the same memory address space.
If you’re really familiar with those concepts, we hope you’ll forgive us for such a short introduction. OS architecture including processes, threads, virtual memory etc is a fascinating topic.
Background – Where Visual Assist Uses Memory
Visual Assist parses your solution and creates what we call the ‘symbol database’, which is what’s used for almost every operation – find references, refactoring, our syntax highlighting (which understands where symbols are introduced), generating code, etc. The database is very string-heavy. It stores the names of all your symbols: classes, variables and so forth. While relationships between symbols can be stored with relatively little memory, strings themselves take up a lot of space in memory.
Our first work on memory optimization focused on the relationships, the links between data, and metadata stored for each symbol, and we did indeed reduce the memory they used. But that left a large amount of memory used by strings generated from source code. In addition our string allocation patterns meant strings were not removed from memory when memory became full and there was allocation pressure, but instead at predefined points in code, and that also increased the risk for large projects of getting out of memory errors.
Clearly we needed to solve string memory usage.
We researched and prototyped several different solutions.
String Storage Optimizations
Stepping back in time a little, string memory usage is obviously not a new problem. Over time, Visual Assist has used several techniques to handle string storage. While there are many possible approaches, here’s an overview of three, one of which Visual Assist used and two of which it has never used, presented in case they are new or interesting to you.
String interning addresses when there are many copies of the same string in memory. For example, while Visual Assist has no need to store the word “class”, you can imagine that in C++ code “class” is repeated many times throughout any solution. Instead of storing it many times, store it once, usually looked up by a hash, and each place that refers to the string has a reference to the same single copy. (The model is to give a string to the interning code, and get back a reference to a string.) Visual Assist used to use string interning, but no longer: reference counting was an overhead, and today the amount of memory is not an issue itself, only the amount of memory used inside the Visual Studio process.
While we’re discussing strings, here are two other techniques that are not useful for us but may be useful for you:
- To compress strings. Visual Assist never used this technique: decompression takes CPU cycles, and strings are used a lot so this would occur often.
- To make use of common substrings in other strings: when a string contains another string, you can refer to that substring to save memory. A first cut might use substrings or string views. This is not useful for us due to the overhead of searching for substrings, which would be a significant slowdown, and the nature of the strings Visual Assist processes not leading to enough duplication of substrings to make it worthwhile. A more realistic option (in terms of performance, such as insertion) for sharing string data than shared substrings would be to share data via prefix, such as storing strings in tries. However, due to the nature of the strings in C++ codebases, our analysis shows this is not a valuable approach for us.
The approach we took to the recent work was not to focus on further optimising string storage, but to move where the string storage was located completely.
Moving Out Of Process: Entire Parser
Visual Assist is loaded into Visual Studio as a DLL. The obvious approach to reducing memory pressure in a 32-bit process is to move the memory – and whatever uses it – out of process, which is a phrase used to mean splitting your app up into multiple separate processes. Multi-process architectures are fairly common today. Browsers use multiple processes for security. Visual Studio Code hosts extensions in a helper process, or provides code completion through a separate process.
As noted in the primer above, a second process has its own address space, so if we split Visual Assist into two, even if the second was still a 32-bit process it could double the memory available to both in total. (Actually, more than double: the Visual Studio process where Visual Assist lives has memory used by many things; the second process would be 100% Visual Assist only. Also, once out of Visual Studio, we could make that second process 64-bit, ie it could use almost as much memory as it wanted.) Multi-process architectures can’t share memory directly (again hand-wavy, there are ways to read or write another process’s memory, or ways to share memory, but that’s too much for this blog); generally in a multi-process architecture the processes communicate via inter-process communication (IPC), such as sockets or named pipes, which form a channel where data is sent and received.
There were two ways to do this. The first is to move most of the Visual Assist logic out to a second process, with just a thin layer living inside Visual Studio. The second is just to move the memory-intensive areas. While the first is of interest, Visual Assist has very tight integration with Visual Studio – things like drawing in the editor, for example. It’s not trivial to split this and keep good performance. We focused instead on having only the parser or database in the second process.
Prototyping the parser and database in a second process, communicating with IPC back to the Visual Studio process, showed two problems:
- Very large code changes, since symbols are accessed in many places
- Performance problems. Serializing or marshalling data had an impact. Every release, we want VA to be at least the same speed, but preferably faster; any solution with a performance impact was not a solution at all
Moving Just Memory
Multi-process is not a required technique, just one approach. The goal is simply to reduce the memory usage in the Visual Studio process and move the memory elsewhere; we don’t have to do that by moving the memory to another process—and this is a key insight. While multiprocess is a fashionable technique today, it’s not the only one for the goal. Windows provides other tools, and the one we landed on is a memory mapped file.
In the primer above, we noted that virtual memory maps an address to another location. Memory-mapped files are a way to map a portion of your address space to something else – despite the name, and that it’s backed by a file, it may not be located on disk. The operating system can actually store that data in memory or a pagefile. That data or file can be any size; the only thing restricted is your view of it: because of the 32-bit size of your address space, if the mapped file is very large you can only see a portion of it at once. That is, a memory mapped file is a technique for making a portion of your address space be a view onto a larger set of data than can fit in your address space, and you can move that view or views so, although doing so piece by piece, ultimately a 32-bit process can read and write more data than fits in a 32-bit address space.
Benchmarking showed memory mapped files were significantly faster than an approach using any form of IPC.
The release notes for this build of Visual Assist referred to moving memory ‘out of process’. Normally that means to a second process. Here, we use the term to refer to accessing memory through a file mapping, that is, accessing memory stored by the OS, and potentially more memory than can fit in the process’ address space. Referring to it as out of process is in some ways a holdover from when the work was planned, because we had initially thought we would need multiple processes. But it’s an accurate term: after all, the memory is not in our process in a normal alloc/free sense; it’s an OS-managed (therefore out of process) resource into which our process has a view.
Note re ‘memory mapped files were significantly faster than … using any form of IPC’: sharing memory mapped files can be a form of IPC as well – you can create the mapped file in one process, open in another, and share data. However, we had no need to move logic to another process – just memory.
Chunk / Block Structure and Allocation / Freeing
We already mentioned that the data we stored in the memory mapped file is strings only. In-process, we have a database with metadata fields; in the mapped view are all strings. As you can imagine any tool that processes source code uses strings heavily; therefore, there are many areas in that mapped memory we want to access at or near-at once.
Our implementation uses many file mappings to read and write many areas at once. To reduce the number, each is a fixed size of 2MB. Every one of these 2MB chunks is a fixed heap, with blocks of a fixed size. These are stored in lists:
- 2MB chunks that are mapped, and locked as they are being used to read or write data
- Chunks that are mapped, but unused at the moment (this works like a most recently used cache, and saves the overhead of re-mapping)
- Chunks that are not mapped
- Chunks that are empty
Most C++ symbols fit within 256 bytes (note, of course, this is not always the case – something like Boost is famous for causing very long symbol names.) This means most chunks are subdivided into blocks 256 bytes long. To allocate or free is to mark one of these blocks as used or unused, which can be done by setting a single bit. Therefore, per two-megabyte chunk, one kilobyte of memory is enough to represent one bit per block and to allocate or deallocate blocks within a chunk. To find a free block, we can scan 32bits at a time in a single instruction, BitScanForward, which is handily an intrinsic in Visual C++.
This means that Visual Assist’s database now always has a 40MB impact on the Visual Studio memory space—twenty 2MB chunks are always mapped—plus a variable amount of memory often due to intermediate work, such as finding references, which can be seen in the charts below and which even in our most extreme stress test topped out at a couple of hundred megabytes. We think this is very low impact given the whole 4GB memory space, and given Visual Assist’s functionality.
Optimizations
There are a number of optimizations we introduced this release, including improvements for speedily parsing and resolving the type of type deduction / ‘auto’ variables, changes in symbol lookup, optimisations around template handling, and others – some of which we can’t discuss. I’d love to write about these but they verge into territory we keep confidential about how our technology works. We can say that things like deducing types, or template specialisation, are faster.
However we can mention some optimizations that aren’t specific to our approach.
The best fast string access code is code that never accesses strings at all. We hash strings and use that for string comparison (this is not new, but something Visual Assist has done for a long time.) Often it is not necessary to access a string in memory.
Interestingly, we find in very large codebases with many symbols (millions) that hash collisions are a significant problem. This is surprising because hash algorithms usually are good at avoiding collisions, and the reason is that we use an older algorithm, and one with a hash value only 32 bits wide. We plan to change algorithms in future.
We also have a number of other optimizations: for example, a very common operation is to look up the ancestor classes of a class, often resolving symbols on the way, and we have a cache for the results of these lookups. Similarly, we reviewed all database lookups for areas where we can detect ahead of time a lookup does not need to be done.
Our code is heavily multi-threaded, and needs to be performant, avoid deadlocks, and avoid contention. Code is carefully locked for minimal locking, both in time (duration) and scope (to prevent locking a resource that is not required to be locked, ie, fine-grained locking.) This has to be done very carefully to avoid deadlocks and we spent significant time analysing the codebase and architecture.
In contrast we also reduced our use of threads. In the past we would often create a new thread to handle a request, such as to look up symbol information. Creating a thread has a lot of overhead. Now, we much more aggressively re-use threads and often use Visual C++’s parallel invocation and thread pool (the concurrency runtime); the performance of blocking an information request until a thread is available to process it is usually less than creating a thread to process it.
Symbols are protected by custom read-write locks, which is written entirely in userspace to avoid the overhead of kernel locking primitives. This lock is custom written by us, and is based around a semaphore which we find provides good performance.
Results
All these changes were done with careful profiling and measurement at each stage, testing with several C++ projects. These included projects with 2 million symbols, 3 million symbols, and 6 million symbols.
The end result is the following:
- For very large projects – we’re talking games, operating systems, etc – you will see:
- 50% less memory usage between low/high peaks doing operations such as Find References (average memory usage difference is less once the operation is complete; this is measured during the work and so during memory access.)
- Identical performance to earlier versions of Visual Assist
- For smaller projects:
- Memory usage is reduced, though for a smaller project the effect is not so noticeable or important. But:
- Improved performance compared to earlier versions of Visual Assist
In other words, something for everyone: you either get far less memory pressure in Visual Studio, with the same performance, or if memory was not an issue for you then you get even faster Visual Assist results.
Here are the results we see. We’re measuring the memory usage in megabytes after the initial parse of a project, and then after doing work with that project’s data—here, doing a Find References, which is a good test of symbol database work. Memory usage here is ascribed to Visual Assist; the total memory usage in the Visual Studio process is often higher (for example, Visual Studio might use 2GB of memory with a project loaded between itself and every other extension, but we focus here on the memory used by Visual Assist’s database and processing.)
One good test project is Unreal Engine:
Another excellent large C++ codebase is Qt:
Our feedback has been very positive: we often work with customers with very large codebases and significant memory usage from many sources inside Visual C++, and the lower impact of Visual Assist has made a noticeable difference. That’s our goal: be useful, and be low-impact.
We hope you’ve enjoyed reading this deep dive into the changes we made in Visual Assist build 2393. Our audience is you – developers – and we hope you’ve been interested to get some insight into the internals of a tool you use. While as the product manager I get to have my name on the post, it’s not fair because I did nothing: the credit goes to the engineers who build Visual Assist, especially Goran Mitrovic and Sean Echevarria who were engineers developing this work, and Chris Gardner our team lead, all of whom helped write this blog post.
A build of VA with the changes described in this post is available for download now, including trying it for free if you’d like. If you’re new to VA, read the quick start or browse our feature list: our aim is not to have the longest feature list but instead to provide everything you need, and nothing you don’t. We also do so with minimal UI – VA sits very much in the background. Our aim is to interrupt or distract you as little as possible while being there when you need it. You can see the kind of engineering we put into VA. It’s built by developers for developers.
You’re mistaken when you say: “after all, the memory is not in our process.”
Yes, it is. There are 4 GB of space to share by any stakeholder in the same process. For example, if there are four visual studio addins to use the same technique as yours, they will have to share the 4 GB memory; they can’t have 4 GB each.
Therefore, apparently, what you did, is a dedicated memory manager to replace malloc/free. This can indeed increase performance if it is well-tailored to your needs.
But this can’t be called IPC in any way.
Loving VA btw 😉
Thanks for the comment!
Yes, the 4GB has to be shared by all of Visual Studio and its extensions — that’s what spurred this work. And we don’t use IPC at all.
We called it ‘out of process’ originally because we were planning multiple processes with IPC, but the term stuck and we didn’t feel it particularly inaccurate because when the string data is in a memory mapped file, that data as a whole does not take up this process’s address space, until portions of it are mapped in. That’s what we do (about 40MB is permanently mapped in.) So if we have, say, 200MB of string data, only 40MB are in-process (or, in-process-address-space.) Perhaps ‘out of process address space’ is a more accurate term.
And — thanks for using VA! 🙂