JSON Libraries and Source Code Generators

I would say most of my time over the course of a month is spent working on or maintaining my NuGet packages. As tools I want them to be as useful as possible to as many .NET developers as possible.

Problem

Most of the packages I've written deal with JSON serialization and deserialization. Be it Alexa, Slack, Tumblr - object models for JSON structures are a big part of the codebase. As a .NET developer I wrote these using the Newtonsoft.Json package as it was the default for most projects at the time.

Since then .NET has moved forward and the default has changed to System.Text.Json. The two libraries require different attributes and different wiring for areas like converters, which means that I'm a little stuck.

A lot of users and a lot of code still use Newtonsoft, so I can't just drop support for it. Plus, conversion is no cakewalk on such a large amount of code (if I change a base library, all the others need to be able to move across pretty quickly), and the new code has to stay in step whenever there's a change.
With System.Text.Json now the .NET default, developers need extra wiring-up code to get Newtonsoft set up, especially in areas like serverless. So doing nothing costs me nothing, but it adds a little cost for each developer using the packages.
Making the packages work with both libraries alongside one another makes for some unpleasant code, and my concern is that it would put members of the community off getting involved.

So, with all this in mind, the issue has been thorny enough that, although I think there could be some benefits to supporting System.Text.Json, I've left it alone: until the cry was loud enough, it didn't justify the ongoing cost.

Hmmm....I wonder if...

So this has been the situation for a while. But then .NET Conf 2020 happened back in November, and I was particularly interested in a talk by David Wengier about source code generators. A relatively new concept, they allow you to introduce new code into the compilation process: code that doesn't have to be persisted to disk but is still included in the final result. They can't alter the existing code that has been sent to the compiler, though, so it has to be an additive process.

And this got me thinking...

My biggest concern with the JSON library switch is maintenance, but what if the libraries were able to maintain a base object model, and the compilation process filled in the gaps using source code generators? There would be some additional maintenance, but the overall complexity would stay within the realms of sanity.

And so the idea of JsonLibraryGenerator is born! A proof-of-concept (POC) source code generator that can be added to a project so that the package can support both Newtonsoft and System.Text.Json with reduced overhead. As I've said, areas like converters still have to be written twice, but the hope is that I can get the overhead small enough to justify the effort of moving my libraries across.

So as I start, here's the list of things I can think of that I need to investigate:

NuGet packages - have to be conditional so that the final package only has a reference to one library.
Generators - adding one and getting it working
Marker attributes - how do we know what we're trying to generate?
Classes - If a generator can only produce new code, what changes have to happen to the classes to ensure they can be added to correctly?

Oh yeah, I have a Sample project (almost forgot)

I have a JsonGeneratorSample project in the solution and there is a single test model inside it.

    using System.ComponentModel;

    [JsonSwitch]
    [Browsable(false)]
    [EditorBrowsable(EditorBrowsableState.Never)]
    public class TestModelBase
    {
        [JProperty("test")] public virtual string Test { get; set; }
    }

Don't worry about the JsonSwitch and JProperty attributes - these are marker attributes I've created so that I'm not reliant on either library in the model code I store.
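The marker attributes themselves aren't shown here, so this is a minimal sketch of how they might be declared. The AttributeUsage targets and the single string constructor argument on JProperty are assumptions; the point is only that nothing in the model library references Newtonsoft or System.Text.Json.

```csharp
using System;

// Hypothetical marker: flags a model class for the generator. It has no behaviour of its own
// and no dependency on either JSON library.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class JsonSwitchAttribute : Attribute
{
}

// Hypothetical marker: records the JSON name a property maps to. A generator would read the
// argument and emit the library-specific attribute on the override instead.
[AttributeUsage(AttributeTargets.Property)]
public sealed class JPropertyAttribute : Attribute
{
    public JPropertyAttribute(string name) => Name = name;

    public string Name { get; }
}

// The sample model, minus the designer-visibility attributes.
[JsonSwitch]
public class TestModelBase
{
    [JProperty("test")] public virtual string Test { get; set; }
}
```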

Now that I have the project, I need to add the source code generator project to it: each time the compiler runs against my sample project, it will run the generator code too. I add it by editing the sample project's .csproj file and adding the following line:

<ProjectReference Include="..\JsonLibraryGenerator\JsonLibraryGenerator.csproj" OutputItemType="Analyzer" ReferenceOutputAssembly="false" />

NuGet Packages (and the switch between frameworks)

Okay, so System.Text.Json is part of .NET itself (from .NET Core 3.0 onwards), so for now I'm going to use the existence of a Newtonsoft.Json reference in my sample project as my switch. If that reference exists (whether or not System.Text.Json is also available), I'll build a Newtonsoft-compatible class; otherwise, System.Text.Json. I should also add a failure condition for when I can't find either.
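That switching rule boils down to a small decision function. This is a sketch rather than the generator's actual code: JsonFramework and JsonFrameworkSelector are hypothetical names, and it matches exact assembly names instead of the Contains checks used while exploring. Inside a generator, the names could be fed in from context.Compilation.ReferencedAssemblyNames.Select(a => a.Name).

```csharp
using System;
using System.Collections.Generic;

public enum JsonFramework
{
    Newtonsoft,
    SystemTextJson
}

public static class JsonFrameworkSelector
{
    // Returns the framework to generate for, or null when neither assembly is referenced,
    // in which case the caller should report an error.
    public static JsonFramework? Choose(IEnumerable<string> referencedAssemblyNames)
    {
        var names = new HashSet<string>(referencedAssemblyNames, StringComparer.OrdinalIgnoreCase);

        // Newtonsoft wins whenever it is present, even if System.Text.Json is also referenced.
        if (names.Contains("Newtonsoft.Json"))
        {
            return JsonFramework.Newtonsoft;
        }

        if (names.Contains("System.Text.Json"))
        {
            return JsonFramework.SystemTextJson;
        }

        return null;
    }
}
```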

Now the last thing I want is for devs to be adding and removing references all the time, so I found something I'd never needed to worry about before: you can make a reference conditional on a compilation symbol, the same kind of constant you'd test with a #if preprocessor directive.

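So in your IDE you can add NEWTONSOFT as a conditional compilation symbol in the project's build properties (it ends up in the DefineConstants property).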

And then you can put this inside your .csproj file to toggle your package reference:

  <Choose>
    <When Condition="$(DefineConstants.Contains('NEWTONSOFT'))">
      <ItemGroup>
        <PackageReference Include="Newtonsoft.Json" Version="12.0.3" />
      </ItemGroup>
    </When>
  </Choose>

So that's not as bad as I thought it might be.

The actual source code generator

So to create a generator we need a new class that implements ISourceGenerator and is marked with the [Generator] attribute:

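    using Microsoft.CodeAnalysis;

    [Generator]
    public class JsonLibraryGenerator:ISourceGenerator
    {
        // used to register callbacks, such as a syntax receiver, before generation starts
        public void Initialize(GeneratorInitializationContext context){}

        // does the actual generation, with access to the compilation being built
        public void Execute(GeneratorExecutionContext context){}
    }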

This is what you're going to add to your project to generate the code. The next thing I need to sort out is how I'm going to identify the classes I need to manipulate.

There are more complicated methods which could determine this - but for now I'm just going to ask that a specific attribute be placed on the class. So actually my code only needs to run when that kind of attribute is in play. This means I can create and register what's called a SyntaxReceiver - a class which can examine each node and only pay attention to those it's interested in.

In my case if I find the attribute I'm after then I want the class it's attached to (the tree goes Attribute<-AttributeList<-Class)

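    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.CSharp.Syntax;

    public class JsonSwitchReceiver:ISyntaxReceiver
    {
        public ClassDeclarationSyntax ClassToSupercede { get; set; }

        public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
        {
            // Attribute -> AttributeList -> Class. Only keep the grandparent if it really is a class,
            // so a [JsonSwitch] found anywhere else can't overwrite an earlier match with null.
            if (syntaxNode is AttributeSyntax attribute
                && attribute.Name.ToFullString().Contains("JsonSwitch")
                && attribute.Parent?.Parent is ClassDeclarationSyntax classDeclaration)
            {
                ClassToSupercede = classDeclaration;
            }
        }
    }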

And I can then register this with the generator. Now each time my Execute method is called, I can check whether I have a JsonSwitchReceiver and whether that receiver found a class I'm interested in; otherwise it can short-circuit the execution.

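using Microsoft.CodeAnalysis;

[Generator]
public class JsonLibraryGenerator:ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
        // Roslyn calls this factory to get a receiver, then shows it every syntax node before Execute runs
        context.RegisterForSyntaxNotifications(() => new JsonSwitchReceiver());
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // no [JsonSwitch] class found means there's nothing to generate
        if(context.SyntaxReceiver is JsonSwitchReceiver switchReceiver && switchReceiver.ClassToSupercede != null)
        {
            //logic goes here
        }
    }
}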
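As written, the receiver keeps only the last marked class it sees, so a project with several marked models would only get one override. Below is a minimal sketch of a receiver that collects every marked class, assuming the Microsoft.CodeAnalysis.CSharp package (which the generator project references anyway). JsonSwitchCollectingReceiver and Candidates are hypothetical names; Execute would then loop over Candidates instead of checking ClassToSupercede.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public class JsonSwitchCollectingReceiver : ISyntaxReceiver
{
    // Every class declaration carrying a [JsonSwitch] attribute, in the order visited.
    public List<ClassDeclarationSyntax> Candidates { get; } = new List<ClassDeclarationSyntax>();

    public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
    {
        // Look at class declarations directly rather than walking up from attribute nodes.
        if (syntaxNode is ClassDeclarationSyntax classDeclaration && HasJsonSwitch(classDeclaration))
        {
            Candidates.Add(classDeclaration);
        }
    }

    private static bool HasJsonSwitch(ClassDeclarationSyntax classDeclaration)
    {
        foreach (var attributeList in classDeclaration.AttributeLists)
        {
            foreach (var attribute in attributeList.Attributes)
            {
                // Purely syntactic match: [JsonSwitch], [JsonSwitchAttribute] and
                // dotted forms such as [Markers.JsonSwitch] are all accepted.
                var name = attribute.Name.ToString();
                var lastDot = name.LastIndexOf('.');
                var simpleName = lastDot >= 0 ? name.Substring(lastDot + 1) : name;
                if (simpleName == "JsonSwitch" || simpleName == "JsonSwitchAttribute")
                {
                    return true;
                }
            }
        }

        return false;
    }
}
```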

Now that I know I'm interested, I can check whether Newtonsoft is part of the compilation.

The context object has a wealth of data on what's being compiled including the referenced assemblies (Note: yes I'm using Contains quite a bit - that's because I'm still learning and I want to get a positive result so I can further examine exactly how this information is stored and refine it as I move forward)

var hasNewtonsoft = context.Compilation.ReferencedAssemblyNames.Any(ran => ran.Name.Contains("Newtonsoft"));

I can also do the same for System.Text.Json. If neither is found then I need to raise an error, as someone is expecting this class to be converted but I can't tell which library to target. This appears in the Error List the same as any other error would:

context.ReportDiagnostic(Diagnostic.Create(new("JsonSwitch002",
            "No JSON Preference found",
            "Found no references to either Newtonsoft or System.Text.Json - unable to build without Json framework preference",
            "JsonSwitch",
            DiagnosticSeverity.Error,
            true), Location.None));

So at this point my source code generator is getting itself into a nice place where it might actually be able to, ya know, generate source code! Exciting stuff.

So this needs to be broken down into two parts. Figuring out what I'm generating, and then building the source code.

Note - yes, I know you can use strings!

I wanted to take this moment, before I start delving into the code generation part, to mention that almost all the examples you see out in the world tell you to build your code using strings. Over the years I've dealt with several very different systems that generated source code, from ones I built for myself to ones involving whole teams, and although I agree you can use strings, my experience has been that any code generation of sufficient complexity isn't made easier by them. Bits of statements lying around in code are easy to scan past without really seeing them, and I find raw strings much more error prone once you have more than one person altering the code. Not only that, but in this case I'm building a class based on the structure of a class that already exists, so it's easier to manipulate a nicely parsed object model :)

This is why in my examples you'll see the use of object models to build my code. I can take full advantage of the IDE and I find it a much less error prone approach in the long run. This is my preference and is not a comment on yours, just wanted to make that really super clear to avoid comments later on.

Building the code!

Okay, so I'm not going to show the whole process for this, that's why I have a repo. But I think it's useful to see a simple example.

As I can't alter the existing code base (although really that's what I'd rather do), I can't change the existing class, but the properties on it still need to be wired up. If these were new properties I could just say that the class has to be partial, but as we're altering bits of an existing object model, I need to be able to override those properties and attach the appropriate attribute to my override.

So along with the marker attribute for the class, I'm going to look for all the virtual properties the class has, and on each of those for a marker attribute that says what the JSON name needs to be.

From that I create an overriding class and properties with the correct attribute on them (there's a lot of wiring that's going on around this, but this is the main bit.)

using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public abstract class JsonClassGenerator
{
  public string Generate(ClassDeclarationSyntax classToOverride, string newClassName)
  {
    var props = GetProperties(classToOverride);
    return SyntaxFactory.ClassDeclaration(newClassName)
      // derive from the original class so its virtual properties can be overridden
      .AddBaseListTypes(SyntaxFactory.SimpleBaseType(SyntaxFactory.ParseTypeName(classToOverride.Identifier.Text)))
      .AddModifiers(classToOverride.Modifiers.ToArray())
      .AddMembers(
          props.Select(p =>
              {
                  // swap "virtual" for "override", keeping the other modifiers (e.g. public)
                  var newModifierList = SyntaxFactory.TokenList(
                      p.Modifiers.Where(m => m.Kind() != SyntaxKind.VirtualKeyword)
                          .Concat(new[] {SyntaxFactory.Token(SyntaxKind.OverrideKeyword)}));
                  return SyntaxFactory.PropertyDeclaration(p.Type, p.Identifier)
                      .WithModifiers(newModifierList)
                      .AddAttributeLists(JsonLibraryAttribute(GetJproperty(p)))
                      .AddAccessorListAccessors(p.AccessorList.Accessors.ToArray());
              })
              .Cast<MemberDeclarationSyntax>().ToArray())
      .NormalizeWhitespace().ToFullString();
  }

  // virtual properties that also carry a [JProperty] marker
  private IEnumerable<PropertyDeclarationSyntax> GetProperties(ClassDeclarationSyntax classToSupercede)
  {
      return classToSupercede.Members
          .OfType<PropertyDeclarationSyntax>()
          .Where(pds => pds.Modifiers.Any(m => m.Kind() == SyntaxKind.VirtualKeyword))
          .Where(pds => GetJproperty(pds) != null);
  }

  private AttributeSyntax GetJproperty(PropertyDeclarationSyntax syntax)
  {
      return syntax.AttributeLists.SelectMany(al => al.Attributes.ToArray())
          .FirstOrDefault(a => a.Name.ToFullString().Contains("JProperty"));
  }

  // each library-specific generator supplies the attribute to put on an overriding property
  protected abstract AttributeListSyntax JsonLibraryAttribute(AttributeSyntax getJproperty);
}

The objects you get back are immutable, so every .AddXXX() call returns an updated copy of the object, which is what allows us to chain these together.

This block is pretty big; one of the downsides of using an object model is that the code is definitely more verbose. But it creates a new overriding class, copies in the modifiers from the original, and then adds a new overriding property for each original property that is both marked as virtual and carries the marker JProperty attribute. The marker attribute doesn't do anything itself; it just gives each of the generators something to read, so each one can override the JsonLibraryAttribute method and add the appropriate library attribute to the new property.

At the end we ask for some normalised whitespace so it can be read and parsed, and return the full string. We have our class!
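To make the immutability point concrete, here's a tiny sketch, again assuming a reference to Microsoft.CodeAnalysis.CSharp (ImmutabilityDemo is just an illustrative name). AddModifiers leaves the original node untouched and hands back a new node, which is why the calls in Generate can be chained.

```csharp
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ImmutabilityDemo
{
    public static (ClassDeclarationSyntax Original, ClassDeclarationSyntax Updated) Build()
    {
        var original = SyntaxFactory.ClassDeclaration("Testing123");

        // 'original' is not modified; a new node with the extra modifier is returned instead.
        var updated = original.AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword));

        return (original, updated);
    }
}
```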

Here's an example of the Newtonsoft class generator to show how it ensures that the right attribute is added to each property

    using System.Linq;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;

    public class NewtonsoftGenerator:JsonClassGenerator
    {
        protected override AttributeListSyntax JsonLibraryAttribute(AttributeSyntax propertyAttribute)
        {
            // Reuse the marker's first argument (the "test" string literal) as-is,
            // rather than rebuilding a string token from its text.
            var argument = SyntaxFactory.AttributeArgument(
                propertyAttribute.ArgumentList.Arguments.First().Expression);

            // Fully qualified so the generated class doesn't rely on a using directive being emitted.
            return SyntaxFactory.AttributeList(SyntaxFactory.SingletonSeparatedList(
                SyntaxFactory.Attribute(SyntaxFactory.ParseName("Newtonsoft.Json.JsonProperty"))
                    .AddArgumentListArguments(argument)));
        }
    }

Again, this is a basic example - there are other arguments you may need to support, in which case you'd parse out the original list and work from there to build it up. But hopefully you can see how although this code might be complicated, it places all that complication into the generator and leaves your actual JSON libraries clean.
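One way to handle those other arguments is to carry the marker's whole argument list across instead of just the first expression. The helper below is a hypothetical standalone sketch (it doesn't derive from JsonClassGenerator), and it assumes the target attribute accepts the same arguments as the marker. That won't always hold: Newtonsoft's JsonProperty has named arguments such as NullValueHandling that System.Text.Json's JsonPropertyName doesn't, so a real version would need to filter or translate them.

```csharp
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class AttributeTranslation
{
    // Builds [targetAttributeName(...)] carrying over every argument from the marker,
    // e.g. [JProperty("test")] becomes [JsonPropertyName("test")].
    public static AttributeListSyntax CopyArguments(AttributeSyntax marker, string targetAttributeName)
    {
        var attribute = SyntaxFactory.Attribute(SyntaxFactory.ParseName(targetAttributeName));

        // A marker written without parentheses has no argument list to copy.
        if (marker.ArgumentList != null)
        {
            attribute = attribute.WithArgumentList(marker.ArgumentList);
        }

        return SyntaxFactory.AttributeList(SyntaxFactory.SingletonSeparatedList(attribute));
    }
}
```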

So now that we have the class (with the understanding that we're skipping over a few things, such as making sure the namespace matches), we have to add it into the compilation process.

var generatorReference = Guid.NewGuid().ToString("N");
var overrideClass = classGenerator.Generate(classToUpdate, "Testing123");
context.AddSource(generatorReference, overrideClass);

The generator reference should be something a little more meaningful, as it's what's shown in places such as the Error List if your generator produces bad code, but this GUID is purely for ease of getting my example working. Now that the class has been added, the compilation proceeds and (fingers crossed there are no compilation errors) I get the regular output. Except there's a little more to it than meets the eye.
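For a more meaningful hint name, one option is to build it from the class's namespace and name. This is a sketch assuming the Microsoft.CodeAnalysis.CSharp package; HintNames and the .JsonSwitch.g.cs suffix are hypothetical choices, and file-scoped namespaces (C# 10) would need extra handling because they use a different syntax node.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class HintNames
{
    // Builds a readable hint name such as "Sample.TestModelBase.JsonSwitch.g.cs" from the
    // enclosing block-scoped namespaces (outermost first) and the class name.
    public static string For(ClassDeclarationSyntax classDeclaration)
    {
        var namespaces = classDeclaration.Ancestors()
            .OfType<NamespaceDeclarationSyntax>()
            .Reverse()
            .Select(ns => ns.Name.ToString());

        var prefix = string.Join(".", namespaces);
        var className = classDeclaration.Identifier.Text;

        return prefix.Length == 0
            ? $"{className}.JsonSwitch.g.cs"
            : $"{prefix}.{className}.JsonSwitch.g.cs";
    }
}
```

In Execute this would stand in for the GUID, e.g. context.AddSource(HintNames.For(classToUpdate), overrideClass). Hint names need to be unique within a generator, so a namespace-qualified class name is a reasonable fit.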

If I take a look at my assembly now using DotPeek I see this!

[Image: a DotPeek window showing the new Testing123 class]

Eureka! Only a simple example but the code is written in such a way that it would fly through any number of models and properties and spin this up (assuming the appropriate markers of course).
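As a rough hand-written equivalent of what the generator adds, here's a sketch of the System.Text.Json flavour. Only the Newtonsoft generator is shown above, so the JsonPropertyName attribute is an assumption about what a System.Text.Json counterpart would emit (the Newtonsoft build would carry [JsonProperty("test")] instead). The marker attributes are left off TestModelBase so the snippet compiles on its own.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// The hand-written model, minus its marker attributes.
public class TestModelBase
{
    public virtual string Test { get; set; }
}

// Roughly what a System.Text.Json version of the generator would add to the compilation.
public class Testing123 : TestModelBase
{
    [JsonPropertyName("test")]
    public override string Test { get; set; }
}

public static class Testing123Demo
{
    public static string Serialize(string value) =>
        JsonSerializer.Serialize(new Testing123 { Test = value });
}
```

Because Testing123 overrides Test, the serializer sees a single property and picks up the attribute on the override, so the JSON uses the lowercase test name.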

Possible Next Steps...

There's so much to do after this, as all I've done so far is satisfy my curiosity that it can be done.

Marking Newtonsoft properties to be ignored if null is a big one.
Wiring up custom converters is going to be interesting - I think that would require more preprocessor work so that we didn't end up with duplicate types.
Interfaces in System.Text.Json! The huge problem is that you can't currently place a custom converter on an interface in System.Text.Json, so maybe there's a way of identifying interface properties and adding the custom converter to each property, as the interface itself has to be kept blank?

And there's loads more after that, but this was just off the top of my head.
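For the interface item, here's a minimal sketch of what per-property wiring could look like in System.Text.Json: the [JsonConverter] attribute goes on the interface-typed property, and the interface itself carries nothing. IGreeting, Greeting, GreetingConverter and Envelope are hypothetical names, and the converter simply assumes every IGreeting should be read back as a Greeting.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public interface IGreeting
{
    string Text { get; set; }
}

public class Greeting : IGreeting
{
    public string Text { get; set; }
}

// Reads any IGreeting as the concrete Greeting type, and writes it using its runtime type.
public class GreetingConverter : JsonConverter<IGreeting>
{
    public override IGreeting Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        JsonSerializer.Deserialize<Greeting>(ref reader, options);

    public override void Write(Utf8JsonWriter writer, IGreeting value, JsonSerializerOptions options) =>
        JsonSerializer.Serialize(writer, value, value.GetType(), options);
}

public class Envelope
{
    // The kind of attribute a generator could add to every interface-typed property.
    [JsonConverter(typeof(GreetingConverter))]
    public IGreeting Message { get; set; }
}
```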

...and next source code generators?

I really like the way you can build source code generators. I have an existing library that builds code for .abc files for Alexa skills, but it's a dotnet tool you have to run and it uses CodeDom. I think I'd like to revisit that and wire it up so it uses SyntaxFactory and it works as a Source Code Generator too - that way it would build the code as you were altering the file - much cleaner and just a standard part of the build process.

Yeah - Source Code Generators? Definitely a fan!