Recently I was designing an AVRO schema and wanted to see what data conforming to that schema would look like. I developed some Java code to generate sample data. This also has uses in more elaborate tests which require the generation of random events. Because AVRO is not that specific, this is mainly useful to get an idea of the structure of JSON which conforms to the definition. Here I describe a simple Java-based solution (written for Java 17, but it will also work on 11).
Dependency
You only need a single dependency outside the regular JDK. Below is this dependency as a Maven pom.xml snippet:
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.11.0</version>
</dependency>
Schema
In the resources folder of my project I've put a file file.avsc which contained my AVRO schema:
{
  "type": "record",
  "namespace": "nl.amis.smeetsm.schema",
  "name": "Employee",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "age", "type": "int" }
  ]
}
To develop the schema I’ve used the Apache Avro IDL Schema Support plugin in IntelliJ IDEA. This makes it especially easy to find errors in the schema during development.
Java
My (minimal) Java class to read the schema and generate random JSON which conforms to the schema:
import org.apache.avro.Schema;
import org.apache.avro.util.RandomData;

import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

public class AvroTest {

    public static void main(String[] args) throws IOException {
        // Load the schema definition from the classpath.
        ClassLoader classLoader = AvroTest.class.getClassLoader();
        try (InputStream is = classLoader.getResourceAsStream("file.avsc")) {
            Schema schema = new Schema.Parser().parse(is);
            // RandomData produces the requested number of random records
            // conforming to the schema; here we ask for a single one.
            Iterator<Object> it = new RandomData(schema, 1).iterator();
            System.out.println(it.next());
        }
    }
}
The code is self-explanatory. It is easy to generate more random data this way for use in tests.
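Since RandomData implements Iterable, generating more records is just a matter of increasing the count passed to the constructor and looping. A minimal sketch, which would replace the last two lines of main above (the count of 10 is arbitrary):

// Generate ten random Employee records instead of one; RandomData's
// second constructor argument is the number of records to produce.
for (Object record : new RandomData(schema, 10)) {
    System.out.println(record);
}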
Output
When I run my Java class, it will generate output like:
{"name": "cenmfi", "age": -746903563}
Finally
AVRO schemas are limited in how strict they can be. They are not as specific as, for example, JSON Schema. It is, for example, not easy (or even possible?) in AVRO to limit an int type to a certain minimum and maximum value or to constrain a text field with a regular expression. AVRO schemas are mostly used to help encode JSON messages going over Kafka streams (mostly from Java) and to allow some minimal validation. Because AVRO is not that specific, it is relatively easy to generate random data which conforms to the schema. It is, however, not easy to generate only messages which make sense (notice the "age" field in my example). If this is a requirement, you might be better off using JSON Schema or Protobuf for JSON serialization/deserialization on Kafka, since they allow for more specific validation and code generation. The Confluent platform supports all three options (here) and there are serializers/deserializers available for at least Java, .NET and Python (see here).
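If only a few fields need to look plausible, one option is to overwrite them after generation. Below is a minimal sketch, assuming the Employee schema from earlier; the field name "age" and the chosen range are of course specific to this example:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.RandomData;

import java.io.InputStream;
import java.util.concurrent.ThreadLocalRandom;

public class SensibleDataExample {

    public static void main(String[] args) throws Exception {
        try (InputStream is = SensibleDataExample.class.getClassLoader()
                .getResourceAsStream("file.avsc")) {
            Schema schema = new Schema.Parser().parse(is);
            GenericRecord record = (GenericRecord) new RandomData(schema, 1).iterator().next();
            // Overwrite fields whose random values make no sense for the domain,
            // e.g. clamp "age" to a plausible range instead of any random int.
            record.put("age", ThreadLocalRandom.current().nextInt(18, 68));
            System.out.println(record);
        }
    }
}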
Note that the generated output does not include union data type indicators in the JSON. For example, given this AVRO schema:
{
  "type": "record",
  "name": "myRecord",
  "namespace": "com.myApps.myRecordData",
  "fields": [
    {
      "name": "eventId",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
The expected output JSON should include the union type as "string":
{
  "eventId": {
    "string": "ID1254632"
  }
}
But the actual result is:
{
  "eventId": "ID1254632"
}
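One way to obtain the union-wrapped form is to serialize the generated record with Avro's own JSON encoder instead of printing its toString() output; the JSON encoding defined by the Avro specification wraps non-null union values in an object keyed by the branch type name. A minimal sketch (the class name UnionJsonExample and the inlined schema string are just for illustration):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
import org.apache.avro.util.RandomData;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class UnionJsonExample {

    public static void main(String[] args) throws IOException {
        String schemaJson = "{\"type\":\"record\",\"name\":\"myRecord\","
                + "\"namespace\":\"com.myApps.myRecordData\",\"fields\":["
                + "{\"name\":\"eventId\",\"type\":[\"null\",\"string\"],\"default\":null}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);
        GenericRecord record = (GenericRecord) new RandomData(schema, 1).iterator().next();

        // The JSON encoder follows the Avro JSON encoding, which wraps
        // non-null union values in an object keyed by the branch type name,
        // e.g. {"eventId": {"string": "..."}}.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        System.out.println(out);
    }
}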
Comments
Thanks, it's very useful. Is there a way to generate full JSON, i.e. to populate all fields in the JSON?
How can I generate an AVRO schema from a JSON document?
Hi. You can check https://stackoverflow.com/questions/46556614/is-there-a-way-to-programmatically-convert-json-to-avro-schema for some suggestions on how to do this. The Apache AVRO library does not seem to provide this functionality. Hope this helps you. With kind regards, Maarten