Recently I was designing an AVRO schema and wanted to see what data conforming to this schema would look like, so I wrote some Java code to generate sample data. This is of course also useful in more elaborate tests which require generation of random events. Because AVRO is not that specific, this is mainly useful to get an idea of the structure of a JSON document which conforms to the definition. Here I’ll describe a simple Java-based solution (Java 17, but it will also work on 11) to do this.
Dependency
You only need a single dependency outside the regular JDK. Below is this dependency as a Maven pom.xml snippet:
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.11.0</version>
</dependency>
Schema
In the resources folder of my project I’ve put a file, file.avsc, which contains my AVRO schema:
{
  "type" : "record",
  "namespace" : "nl.amis.smeetsm.schema",
  "name" : "Employee",
  "fields" : [
    { "name" : "name" , "type" : "string" },
    { "name" : "age" , "type" : "int" }
  ]
}
To develop the schema I’ve used the Apache Avro IDL Schema Support plugin in IntelliJ IDEA. This makes it especially easy to find errors in the schema during development.
Java
My (minimal) Java class to read the schema and generate random JSON which conforms to the schema:
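import org.apache.avro.Schema;
import org.apache.avro.util.RandomData;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

public class AvroTest {
    public static void main(String[] args) throws IOException {
        // Load the schema from the resources folder via the classpath; the stream is closed automatically
        try (InputStream is = AvroTest.class.getClassLoader().getResourceAsStream("file.avsc")) {
            if (is == null) {
                throw new FileNotFoundException("file.avsc not found on the classpath");
            }
            Schema schema = new Schema.Parser().parse(is);
            // RandomData is an Iterable which yields the requested number (here 1) of random records
            Iterator<Object> it = new RandomData(schema, 1).iterator();
            System.out.println(it.next());
        }
    }
}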
The code is mostly self-explanatory: the second argument to RandomData is the number of records to generate, so it is easy to generate more random data this way for use in tests.
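As a minimal sketch of generating several records for a test: this assumes the three-argument RandomData constructor (schema, count, seed), where a fixed seed should make runs repeatable. The class name RandomEmployees and the helper generate are made up for this illustration, and the schema is inlined instead of being read from file.avsc.

```java
import org.apache.avro.Schema;
import org.apache.avro.util.RandomData;

import java.util.ArrayList;
import java.util.List;

class RandomEmployees {
    // Same Employee schema as file.avsc, inlined so the sketch is self-contained
    static final String SCHEMA_JSON =
        "{ \"type\": \"record\", \"namespace\": \"nl.amis.smeetsm.schema\", \"name\": \"Employee\","
      + "  \"fields\": [ { \"name\": \"name\", \"type\": \"string\" },"
      + "                { \"name\": \"age\", \"type\": \"int\" } ] }";

    // Hypothetical helper: collect 'count' random records generated with a fixed seed
    static List<Object> generate(Schema schema, int count, long seed) {
        List<Object> records = new ArrayList<>();
        for (Object record : new RandomData(schema, count, seed)) {
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        // Print five random Employee records, one per line
        generate(schema, 5, 42L).forEach(System.out::println);
    }
}
```

Using a fixed seed is a design choice for tests: a failing test can then be rerun with the same input data.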
Output
When I run my Java class, it generates output like:
{"name": "cenmfi", "age": -746903563}
Finally
AVRO schemas are limited in how strict they can be. They are not as specific as, for example, JSON Schema. It is for example not easy (or even possible?) using AVRO to limit an int type to a certain min and max value or to limit a text field to a regular expression. AVRO schemas are mostly used to help encode messages going over Kafka streams (mostly from Java) and to allow some minimal validation. Because AVRO is not that specific, it is relatively easy to generate random data which conforms to the schema. It is however not easy to generate only messages which make sense (notice the “age” field in my example). If this is a requirement, you might be better off using JSON Schema or Protobuf for serialization/deserialization on Kafka, since they allow for more specific validation and code generation. The Confluent platform supports all three options, and there are serializers/deserializers available for at least Java, .NET and Python.
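If you want to stay with AVRO, one possible workaround is to overwrite the fields that need to make sense after the random record has been generated. The sketch below does this for the “age” field. The age bounds (18 to 67), the class name PlausibleEmployee and the helper generate are assumptions made up for this illustration; they are not something the schema or the AVRO library provides.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.RandomData;

import java.util.Random;

class PlausibleEmployee {
    // Same Employee schema as file.avsc, inlined so the sketch is self-contained
    static final String SCHEMA_JSON =
        "{ \"type\": \"record\", \"namespace\": \"nl.amis.smeetsm.schema\", \"name\": \"Employee\","
      + "  \"fields\": [ { \"name\": \"name\", \"type\": \"string\" },"
      + "                { \"name\": \"age\", \"type\": \"int\" } ] }";

    // Hypothetical business rule which the schema itself cannot express
    static final int MIN_AGE = 18;
    static final int MAX_AGE = 67;

    static GenericRecord generate(Schema schema, long seed) {
        // For a record schema, RandomData yields generic records
        GenericRecord record = (GenericRecord) new RandomData(schema, 1, seed).iterator().next();
        // Replace the unconstrained random int with a value in [MIN_AGE, MAX_AGE]
        int age = MIN_AGE + new Random(seed).nextInt(MAX_AGE - MIN_AGE + 1);
        record.put("age", age);
        return record;
    }

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        System.out.println(generate(schema, 1L));
    }
}
```

This keeps the schema-driven structure from RandomData and only patches individual values, so it does not scale well to schemas with many constrained fields.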
It does not include union data type indicators in the generated JSON data. For example, given this Avro schema:
{
  "type": "record",
  "name": "myRecord",
  "namespace": "com.myApps.myRecordData",
  "fields": [
    {
      "name": "eventId",
      "type": [
        "null", "string"
      ],
      "default": null
    }
  ]
}
The expected output JSON should include the union type as “string”:
{
  "eventId": {
    "string" : "ID1254632"
  }
}
But, the actual result is:
{
  "eventId": "ID1254632"
}
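A likely explanation, sketched here under assumptions: the article prints the generated record with toString(), which gives a JSON-like rendering and not AVRO's own JSON encoding. Writing the record through AVRO's JsonEncoder should wrap a non-null union value in an object keyed by its type name, such as "string". The class name AvroJsonEncoding and the helper toAvroJson are made up for this illustration; the datum passed in could just as well be a record produced by RandomData.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class AvroJsonEncoding {
    // The schema from the comment above, inlined so the sketch is self-contained
    static final String SCHEMA_JSON =
        "{ \"type\": \"record\", \"name\": \"myRecord\", \"namespace\": \"com.myApps.myRecordData\","
      + "  \"fields\": [ { \"name\": \"eventId\", \"type\": [\"null\", \"string\"], \"default\": null } ] }";

    // Hypothetical helper: serialize a datum using AVRO's JSON encoding
    static String toAvroJson(Schema schema, Object datum) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        new GenericDatumWriter<Object>(schema).write(datum, encoder);
        encoder.flush(); // the encoder buffers, so flush before reading the bytes
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord record = new GenericData.Record(schema);
        record.put("eventId", "ID1254632");
        System.out.println("toString():   " + record);
        System.out.println("JSON encoder: " + toAvroJson(schema, record));
    }
}
```

Comparing the two printed lines shows the difference between the plain rendering and the encoding with the union type indicator.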
Thanks, it’s very useful. Is there any way to have it generate full JSON, i.e. populate all fields in the JSON?
How can I generate an Avro schema from a JSON document, then?
Hi. You can check https://stackoverflow.com/questions/46556614/is-there-a-way-to-programmatically-convert-json-to-avro-schema for some suggestions on how to do this. The Apache AVRO library does not seem to provide this functionality. Hope this helps you. With kind regards, Maarten