In the ever-evolving world of software development, efficiency is key. As a developer / software architect, I’ve discovered that incorporating AI tools like ChatGPT into my workflow has been a game changer. From automating repetitive tasks to debugging complex codebases, ChatGPT has become an indispensable assistant, helping me write better code faster and solve challenges more effectively.
In this blog post, I’ll share how ChatGPT fits into my coding workflow, highlighting the strengths and limitations of the OpenAI models I’ve used, practical tips for crafting effective prompts, and strategies for handling large codebases. I’ll also touch on challenges like data privacy, model bias, and compliance concerns that developers should keep in mind when using these tools.
By the end of this post, you’ll understand how large language models like ChatGPT can enhance your coding process, whether you’re tackling simple scripts or navigating complex debugging scenarios.
Disclaimer: This blog reflects my experiences as of December 2024. AI evolves quickly, so your mileage may vary based on changes in models, subscriptions, or availability.
Choosing the Right Model
OpenAI offers different models via its ChatGPT web interface. GPT-4o and o1 are the most popular ones at the moment.
For creating simple scripts, GPT-4o is often sufficient. It generates scripts that usually work out of the box, though a few follow-up questions may be needed to tweak specific behavior. Some scripts I have created this way:
- Concatenate code into a single file so it can easily be supplied to ChatGPT. This is especially useful when the concatenated code fits in the LLM context, because you can then ask questions about the entire codebase.
- Download Bitbucket repositories and repository code.
- Rename digital photos from different cameras to a consistent, date-based file name format for easy browsing. Resize images and add padding.
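As an illustration of the first script in the list, here is a minimal sketch of a concatenation helper. It is not the script I actually use: the function name `concatenate_sources`, the default extensions, and the `=====` header format are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical defaults: adjust the extensions to match your project.
DEFAULT_EXTENSIONS = (".py", ".md")


def concatenate_sources(root, extensions=DEFAULT_EXTENSIONS):
    """Return all matching files under `root` as one text block.

    Each file is preceded by a header line with its relative path, so the
    model can tell where one file ends and the next begins.
    """
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            relative = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"===== {relative} =====\n{text.rstrip()}\n")
    return "\n".join(parts)
```

Calling `concatenate_sources("my_project")` returns one string that can be pasted into a prompt or written to a file first. The path headers also make it easier to ask the model to return only a specific file.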
For more complex code, however, GPT-4o does not suffice. Its context is too small and its ability to deal with complex problems is limited. I currently use o1 when 4o falls short. It is also the most powerful model available in my ChatGPT Team account at the moment. I use o1 for use cases like:
- Analyse a complete codebase (when the codebase is small) or summaries of a complete codebase to generate diagrams or tests
- Write and edit applications which implement LLM interaction (for example Gradio applications or information gathering applications)
- Create code for using the SdWebUI and ComfyUI APIs for image generation using SDXL and Flux.
- Debug complex applications.
Recently OpenAI released o1-pro as part of a Pro subscription. The Team subscription currently has usage quotas for o1 and o1-mini, which the Pro subscription does not have. In addition, o1-pro is marginally better than o1 (based on OpenAI’s graphs). The Pro subscription costs more than Team but may be worth it for heavy users. I do not have personal experience with the Pro subscription.
| Feature | GPT-4o | o1 | o1-pro |
| --- | --- | --- | --- |
| Performance | Handles simple tasks effectively; may struggle with complex codebases. | Better at analyzing and generating complex code; improved reasoning capabilities. | Marginally better than o1; suitable for demanding applications. |
| Context length | The model supports up to 128,000 tokens via the API; the ChatGPT interface exposes a smaller window that depends on the subscription (as little as 8,192 tokens, approximately 4,000-6,000 words). | Supports up to 128,000 tokens (approximately 96,000 words). | Similar to o1; supports up to 128,000 tokens. |
| Limitations | May struggle with complex reasoning tasks. | Improved handling of complex concepts; still requires careful prompt structuring. | Similar to o1; marginal improvements. |
| Subscription cost | Included in Free and Plus ChatGPT subscriptions. | Available in Team subscriptions at $25 per user per month (annual plan); subject to a usage quota. | Part of the Pro subscription at $200 per month; no usage quotas. |
| Ideal use cases | Automating simple tasks, generating small scripts. | Developing applications, debugging complex codebases. | Similar to o1; likely marginally better. |
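The word counts in the table follow from a common rule of thumb of roughly 0.75 English words per token (128,000 tokens is about 96,000 words). The sketch below applies that rule to check whether a text is likely to fit. The ratio is an approximation I am assuming here, not an exact tokenizer, and source code usually needs more tokens per word than prose. The function names and the `reserve_for_answer` parameter are hypothetical.

```python
# Rough rule of thumb for English prose; an assumption, not an exact figure.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text):
    """Estimate the token count of `text` from its whitespace-separated words."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def fits_in_context(text, context_tokens, reserve_for_answer=1000):
    """Return True if `text` plus room for the answer fits in the context window."""
    return estimate_tokens(text) + reserve_for_answer <= context_tokens
```

For example, `fits_in_context(prompt_text, 128_000)` gives a quick yes/no before pasting a large concatenated codebase into a chat. For exact counts you would need the tokenizer of the specific model.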
There is also o1-mini, which I use only when the o1 quota in my Team subscription runs out. While o1-mini is less capable than o1, it still performs better than 4o. In the Team subscription, o1-mini has its own quota. Once that quota is exhausted, I either wait for it to become available again, switch to 4o (when viable), use Claude Sonnet (which is also decent and has a free tier), or fall back to old-fashioned manual coding.
Combining o1 and 4o for up-to-date context
One limitation of o1 is its lack of internet access, meaning it cannot retrieve or process real-time information. In contrast, 4o has internet access, making it better suited for tasks requiring up-to-date information, especially when working with frameworks or tools introduced after o1’s training cutoff in October 2023.
To address this, you can combine the strengths of both models: use 4o to gather current feedback or documentation, then provide that information as context to o1. This approach ensures you benefit from o1’s superior reasoning while maintaining access to the latest information.
Crafting Effective Prompts
Here are some tips and examples to help you craft effective prompts for coding and related tasks:
- Be Clear and Specific
Clearly state what you want to achieve. Include specific requirements or constraints. Use direct instructions such as “Update,” “Fix,” or “Add.”
- Provide Context
Explain the purpose or intended functionality of the code to ensure responses align with your goals. This is one of the most important pieces of advice. More and better context allows the model to generate more suitable responses. Supplying all relevant context up front works better than iterating with follow-up questions, because iterating uses the available context less efficiently.
- Iterate
If the initial response isn’t perfect, refine your prompt and ask for improvements. For example, you can generate Mermaid diagram code in ChatGPT and refine the layout, colors, and so on in natural language. A drawback is that ChatGPT currently cannot render Mermaid diagrams itself, so this has to be done outside the web interface.
Some example prompts
- Update and Return Full Code
Update the following code to follow best practices, including logging, error handling, and documentation. Return the complete updated code.
- Update Specific Parts Only
Add input validation to ensure the following function only accepts integers. Return only the updated function.
- Debug and Fix Errors
Fix the following code to prevent ZeroDivisionError, while maintaining functionality.
This is a dangerous one, since functionality can get lost. You can validate the result afterwards by providing the original code and asking the model to check whether all functionality in the original code is still present in the new code (and, if anything is missing, to add it).
- Refactor and Enhance
Refactor this code to remove redundancy, improve readability, and add logging and comments.
- Add New Functionality
Modify the code to save the output to a file named output.txt. Include file operation error handling.
- Optimize and Explain
Optimize this code for performance and add comments to explain the improvements.
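To make the ZeroDivisionError prompt concrete, here is a small sketch of what a before and after could look like. Both functions are hypothetical examples written for this post, not model output. Returning `None` for empty input is one possible design choice; raising a clearer exception would be another.

```python
def average_original(values):
    """Hypothetical original: raises ZeroDivisionError for an empty list."""
    return sum(values) / len(values)


def average_fixed(values):
    """Guarded version: returns None for empty input.

    For non-empty input the result is identical to the original, which is
    the "while maintaining functionality" part of the prompt.
    """
    if not values:
        return None
    return sum(values) / len(values)
```

A quick spot check that `average_fixed` and `average_original` agree on a few non-empty inputs is a cheap way to confirm that the fix did not change existing behavior.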
Another tip is to be careful with long conversations. GPT-4o and o1 tend to reuse earlier context from the chat. That can be helpful, but it can also cause the model to repeat past mistakes or fall back on old code. If you want a clean answer, start a new chat and provide as much relevant context as possible. It also helps to paste text directly into the prompt: it seems (though I cannot be sure, since ChatGPT remains a black box) that less information is used from attachments than from directly supplied text.
Managing Model Bias
I noticed that ChatGPT has certain prejudices which can interfere with coding. Bias can, for example, show itself in stereotypical male and female roles. There are likely other biases regarding topics like beliefs, cultures, and skin color. This is related to the training data used to create the model. For example, when I asked ChatGPT for an image of a typical woman in a realistic setting, the model placed her in the kitchen.
When asking the same for a man, something different happens: he sits on the couch with a remote, and a can of what may be beer is on the table. The model’s interpretation of what a typical man in a typical man’s setting and a typical woman in a typical woman’s setting look like clearly shows this bias.
How does this relate to coding? Other model biases can influence the generated code. This is not specific to ChatGPT, but it is something to take into account when using models as a coding assistant. For example, when you generate code to implement fraud detection, it might contain a form of ethnic profiling, which is generally prohibited under the GDPR. Another example of model bias, in this case supplier-specific: be very careful when using ChatGPT to write a blog post about OpenAI products like ChatGPT. Likewise, when writing code that uses the Ollama API, do not be surprised if the model mixes in the OpenAI API (which is very similar).
Model understanding
Less advanced models have a weaker grasp of certain complicated concepts, which can limit the use cases for which you can generate code. One concept they struggle with is inception, in other words self-reference. Think of a request like “I want you to improve the following code and update the prompt used to analyse the code so that it will be able to analyse its own code”. I ran into this with sample code that caused issues in both generation and editing. GPT-4o has difficulty executing instructions like these, in particular when it has to edit code that itself analyses or edits code. o1 does much better because of its reasoning capabilities. Image generation models have a similar challenge. Try to create an image, using an image generation model, of a painting of a partially finished painter who holds a brush and is working on his own unfinished painting. You will likely not succeed. Below is what I did manage to create.
And how does this relate to coding? If you supply a lot of context, o1 is better able to understand the flow and structure of the code, or to infer its purpose and coding standards from supplied examples. As a result it is better at updating complex code, while 4o will often generate code that is less usable.
Debugging tests
When debugging tests, ChatGPT can also help. I typically provide the context by including the code under test, the test code, and the error message. If the content is concise enough, I ask the model to return the complete, updated code in a code block for straightforward copy-pasting. For larger inputs, I narrow the focus and ask for updates to specific methods or functions instead. When a large section of the code will not change, I ask the model to leave it out of the output but to assume it is still there. If you do not add that second instruction, the model will start refactoring the code to work without the section.
Depending on the issue, I explicitly instruct the model to either fix the code under test or adjust the test itself. After applying the suggested fix, I start a new chat (to avoid the model using old and sometimes faulty content) and repeat the process, supplying the updated code, the test code, and the latest output. Iterating in this way allows me to progressively refine the code until it compiles and the tests pass.
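The context I paste into each fresh chat always has the same shape, so it can be assembled by a small helper. The sketch below is one way to do that; the function name `build_debug_prompt`, the section markers, and the exact instruction wording are assumptions for illustration, not a fixed recipe.

```python
def build_debug_prompt(code_under_test, test_code, error_output, fix_target="code"):
    """Assemble one self-contained debugging prompt for a fresh chat.

    `fix_target` selects whether the model should fix the code under test
    ("code") or adjust the test itself ("test").
    """
    instructions = {
        "code": "Fix the code under test so the test passes. Do not change the test.",
        "test": "Fix the test so it matches the intended behavior. "
                "Do not change the code under test.",
    }
    if fix_target not in instructions:
        raise ValueError("fix_target must be 'code' or 'test'")
    sections = [
        instructions[fix_target],
        "Return the complete updated file in a single code block.",
        "--- CODE UNDER TEST ---\n" + code_under_test.strip(),
        "--- TEST CODE ---\n" + test_code.strip(),
        "--- ERROR OUTPUT ---\n" + error_output.strip(),
    ]
    return "\n\n".join(sections)
```

Because every iteration starts from a new chat, rebuilding the full prompt from the current files each time keeps stale code out of the conversation.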
Once the code compiles and the tests pass, I perform a final review to compare the updated code with the original version. This ensures that no functionality has been inadvertently removed or bypassed such as troublesome code or tests being removed instead of fixed. This iterative, focused approach helps me maintain confidence in the final solution while leveraging the efficiency of AI assistance.
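Part of this final review can be automated. The sketch below, which assumes the code is Python, uses the standard `ast` module to list functions and classes that exist in the original but are gone from the updated version. The helper names are hypothetical. It only catches removed definitions, not bodies that were quietly hollowed out, so it complements a manual diff rather than replacing it.

```python
import ast


def defined_names(source):
    """Return the names of all functions and classes defined in `source`."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {node.name for node in ast.walk(tree) if isinstance(node, kinds)}


def missing_definitions(original, updated):
    """Return names defined in `original` that no longer exist in `updated`."""
    return sorted(defined_names(original) - defined_names(updated))
```

An empty result from `missing_definitions(original_source, updated_source)` means no function or class disappeared by name. Anything it does list is worth checking before accepting the new code.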
Tips for Handling Large Codebases
In my experience, the model’s ability to handle code effectively depends heavily on context length and the size of the code you’re working with. GPT-4o can handle minor updates to shorter codebases (around 700 lines or fewer) without much trouble, but it struggles with larger or more complex inputs, which often results in incomplete outputs. By contrast, o1 can handle up to roughly 1,500 lines of code, making it more suitable for larger or more intricate tasks.
To maximize the model’s effectiveness, it’s crucial to keep the context small and focused. Supplying smaller, well-structured pieces of code, or abstractions of the code such as summaries, gives the model the details it needs without overwhelming its context window. Readable, modular code not only improves output quality but also simplifies troubleshooting and updates. This applies to both small scripts and larger applications.
When working with larger codebases, it’s also important to manage how you interact with the model. For instance, when asking the model to update or create a script, requesting a complete return of the code often works fine for smaller projects. However, with larger bodies of code, the model may omit certain functions or simply comment that “the rest of the code remains unchanged.” If you’re not careful, these omissions can introduce bugs or lead to unexpected issues in your program.
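Such omissions are easy to miss when copy-pasting, so a quick scan of the returned code can help. The sketch below flags lines that look like placeholders. The patterns are my own guesses at common phrasings, not an exhaustive or official list, and the function name `find_placeholders` is hypothetical.

```python
import re

# Guessed phrasings of "code omitted" placeholders; extend as needed.
PLACEHOLDER_PATTERNS = [
    r"rest of (the|your) (code|file|function|class)",
    r"remains? unchanged",
    r"existing code (here|goes here|continues)",
    r"^\s*(#|//)\s*\.\.\.\s*$",
]


def find_placeholders(response_code):
    """Return (line_number, line) pairs that look like omitted-code markers."""
    hits = []
    for number, line in enumerate(response_code.splitlines(), start=1):
        if any(re.search(p, line, flags=re.IGNORECASE) for p in PLACEHOLDER_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

If `find_placeholders` returns anything, the response is not a complete file and should not overwrite the original as-is.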
To avoid this, you can ask the model to only return the specific functions or sections of code that need to be updated. This targeted approach minimizes the risk of overlooking edits and reduces the time spent manually copying and pasting minor changes. For larger projects, breaking the code into smaller chunks or modules makes this process even more efficient, enabling the model to focus on each part with greater precision.
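For Python code, pulling out just the function you want changed can be scripted with the standard `ast` module. This is a minimal sketch under that assumption; `extract_function` is a hypothetical helper name, and it returns the first definition with a matching name, including its decorators.

```python
import ast


def extract_function(source, name):
    """Return the source text of the first function called `name`.

    Decorator lines are included. Raises KeyError if no such function exists.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            # Decorators sit above the `def` line, so start at the earliest of them.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            return "\n".join(lines[start - 1:node.end_lineno])
    raise KeyError(name)
```

Sending only the extracted function, plus a short description of how it is used, keeps the prompt small and makes it obvious which part of the file the answer should replace.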
Jurisdiction and Data Privacy Concerns
When using ChatGPT, understanding jurisdictional regulations is essential, especially for organizations in highly regulated industries. While paid subscriptions like OpenAI Team ensure your data isn’t used for training, this guarantee may not be sufficient for full compliance with standards like GDPR (EU) or ISO 27001.
Key Differences Between US and EU Privacy Laws
- EU: Enforces strict regulations on data storage, processing, and transfers. Under the GDPR, personal data may only be transferred outside the EU to jurisdictions that offer an adequate level of protection or when appropriate safeguards are in place.
- US: Privacy laws are less restrictive and more fragmented, varying significantly between federal and state levels, leading to fewer uniform protections and looser controls on data transfers.
Why “No Training” Isn’t Always Enough
Compliance requires more than avoiding model training. Specific standards often mandate data residency, encryption, and security controls:
- ISO/IEC 27001: Specifies requirements for an information security management system (ISMS), against which an organization can be certified.
- SOC 2 Type II: An audit report on how well controls for data security, availability, and confidentiality operated over a period of time; critical for finance and healthcare.
- GDPR & HIPAA: Strict mandates for sensitive personal data, such as health or financial records.
Example: A European financial institution handling customer data must ensure GDPR compliance. While OpenAI Team guarantees “no training,” it doesn’t explicitly promise data stays within the EU or adheres to standards like ISO 27001. This can result in non-compliance.
Finally
For smaller scripts and simpler tasks, GPT-4o often gets the job done quickly and effectively. Automating repetitive tasks with these models can save you significant time, especially now that creating a script is often faster than doing the task manually. With powerful tools like ChatGPT, the balance has shifted: it has become more worthwhile to invest time in automation, even for tasks you only do occasionally.
When you move beyond basic scripts and start working on larger projects or full-fledged applications, you’ll need more advanced models like o1. These models handle complex requirements better, but it’s important to structure your code thoughtfully to make the most of the limited context length. For codebases exceeding 1,500 lines, breaking your project into smaller, modular components can greatly improve both model efficiency and output quality.
While I’ve found ChatGPT to be the most effective tool for generating code, it’s worth exploring alternatives for specific use cases. For instance, Gemini excels at summarizing large codebases since it has a much larger input context than ChatGPT, though it’s less effective for generating longer scripts since the output context is limited (in the version I’ve used which is not from a paid subscription). Tools like GitHub Copilot can also complement ChatGPT, offering localized assistance for smaller tasks like code completion and commit messages. There are also other tools such as Cline which can help to automate the debugging workflow since using ChatGPT to manually copy/paste code, tests and error messages can become bothersome.
Using ChatGPT for coding has been a game changer for my productivity, and I encourage you to try it out for your own projects. Whether you’re automating routine tasks, building applications, or debugging complex code, using LLMs can make all the difference.
However, there’s a potential downside to keep in mind: over-relying on tools like ChatGPT can hinder your understanding of the underlying principles, especially for junior developers. If you let ChatGPT do all the heavy lifting without fully comprehending the generated code, you risk becoming disconnected from the critical thinking and problem-solving skills that are essential for growth as a developer. For beginners, it’s best to use ChatGPT as a learning aid rather than a crutch—ensuring that you take the time to understand, review, and apply the concepts independently.
Love your post Maarten. Good tips and considerations. One small addition on my side. I have had the best experience with Claude 3.5 Sonnet model, which is probably comparable with o1. Definitely better than 4o.