LOSS: A Simple Text Sanitization Tool

January 16, 2026

LOSS: A Practical Text Sanitization Tool

Tags: Text Processing, CLI Tool, Go
Posted by Debaditya Malakar
Project repository: https://github.com/uriel-flame-of-god/LOSS

LOSS is a command-line tool written in Go that cleans and normalizes text. It is designed primarily for text produced by large language models, but it works just as well on ordinary prose.

Rather than rewriting content from scratch, LOSS focuses on removing stylistic artifacts that make text feel artificial or templated.

Contents

Why LOSS Exists
How LOSS Works
Command Line Usage
Determinism and Seeds
Optional Local Model Support
What LOSS Is Not
Conclusion

Why LOSS Exists

Modern language models tend to produce recognizable patterns: boilerplate phrases, overly balanced sentence rhythm, excessive polish, markdown formatting, and emojis. None of these is wrong in itself, but together they often make text feel less human.

LOSS exists to remove those patterns while preserving meaning. It does not attempt to disguise authorship or guarantee detection avoidance. Its goal is normalization, not obfuscation.

LOSS intentionally allows small imperfections. Human writing is uneven, and perfect polish is often a red flag.

How LOSS Works

LOSS processes text through a sequence of simple, ordered stages. Each stage makes small changes and passes the result to the next stage. No single step performs aggressive rewriting.

1. Structural normalization and markdown removal
2. Punctuation normalization
3. Emoji removal
4. Sentence rhythm variation
5. Vocabulary flattening
6. Partial sentence rephrasing
7. LLM phrase suppression
8. Final cleanup and capitalization

The cumulative effect is text that feels less mechanical and more naturally written.
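To make the staged design concrete, here is a minimal Go sketch of such a pipeline. It is not LOSS's actual source: the Stage type, the Run helper, and the four sample stages (with their regular expressions and emoji ranges) are hypothetical stand-ins that only show how small, ordered transformations can be chained.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Stage is one small, self-contained transformation.
type Stage func(string) string

// Hypothetical patterns for illustration; real rules would be more careful.
var (
	mdHeading  = regexp.MustCompile(`(?m)^#{1,6}[ \t]+`)
	mdEmphasis = regexp.MustCompile(`\*{1,3}([^*\n]+)\*{1,3}`)
	multiSpace = regexp.MustCompile(`[ \t]{2,}`)
)

// stripMarkdown removes heading markers and asterisk emphasis.
func stripMarkdown(s string) string {
	s = mdHeading.ReplaceAllString(s, "")
	return mdEmphasis.ReplaceAllString(s, "$1")
}

// normalizePunctuation maps curly quotes and the ellipsis character to ASCII.
func normalizePunctuation(s string) string {
	r := strings.NewReplacer("“", `"`, "”", `"`, "’", "'", "…", "...")
	return r.Replace(s)
}

// stripEmoji drops runes in a few common emoji ranges (an approximation).
func stripEmoji(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 0x1F000 && r <= 0x1FAFF, // pictographs, emoticons, transport
			r >= 0x2600 && r <= 0x27BF, // miscellaneous symbols, dingbats
			r == 0xFE0F, r == 0x200D: // variation selector, zero-width joiner
			return -1 // a negative rune tells strings.Map to drop it
		}
		return r
	}, s)
}

// finalCleanup collapses runs of spaces and trims the ends.
func finalCleanup(s string) string {
	return strings.TrimSpace(multiSpace.ReplaceAllString(s, " "))
}

// Run applies each stage in order, feeding every result to the next stage.
func Run(text string, stages ...Stage) string {
	for _, stage := range stages {
		text = stage(text)
	}
	return text
}

func main() {
	in := "## Intro\nThis is **very**   “polished” text… 🚀"
	fmt.Println(Run(in, stripMarkdown, normalizePunctuation, stripEmoji, finalCleanup))
}
```

Because every stage has the same signature, stages can be reordered, skipped, or tested in isolation, which fits the idea that no single step does aggressive rewriting.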

Command Line Usage

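LOSS reads from standard input and writes to standard output. This makes it easy to integrate into scripts, pipelines, and build tools. The examples below use the Windows type command; on Linux or macOS, cat input.txt | loss does the same thing.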

type input.txt | loss

Behavior can be tuned using flags. You can control how much vocabulary flattening or rephrasing occurs, or provide a seed for reproducible output.

type input.txt | loss --flatten-vocab=high --rephrase-ratio=0.3 --seed=42
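For readers curious how a filter like this is typically wired up in Go, the sketch below parses the three flags shown above with the standard flag package and copies standard input to standard output through a processing function. The flag names come from the example command; the Options struct, the default values, the 0 to 1 range check, and the helper names are assumptions made for illustration, not LOSS's real code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// Options mirrors the flags in the example command. Defaults are assumed.
type Options struct {
	FlattenVocab  string
	RephraseRatio float64
	Seed          int64
	SeedSet       bool // true only when --seed was passed explicitly
}

// parseOptions parses args (without the program name) into Options.
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("loss", flag.ContinueOnError)
	fs.StringVar(&o.FlattenVocab, "flatten-vocab", "medium", "vocabulary flattening level")
	fs.Float64Var(&o.RephraseRatio, "rephrase-ratio", 0.2, "share of sentences to rephrase")
	fs.Int64Var(&o.Seed, "seed", 0, "seed for reproducible output")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Visit only walks flags that were actually set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "seed" {
			o.SeedSet = true
		}
	})
	// Assumed constraint: a ratio is a fraction between 0 and 1.
	if o.RephraseRatio < 0 || o.RephraseRatio > 1 {
		return o, fmt.Errorf("rephrase-ratio must be between 0 and 1, got %v", o.RephraseRatio)
	}
	return o, nil
}

// filter reads all of r, applies process, and writes the result to w.
func filter(r io.Reader, w io.Writer, process func(string) string) error {
	data, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	_, err = io.WriteString(w, process(string(data)))
	return err
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	_ = opts // a real tool would hand these options to its stages
	identity := func(s string) string { return s }
	if err := filter(os.Stdin, os.Stdout, identity); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Go's flag package accepts both -seed=42 and --seed=42, so the double-dash style in the example command works without extra libraries.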

Determinism and Seeds

LOSS uses randomness to avoid uniform output. This randomness affects punctuation choices, sentence rhythm, and rephrasing selection.

When a seed is provided, every random choice is derived from it. The same input, run with the same seed and flags, will always produce the same output.

Use a fixed seed when debugging or when you need reproducible results in automated workflows.
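The idea is easy to demonstrate with Go's math/rand package. In this sketch, newRNG and softenExclamations are hypothetical names, and the randomized rule (turning some exclamation marks into periods) is invented purely to show seeding; it is not one of LOSS's actual rules.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
	"time"
)

// newRNG returns a generator built from seed, or from the clock when
// seeded is false.
func newRNG(seed int64, seeded bool) *rand.Rand {
	if !seeded {
		seed = time.Now().UnixNano()
	}
	return rand.New(rand.NewSource(seed))
}

// softenExclamations replaces each '!' with '.' with probability 0.5.
// All randomness comes from rng, so a fixed seed fixes the result.
func softenExclamations(s string, rng *rand.Rand) string {
	var b strings.Builder
	for _, r := range s {
		if r == '!' && rng.Float64() < 0.5 {
			r = '.'
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	in := "Wow! Great! Nice! Cool!"
	a := softenExclamations(in, newRNG(42, true))
	b := softenExclamations(in, newRNG(42, true))
	fmt.Println(a == b) // true: same seed and input give the same choices

	// Without a seed the output may differ from run to run.
	fmt.Println(softenExclamations(in, newRNG(0, false)))
}
```

The key design point is that the generator is passed in explicitly rather than taken from a global source, so every stage that needs randomness draws from the same seeded stream.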

Optional Local Model Support

By default, LOSS is fully rule-based and does not require any language model.

Optionally, LOSS can call a local OpenAI-compatible API to improve sentence rephrasing and vocabulary simplification. Even when this is enabled, all processing stays in your local environment.
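As a rough illustration of what talking to an OpenAI-compatible server looks like from Go, the sketch below posts one sentence to a chat completions endpoint and reads back the first reply. The request and response shapes follow the standard chat completions format; the base URL, the model name, the prompt wording, and every function name here are assumptions, and nothing in this block is taken from LOSS's own code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message message `json:"message"`
	} `json:"choices"`
}

// buildRequestBody encodes a chat request asking for a plain rephrasing.
func buildRequestBody(model, sentence string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model: model,
		Messages: []message{
			{Role: "system", Content: "Rephrase the sentence in plain words. Keep the meaning."},
			{Role: "user", Content: sentence},
		},
	})
}

// extractReply returns the first choice's text from a response body.
func extractReply(body []byte) (string, error) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if len(resp.Choices) == 0 {
		return "", errors.New("response contained no choices")
	}
	return resp.Choices[0].Message.Content, nil
}

// rephrase sends one sentence to baseURL + "/chat/completions".
func rephrase(baseURL, model, sentence string) (string, error) {
	payload, err := buildRequestBody(model, sentence)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 30 * time.Second}
	res, err := client.Post(baseURL+"/chat/completions", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", res.Status)
	}
	data, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return extractReply(data)
}

func main() {
	// Assumed address and model name for a locally hosted server.
	out, err := rephrase("http://localhost:8080/v1", "local-model", "We leverage synergies.")
	if err != nil {
		fmt.Println("model unavailable, keeping rule-based output:", err)
		return
	}
	fmt.Println(out)
}
```

Falling back to the rule-based result when the server cannot be reached keeps the model strictly optional, in line with the default behavior described above.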

What LOSS Is Not

LOSS is not a detection evasion tool
LOSS does not guarantee perfect grammar
LOSS does not perform full paraphrasing
LOSS does not alter meaning

LOSS is intentionally conservative. It cleans text; it does not reinvent it.

Conclusion

LOSS is a small, focused utility for people who want cleaner, more natural text without aggressive rewriting. It favors transparency, control, and simplicity over heavy automation.

If you want text that feels less templated and more human, LOSS is worth a look.

About the Project

LOSS is an open source project designed to be simple to understand and easy to reimplement.

Source code: https://github.com/uriel-flame-of-god/LOSS

The Procrastination Chronicles
